MERAL Myanmar Education Research and Learning Portal
Item
{"_buckets": {"deposit": "9b31d65a-80b1-474f-9240-e07338ce2c15"}, "_deposit": {"id": "4069", "owners": [], "pid": {"revision_id": 0, "type": "recid", "value": "4069"}, "status": "published"}, "_oai": {"id": "oai:meral.edu.mm:recid/4069", "sets": ["user-ucsy"]}, "communities": ["ucsy"], "item_1583103067471": {"attribute_name": "Title", "attribute_value_mlt": [{"subitem_1551255647225": "Constructing and Implementing a New DOM-based Content Extraction Algorithm", "subitem_1551255648112": "en"}]}, "item_1583103085720": {"attribute_name": "Description", "attribute_value_mlt": [{"interim": "The explosive growth of the Internet has made an enormous number of information sources available as HTML pages. However, many of these pages contain redundant material, known as Web page noise. For instance, almost all dot-com sites present a large amount of noise such as service channels, navigation panels, copyright and privacy announcements, and advertisements. Such noise can seriously harm Web mining, information retrieval, and information extraction. This paper proposes a new algorithm and presents how it can be used to deal with Web page noise. The proposed algorithm matches DOM trees to classify nodes as noise or content and, after classification, clusters them into their respective groups. Finally, only the content group is extracted from the page. The resulting content is useful for both users and systems. The proposed technique improves the performance of Web content extraction."}]},
"item_1583103108160": {"attribute_name": "Keywords", "attribute_value": []}, "item_1583103120197": {"attribute_name": "Files", "attribute_type": "file", "attribute_value_mlt": [{"accessrole": "open_access", "date": [{"dateType": "Available", "dateValue": "2019-08-05"}], "displaytype": "preview", "download_preview_message": "", "file_order": 0, "filename": "55172.pdf", "filesize": [{"value": "800 Kb"}], "format": "application/pdf", "future_date_message": "", "is_thumbnail": false, "licensetype": "license_free", "mimetype": "application/pdf", "size": 800000.0, "url": {"url": "https://meral.edu.mm/record/4069/files/55172.pdf"}, "version_id": "72328387-d9f8-4686-8ba4-b3756c746ead"}]}, "item_1583103131163": {"attribute_name": "Journal articles", "attribute_value_mlt": [{"subitem_issue": "", "subitem_journal_title": "Fourth Local Conference on Parallel and Soft Computing", "subitem_pages": "", "subitem_volume": ""}]}, "item_1583103147082": {"attribute_name": "Conference papers", "attribute_value_mlt": [{"subitem_acronym": "", "subitem_c_date": "", "subitem_conference_title": "", "subitem_part": "", "subitem_place": "", "subitem_session": "", "subitem_website": ""}]}, "item_1583103211336": {"attribute_name": "Books/reports/chapters", "attribute_value_mlt": [{"subitem_book_title": "", "subitem_isbn": "", "subitem_pages": "", "subitem_place": "", "subitem_publisher": ""}]}, "item_1583103233624": {"attribute_name": "Thesis/dissertations", "attribute_value_mlt": [{"subitem_awarding_university": "", "subitem_supervisor(s)": [{"subitem_supervisor": ""}]}]}, "item_1583105942107": {"attribute_name": "Authors", "attribute_value_mlt": [{"subitem_authors": [{"subitem_authors_fullname": "Moong, Nang Kham Line"}]}]}, "item_1583108359239": {"attribute_name": "Upload type", "attribute_value_mlt": [{"interim": "Publication"}]}, "item_1583108428133": {"attribute_name": "Publication type", 
"attribute_value_mlt": [{"interim": "Article"}]}, "item_1583159729339": {"attribute_name": "Publication date", "attribute_value": "2009-12-30"}, "item_1583159847033": {"attribute_name": "Identifier", "attribute_value": "http://onlineresource.ucsy.edu.mm/handle/123456789/1775"}, "item_title": "Constructing and Implementing a New DOM-based Content Extraction Algorithm", "item_type_id": "21", "owner": "1", "path": ["1597824273898"], "permalink_uri": "http://hdl.handle.net/20.500.12678/0000004069", "pubdate": {"attribute_name": "Deposited date", "attribute_value": "2019-08-05"}, "publish_date": "2019-08-05", "publish_status": "0", "recid": "4069", "relation": {}, "relation_version_is_last": true, "title": ["Constructing and Implementing a New DOM-based Content Extraction Algorithm"], "weko_shared_id": -1}
Constructing and Implementing a New DOM-based Content Extraction Algorithm
http://hdl.handle.net/20.500.12678/0000004069
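
The abstract in the record above describes matching DOM trees to classify nodes as noise or content and then extracting only the content group. The record does not give the algorithm's details, so the following is only a minimal sketch of that general idea, not the paper's method. It assumes (as an illustration) that a block is noise when an identical subtree also appears on a second page from the same site, and that candidate blocks are the direct children of `<body>`. All names here (`Node`, `TreeBuilder`, `signature`, `classify`, `extract_content`, `PAGE_A`, `PAGE_B`) are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """A minimal DOM node: tag name, parent, children, and its own text."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree; assumes well-formed, properly nested HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.current.text = (self.current.text + " " + text).strip()


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find(node, tag):
    """Depth-first search for the first node with the given tag."""
    if node.tag == tag:
        return node
    for child in node.children:
        found = find(child, tag)
        if found is not None:
            return found
    return None


def signature(node):
    """Hashable fingerprint of a subtree: tag, own text, child fingerprints."""
    return (node.tag, node.text, tuple(signature(c) for c in node.children))


def text_of(node):
    """All text in a subtree, in document order."""
    parts = [node.text] + [text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)


def classify(page_html, reference_html):
    """Group the <body> blocks of a page into 'noise' and 'content'.

    Assumption: a block whose subtree also occurs in the reference page
    is template noise; everything else is content.
    """
    page_body = find(parse(page_html), "body")
    ref_body = find(parse(reference_html), "body")
    ref_signatures = {signature(b) for b in ref_body.children}
    groups = {"noise": [], "content": []}
    for block in page_body.children:
        label = "noise" if signature(block) in ref_signatures else "content"
        groups[label].append(block)
    return groups


def extract_content(page_html, reference_html):
    """Return the text of the content group only."""
    return [text_of(b) for b in classify(page_html, reference_html)["content"]]


# Two hypothetical pages sharing a navigation bar and a footer.
PAGE_A = ("<html><body><div>Home | News | About</div>"
          "<div><p>DOM trees model page structure.</p></div>"
          "<div>Copyright Example Site</div></body></html>")
PAGE_B = ("<html><body><div>Home | News | About</div>"
          "<div><p>A different article body.</p></div>"
          "<div>Copyright Example Site</div></body></html>")
```

Calling `extract_content(PAGE_A, PAGE_B)` keeps only the article block of `PAGE_A`, because the navigation bar and footer subtrees match those in `PAGE_B`. Exact subtree matching is a deliberately simple choice; a real system would likely need a tolerant tree-similarity measure and finer-grained blocks.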