MERAL Myanmar Education Research and Learning Portal
Extracting Informative Content from Web Pages Using Content Extraction Algorithm
http://hdl.handle.net/20.500.12678/0000004944
Name / File | License | Actions |
---|---|---|
11089.pdf (274 Kb) | license_free | https://meral.edu.mm/record/4944/files/11089.pdf |
Field | Value |
---|---|
Publication type | Article |
Upload type | Publication |
Title | Extracting Informative Content from Web Pages Using Content Extraction Algorithm |
Language | en |
Publication date | 2013-02-26 |
Authors | Hlaing, Yu Wai |
Description | Apart from the main content blocks, almost all web pages on the Internet contain blocks such as navigation, copyright information, privacy notices, and advertisements, which are not related to the topic of the web page. These blocks are called noisy blocks, and the main content blocks are called informative blocks. The information contained in the noisy blocks can seriously harm Web mining and searching. Discriminating informative blocks from noisy blocks and then extracting the information contained in the informative blocks is therefore an important task. This paper studies the problem of automatically extracting web information (unsupervised IE) without any learning examples or other similar human input. First, web pages are segmented into several raw chunks. The noisy blocks are then removed based on product features. Content extraction is based on the relation among punctuation mark density, length of information text, and anchor text density. This approach requires no human intervention, no prior knowledge of the input HTML page, and no training set. |
Keywords | Web Mining, Information Extraction (IE), Unsupervised IE, Informative Blocks |
Identifier | http://onlineresource.ucsy.edu.mm/handle/123456789/844 |
Journal articles | Eleventh International Conference On Computer Applications (ICCA 2013) |
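
The description names three signals for separating informative blocks from noisy ones: punctuation mark density, text length, and anchor text density. The record does not give the paper's actual formulas or thresholds, so the following is only a minimal sketch of how such a filter might look. The block-level tag set, the feature definitions, the threshold values, and every function and class name here are assumptions made for illustration, not the author's algorithm.

```python
from html.parser import HTMLParser

# Assumption: these tags delimit the "raw chunks" a page is segmented into.
BLOCK_TAGS = {"div", "p", "td", "li", "section", "article"}
SKIP_TAGS = {"script", "style"}  # their contents are never visible text


def visible_chars(s):
    """Count non-whitespace characters."""
    return sum(not c.isspace() for c in s)


class BlockCollector(HTMLParser):
    """Split a page into text chunks and record how much of each is link text."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # list of (text, anchor_char_count)
        self._text = []
        self._anchor_chars = 0
        self._in_anchor = 0
        self._skip = 0

    def flush(self):
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._anchor_chars))
        self._text, self._anchor_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_anchor += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip:
            self._skip -= 1
        elif tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a" and self._in_anchor:
            self._in_anchor -= 1

    def handle_data(self, data):
        if self._skip:
            return
        self._text.append(data)
        if self._in_anchor:
            self._anchor_chars += visible_chars(data)


def is_informative(text, anchor_chars,
                   min_length=40, min_punct=0.01, max_anchor=0.3):
    """Hypothetical rule: long, punctuated, link-poor blocks are informative."""
    n = visible_chars(text)  # always > 0 because empty blocks are never stored
    punct_density = sum(c in ".,;:!?" for c in text) / n
    anchor_density = anchor_chars / n
    return (n >= min_length
            and punct_density >= min_punct
            and anchor_density <= max_anchor)


def extract_content(html):
    """Return the text of the blocks that pass the informative-block rule."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush()  # keep any text left after the last block tag
    return [text for text, anchors in parser.blocks
            if is_informative(text, anchors)]
```

On a page with a link-only navigation `div`, a paragraph of running prose, and a short copyright `div`, `extract_content` would be expected to keep only the paragraph: the navigation block fails on anchor density and the copyright line fails on length. The thresholds are placeholders and would need tuning on real pages.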