Extracting Informative Content from Web Pages Using Content Extraction Algorithm

http://hdl.handle.net/20.500.12678/0000004944
File 11089.pdf (274 Kb)
Publication type
Article
Upload type
Publication
Title
Title Extracting Informative Content from Web Pages Using Content Extraction Algorithm
Language en
Publication date 2013-02-26
Authors
Hlaing, Yu Wai
Description
Apart from the main content blocks, almost all web pages on the Internet contain blocks such as navigation, copyright information, privacy notices, and advertisements, which are not related to the topic of the web page. These blocks are called noisy blocks, and the main content blocks are called informative blocks. The information contained in the noisy blocks can seriously harm Web mining and searching, so discriminating informative blocks from noisy blocks and then extracting the information contained in the informative blocks is an important task. This paper studies the problem of automatically extracting web information (unsupervised IE) without any learning examples or other similar human input. First, web pages are segmented into several raw chunks. Noisy blocks are then removed based on product features. Content extraction is based on the relation among punctuation mark density, length of information text, and anchor text density. This approach requires no human intervention, no prior knowledge of the input HTML page, and no training set.
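The description names three signals (punctuation mark density, length of text, and anchor text density) but does not state how the paper combines them. The following Python sketch is only an illustration under stated assumptions: the chunking at block-level tags, the helper names (ChunkParser, block_features, is_informative, extract_informative), and all threshold values are hypothetical and are not taken from the paper.

```python
import string
from html.parser import HTMLParser

# Assumption: chunks are split at these block-level tags (not from the paper).
BLOCK_TAGS = {"div", "p", "td", "li", "ul", "table", "section", "article"}


class ChunkParser(HTMLParser):
    """Split a page into raw (text, anchor_text) chunks at block-level tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._text = []
        self._anchor = []
        self._in_anchor = 0

    def flush(self):
        # Normalize whitespace and store the chunk if it has any text.
        text = " ".join("".join(self._text).split())
        anchor = " ".join(" ".join(self._anchor).split())
        if text:
            self.chunks.append((text, anchor))
        self._text, self._anchor = [], []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_anchor += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a" and self._in_anchor:
            self._in_anchor -= 1

    def handle_data(self, data):
        self._text.append(data)
        if self._in_anchor:
            self._anchor.append(data)


def block_features(text, anchor_text):
    """Compute the three signals over non-whitespace characters."""
    length = sum(1 for c in text if not c.isspace())
    anchor_len = sum(1 for c in anchor_text if not c.isspace())
    punct = sum(1 for c in text if c in string.punctuation)
    if length == 0:
        return {"length": 0, "punct_density": 0.0, "anchor_density": 0.0}
    return {
        "length": length,
        "punct_density": punct / length,
        "anchor_density": anchor_len / length,
    }


def is_informative(feats, min_length=50, min_punct=0.01, max_anchor=0.3):
    """Hypothetical rule: long, punctuated, link-poor chunks are informative."""
    return (
        feats["length"] >= min_length
        and feats["punct_density"] >= min_punct
        and feats["anchor_density"] <= max_anchor
    )


def extract_informative(html):
    """Return the text of chunks that pass the hypothetical threshold rule."""
    parser = ChunkParser()
    parser.feed(html)
    parser.close()
    parser.flush()  # keep any trailing text outside block tags
    return [t for t, a in parser.chunks if is_informative(block_features(t, a))]
```

Under these assumptions, a navigation bar made entirely of links has an anchor text density near 1 and is dropped, a short copyright line fails the length check, and a paragraph of running prose passes all three checks. The thresholds would need tuning against real pages.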
Keywords
Web Mining, Information Extraction (IE), Unsupervised IE, Informative Blocks
Identifier http://onlineresource.ucsy.edu.mm/handle/123456789/844
Journal articles
Eleventh International Conference On Computer Applications (ICCA 2013)
Versions

Ver.1 2020-09-01 15:33:10.409602