MERAL Myanmar Education Research and Learning Portal

Identifier http://onlineresource.ucsy.edu.mm/handle/123456789/90
Deposited date 2019-07-02
OAI identifier oai:meral.edu.mm:recid/5003
File URL https://meral.edu.mm/record/5003/files/12020.pdf
  1. University of Computer Studies, Yangon
  2. Conferences

Noise Block Cleaning and Main Content Block Extraction from Dynamic Web Page

http://hdl.handle.net/20.500.12678/0000005003
File 12020.pdf (115 Kb)
Publication type Article
Upload type Publication
Title
Noise Block Cleaning and Main Content Block Extraction from Dynamic Web Page
en
Publication date 2014-02-17
Authors
San, Pan Ei
Aye, Nilar
Description
Web information extraction systems are becoming more complex and time-consuming. A web page contains many informative blocks and noise blocks. Noise blocks are navigational elements, templates, and advertisements that are not the main content blocks of the web page; they can be called noisy blocks or boilerplate text. This boilerplate text is typically unrelated to the main content, may degrade search precision, and thus needs to be detected properly. This paper proposes a web page cleaning and main content block extraction approach for the purpose of improving the accuracy and efficiency of web content mining. The system uses structural features and shallow text features, such as number of words, link density, and average word length, to classify the blocks of a web page as main content or boilerplate text. The system then extracts the main content block using three parameters: title keyword, keyword-frequency-based block selection, and position features. The relevant content blocks are identified as having a high importance level by the similarity of their contents to other blocks. Experiments show that web page cleaning based on shallow features leads to more accurate and efficient classification results for boilerplate or other content than existing approaches.
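The shallow text features named in the description can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Block` class and the `shallow_features` function are hypothetical names, and link density is assumed here to mean the fraction of a block's words that sit inside hyperlinks, which the record does not define.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical container for one block of a web page."""
    text: str       # all visible text in the block
    link_text: str  # the part of that text that sits inside <a> elements


def shallow_features(block: Block) -> dict:
    """Compute the three shallow text features listed in the description.

    Assumption: link density = words inside links / all words in the block.
    """
    words = block.text.split()
    link_words = block.link_text.split()
    n = len(words)
    if n == 0:
        # An empty block carries no text; report zeros instead of dividing by 0.
        return {"num_words": 0, "link_density": 0.0, "avg_word_length": 0.0}
    return {
        "num_words": n,
        "link_density": len(link_words) / n,
        "avg_word_length": sum(len(w) for w in words) / n,
    }


# A navigation bar is all links; a paragraph of article text has none.
nav = Block(text="Home About Contact", link_text="Home About Contact")
article = Block(text="The system removes noise blocks from pages", link_text="")

nav_features = shallow_features(nav)
article_features = shallow_features(article)
```

In this toy input the navigation block gets a link density of 1.0 and the article block 0.0. A classifier such as the decision tree listed in the keywords could take these feature values as input, though the record gives no thresholds or training details.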
Keywords
Boilerplate Detection
Decision Tree
Shallow Text features
Web Content Mining
Journal articles
Twelfth International Conference On Computer Applications (ICCA 2014)

Versions

Ver.1 2020-09-01 15:37:33.101557