MERAL Myanmar Education Research and Learning Portal

  1. University of Computer Studies, Yangon
  2. Conferences

Discovering Informative Content Blocks for Efficient Web Data Extraction

http://hdl.handle.net/20.500.12678/0000003523
File psc2010paper (62).pdf (148 Kb)
Publication type
Article
Upload type
Publication
Title
Title Discovering Informative Content Blocks for Efficient Web Data Extraction
Language en
Publication date 2010-12-16
Deposited date 2019-07-25
Authors
Hlaing, Nwe Nwe
Nyunt, Thi Thi Soe
Description
As web sites become more complicated, the construction of web information extraction systems becomes more troublesome and time-consuming. A common theme is the difficulty in locating the segments of a page in which the target information is contained, which we call the informative blocks. Discriminating informative blocks from noisy blocks and then extracting the informative blocks from a web page is therefore an important task. In this paper, we propose a method that utilizes both visual features and semantic information to extract the informative block. First, the VIPS (Vision-based Page Segmentation) algorithm is used to partition a web page into semantic blocks with a hierarchical structure. Then spatial features (such as position and size) and content features (the number of images and links) are extracted to construct a feature vector for each block. Second, based on these features, blocks with similar content structures and spatial structures are clustered by means of similarity computation. After clustering, the cluster with the largest size and the nearest distance to the centre of the page is determined to be the informative block.
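The pipeline described above (feature vector per block, similarity-based clustering, selection of the largest and most central cluster) can be illustrated with a minimal sketch. It is not the authors' implementation: the `Block` schema, the similarity measure (1 / (1 + Euclidean distance)), the greedy clustering, the `threshold` value, and the reading of "largest size" as the number of blocks in a cluster are all assumptions made for illustration, and the segmentation step (VIPS) is taken as already done.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Block:
    """A segmented page block (hypothetical schema, not the VIPS output format)."""
    x: float        # left edge in pixels
    y: float        # top edge in pixels
    width: float
    height: float
    n_images: int
    n_links: int

    def features(self, page_w, page_h):
        # Spatial features normalised by page size, then raw content counts.
        return (self.x / page_w, self.y / page_h,
                self.width / page_w, self.height / page_h,
                float(self.n_images), float(self.n_links))

    def centre(self):
        return (self.x + self.width / 2, self.y + self.height / 2)


def similarity(a, b):
    # Assumed measure: 1 / (1 + Euclidean distance) between feature vectors.
    dist = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
    return 1.0 / (1.0 + dist)


def cluster_blocks(blocks, page_w, page_h, threshold=0.6):
    # Greedy single pass: a block joins the first cluster whose first member
    # is similar enough, otherwise it starts a new cluster.
    clusters = []
    for block in blocks:
        vec = block.features(page_w, page_h)
        for cluster in clusters:
            if similarity(vec, cluster[0].features(page_w, page_h)) >= threshold:
                cluster.append(block)
                break
        else:
            clusters.append([block])
    return clusters


def informative_cluster(clusters, page_w, page_h):
    cx, cy = page_w / 2, page_h / 2

    def mean_centre_distance(cluster):
        total = 0.0
        for b in cluster:
            bx, by = b.centre()
            total += hypot(bx - cx, by - cy)
        return total / len(cluster)

    # Assumed combination of the two criteria: most blocks first,
    # ties broken by the smaller mean distance to the page centre.
    return min(clusters, key=lambda c: (-len(c), mean_centre_distance(c)))
```

For a page with one header-like block, three similar content blocks in the middle, and one footer-like block, `informative_cluster(cluster_blocks(blocks, page_w, page_h), page_w, page_h)` would return the group of content blocks. Because the link and image counts are left unscaled here, they dominate the distance; a real system would likely weight or normalise them.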
Keywords
Vision-based Page Segmentation, Information Extraction, Block Clustering
Identifier http://onlineresource.ucsy.edu.mm/handle/123456789/1265
Journal articles
Fifth Local Conference on Parallel and Soft Computing