Extracting Information Content from Web Pages Using Block Clustering Method

Hlaing, Nwe Nwe; Nyunt, Thi Thi Soe

MERAL Myanmar Education Research and Learning Portal

Item

Deposited date		2019-11-15
File URL		https://meral.edu.mm/record/4522/files/10103.pdf
OAI identifier		oai:meral.edu.mm:recid/4522

http://hdl.handle.net/20.500.12678/0000004522

Name / File		10103.pdf (463 Kb)

Publication type
		Article
Upload type
		Publication
Title
	Title	Extracting Information Content from Web Pages Using Block Clustering Method
	Language	en_US
Publication date		2012-02-28
Authors
		Hlaing, Nwe Nwe
		Nyunt, Thi Thi Soe
Description
		The World Wide Web is the main “all kind of information” repository and has so far been very successful in disseminating information to humans. As web sites are getting more complicated, the construction of web information extraction systems becomes more difficult and time-consuming. Therefore, we need to mine the main content of a web page in order to extract information from such web pages. In this paper, we study the problem of automatically extracting web information (unsupervised IE) without any learning examples or other similar human input. First, web pages are segmented into several raw chunks. Then the noisy blocks are removed based on product features. Data region identification is based on the observation that data records in a web document are similar in appearance. A block clustering method is therefore proposed based on this observation. This approach requires no human intervention, and experimental results have shown its accuracy to be promising.
Keywords
		Information Extraction (IE), Wrapper, Document Object Model (DOM)
Identifier		http://onlineresource.ucsy.edu.mm/handle/123456789/2447
Journal articles
		Tenth International Conference On Computer Applications (ICCA 2012)
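
The pipeline outlined in the Description (segment a page into raw blocks, drop noisy blocks, then cluster blocks that look alike) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the segmentation rule, the "product features" used for noise removal, or the similarity measure. The sketch assumes blocks are the direct children of the page's outermost element, treats link-heavy or text-free blocks as noise, and compares blocks by their tag sequences with `difflib.SequenceMatcher`. The names `BlockSegmenter`, `is_noisy`, `cluster_blocks`, `extract_records`, and the sample `PAGE` are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class BlockSegmenter(HTMLParser):
    """Collect one block per element found at nesting depth `block_depth`.

    Assumption: depth 2 (children of the outermost element) is a usable
    stand-in for the paper's page segmentation step.
    """

    def __init__(self, block_depth=2):
        super().__init__()
        self.block_depth = block_depth
        self.depth = 0
        self.blocks = []      # each block: {"tags": [...], "text": [...]}
        self.current = None   # block being filled, or None outside a block

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if self.current is not None:
                self.current["tags"].append(tag)
            return
        self.depth += 1
        if self.depth == self.block_depth:
            self.current = {"tags": [], "text": []}
        if self.current is not None:
            self.current["tags"].append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        if self.depth == self.block_depth and self.current is not None:
            self.blocks.append(self.current)
            self.current = None
        self.depth -= 1

    def handle_data(self, data):
        if self.current is not None and data.strip():
            self.current["text"].append(data.strip())


def is_noisy(block):
    """Hypothetical noise test: no visible text, or mostly link tags."""
    tags = block["tags"]
    link_ratio = tags.count("a") / len(tags)
    return not block["text"] or link_ratio > 0.5


def cluster_blocks(blocks, threshold=0.8):
    """Greedy clustering: join the first cluster whose seed has a similar tag sequence."""
    clusters = []
    for block in blocks:
        for cluster in clusters:
            ratio = SequenceMatcher(None, cluster[0]["tags"], block["tags"]).ratio()
            if ratio >= threshold:
                cluster.append(block)
                break
        else:
            clusters.append([block])
    return clusters


def extract_records(html):
    """Return the text of each block in the largest cluster (assumed data region)."""
    segmenter = BlockSegmenter()
    segmenter.feed(html)
    segmenter.close()
    blocks = [b for b in segmenter.blocks if not is_noisy(b)]
    clusters = cluster_blocks(blocks)
    if not clusters:
        return []
    return [" ".join(b["text"]) for b in max(clusters, key=len)]


PAGE = """
<body>
<div><a href="/">Home</a><a href="/about">About</a><a href="/help">Help</a></div>
<div><h2>Laptop A</h2><span>$500</span></div>
<div><h2>Laptop B</h2><span>$650</span></div>
<div><h2>Laptop C</h2><span>$700</span></div>
<p>Copyright notice</p>
</body>
"""

if __name__ == "__main__":
    for record in extract_records(PAGE):
        print(record)
```

On the sample page, the navigation block is dropped as noise, the three product blocks share the tag sequence `div, h2, span` and form the largest cluster, and the footer paragraph ends up in a cluster of its own. Picking the largest cluster as the data region is a simplification; a real page may contain several repeated regions, and the threshold of 0.8 is an arbitrary starting point.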