Web Page Categorization Based on Content and Data Extraction for Academic Community

http://hdl.handle.net/20.500.12678/0000004917
Files
12017.pdf (893 Kb)
Publication type
Article
Upload type
Publication
Title
Web Page Categorization Based on Content and Data Extraction for Academic Community
Language en
Publication date 2014-02-17
Authors
Phyu, Sabai
Linn, Khaing Wah Wah
Description
The web contains a large amount of data, which makes it difficult to search for information of interest to a user (here, the IT academic field). Web pages therefore need to be categorized so that users can easily reach their field of interest, and web page categorization helps improve the quality of web search. In this paper, we propose a framework for web data extraction that uses categorized web pages to improve the accuracy of data extraction and its results. First, a set of test web pages is defined as the input. We use a page segmentation algorithm (VIPS) to segment these pages and obtain their content structure, which is used for web page cleaning and for identifying the informative (main) content blocks. These main content blocks are categorized using a Support Vector Machine (SVM), which gives accurate and efficient results. The categorized web pages are stored in a database (IT library) so that accurate data can be returned when a user submits a query.
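The pipeline described above (segment each page, keep its main content block, categorize that block with an SVM, store it for querying) can be sketched in Python as follows. This is a minimal, standard-library-only sketch under stated assumptions, not the authors' implementation. Real VIPS segments the rendered page using visual cues, so a DOM tag splitter with a link-density heuristic stands in for it here. The classifier is a small one-vs-rest linear SVM (L2-regularized hinge loss) trained by sub-gradient descent on bag-of-words counts. The table schema, the function names (`main_content`, `store`, `query`), the category labels, the training snippets, and the example.edu URLs are all hypothetical.

```python
import re
import sqlite3
from collections import Counter
from html.parser import HTMLParser

# Assumption: a DOM stand-in for VIPS. Real VIPS segments the *rendered* page using
# visual cues; here raw HTML is only cut into text blocks at block-level tags.
BLOCK_TAGS = {"div", "p", "section", "article", "td", "li", "nav", "header", "footer"}

class BlockSplitter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.blocks, self.link_counts = [], []
        self._text, self._links = [], 0

    def _flush(self):  # close the current block, keeping it only if it has text
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append(text)
            self.link_counts.append(self._links)
        self._text, self._links = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()
        if tag == "a":
            self._links += 1
    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()
    def handle_data(self, data):
        self._text.append(data)
    def close(self):
        super().close()
        self._flush()

def main_content(html):
    """Pick the 'informative' block: most words per link (link-density heuristic)."""
    parser = BlockSplitter()
    parser.feed(html)
    parser.close()
    scored = [(len(b.split()) / (1 + n), b) for b, n in zip(parser.blocks, parser.link_counts)]
    return max(scored)[1] if scored else ""

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

class LinearSVM:
    """One-vs-rest linear SVM: L2-regularized hinge loss, plain sub-gradient descent."""
    def __init__(self, lam=0.01, lr=0.1, epochs=20):
        self.lam, self.lr, self.epochs = lam, lr, epochs

    def _score(self, c, x):
        return sum(self.w[c][k] * v for k, v in x.items()) + self.b[c]

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.w = {c: Counter() for c in self.classes}
        self.b = {c: 0.0 for c in self.classes}
        feats = [Counter(tokens(d)) for d in docs]  # bag-of-words counts
        for _ in range(self.epochs):
            for x, y in zip(feats, labels):
                for c in self.classes:
                    t = 1.0 if y == c else -1.0
                    violated = t * self._score(c, x) < 1  # inside the margin?
                    w = self.w[c]
                    for k in w:  # L2 term: shrink every weight
                        w[k] *= 1 - self.lr * self.lam
                    if violated:  # hinge sub-gradient: push x toward its side
                        for k, v in x.items():
                            w[k] += self.lr * t * v
                        self.b[c] += self.lr * t
        return self

    def predict(self, text):
        x = Counter(tokens(text))
        return max(self.classes, key=lambda c: self._score(c, x))

def store(db, url, html, clf):
    """Clean the page, categorize its main block, and save it (hypothetical schema)."""
    text = main_content(html)
    db.execute("INSERT INTO pages VALUES (?, ?, ?)", (url, clf.predict(text), text))

def query(db, category, term):
    sql = "SELECT url FROM pages WHERE category = ? AND content LIKE ? ORDER BY url"
    return [url for (url,) in db.execute(sql, (category, f"%{term}%"))]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pages (url TEXT, category TEXT, content TEXT)")
clf = LinearSVM().fit(
    ["database sql query index table transaction", "sql table join query database schema",
     "network router packet tcp protocol", "tcp ip packet routing network switch"],
    ["database", "database", "networking", "networking"])
store(db, "http://example.edu/db", "<div><a href='/'>Home</a> <a href='/news'>News</a></div>"
      "<div><p>This course covers sql query optimization and database index design.</p></div>", clf)
store(db, "http://example.edu/net", "<div><p>Lecture notes on tcp packet routing.</p></div>", clf)
print(query(db, "database", "index"))  # ['http://example.edu/db']
```

In the usage lines, the navigation block ("Home News", two links) scores lower than the paragraph of course text, so only the paragraph is categorized and stored. Words-per-link is just one simple way to separate navigation from main content; an actual VIPS implementation and a properly trained classifier (for example, a library SVM on TF-IDF features) would replace both stand-ins in practice.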
Keywords
VIPS, SVM, Web Page Segmentation, Categorization, Data extraction
Identifier http://onlineresource.ucsy.edu.mm/handle/123456789/82
Journal articles
Twelfth International Conference On Computer Applications (ICCA 2014)