MERAL Myanmar Education Research and Learning Portal
Web Page Categorization Based on Content and Data Extraction for Academic Community
http://hdl.handle.net/20.500.12678/0000004917
Name / File | License | Actions |
---|---|---|
12017.pdf (893 Kb, available 2019-07-02) | license_free | https://meral.edu.mm/record/4917/files/12017.pdf |
Field | Value |
---|---|
Publication type | Article |
Upload type | Publication |
Title | Web Page Categorization Based on Content and Data Extraction for Academic Community |
Language | en |
Publication date | 2014-02-17 |
Authors | Phyu, Sabai; Linn, Khaing Wah Wah |
Description | The web holds a very large amount of data, which makes it difficult to search for information of interest to a particular user (here, the IT academic field). Web pages therefore need to be categorized so that users can reach their field of interest easily. Web page categorization helps improve the quality of web search. In this paper, we propose a framework for web data extraction that uses categorized web pages to improve data extraction accuracy and results. First, a number of test web pages are defined as inputs. We use a page segmentation algorithm (VIPS) to segment these pages, obtaining a content structure for web page cleaning and for identifying the informative, or main, content blocks. These main contents are categorized using a Support Vector Machine (SVM), which gives accurate and efficient results. The categorized web pages are stored in a database (the IT library) so that data can be returned accurately in response to user queries. |
Keywords | VIPS, SVM, Web Page Segmentation, Categorization, Data extraction |
Identifier | http://onlineresource.ucsy.edu.mm/handle/123456789/82 |
Journal articles | Twelfth International Conference On Computer Applications (ICCA 2014) |
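
The pipeline outlined in the description (segment each page, keep the main content block, categorize it with an SVM, store the result in a database, answer user queries) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `main_content` is a crude stand-in that keeps the longest text block and is not VIPS, the linear SVM is trained with a simple Pegasos-style subgradient method, and the training texts, the `example.org` URLs, the `library` table, and the two category labels are all hypothetical.

```python
import re
import sqlite3
from collections import Counter


def main_content(blocks):
    """Stand-in for segmentation (NOT VIPS): keep the longest text block."""
    return max(blocks, key=len)


def featurize(text):
    """L2-normalised bag-of-words vector, stored as a dict."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = sum(c * c for c in counts.values()) ** 0.5 or 1.0
    return {term: c / norm for term, c in counts.items()}


def score(w, x):
    """Dot product between the weight dict and a feature dict."""
    return sum(w.get(term, 0.0) * value for term, value in x.items())


def train_linear_svm(samples, epochs=200, lam=0.01):
    """Subgradient descent on the L2-regularised hinge loss (labels +1/-1)."""
    w, t = {}, 0
    for _ in range(epochs):
        for text, y in samples:
            x = featurize(text)
            t += 1
            eta = 1.0 / (lam * t)              # decaying step size
            violated = y * score(w, x) < 1.0   # sample inside the margin?
            for term in w:
                w[term] *= 1.0 - 1.0 / t       # equals 1 - eta * lam
            if violated:
                for term, value in x.items():
                    w[term] = w.get(term, 0.0) + eta * y * value
    return w


def build_library(pages, w):
    """Categorize each page's main content and store it in SQLite."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE library (url TEXT, category TEXT, content TEXT)")
    for url, blocks in pages.items():
        content = main_content(blocks)
        category = "IT" if score(w, featurize(content)) > 0 else "other"
        db.execute("INSERT INTO library VALUES (?, ?, ?)",
                   (url, category, content))
    db.commit()
    return db


def query(db, keyword, category="IT"):
    """Return URLs in one category whose content mentions the keyword."""
    rows = db.execute(
        "SELECT url FROM library WHERE category = ? AND content LIKE ? "
        "ORDER BY url",
        (category, f"%{keyword}%"),
    )
    return [row[0] for row in rows]


# Hypothetical training texts: +1 = IT, -1 = other.
TRAINING = [
    ("python programming tutorial compiler algorithms data structures", +1),
    ("machine learning neural network training software", +1),
    ("football match score league goal", -1),
    ("recipe chicken curry cooking kitchen", -1),
]

# Hypothetical pages, each already split into text blocks.
PAGES = {
    "http://example.org/a": [
        "Home About Contact",
        "A tutorial on sorting algorithms and software design "
        "for programming courses",
    ],
    "http://example.org/b": [
        "Login",
        "Chicken curry recipe with cooking tips for the home kitchen",
    ],
}

weights = train_linear_svm(TRAINING)
db = build_library(PAGES, weights)
print(query(db, "algorithms"))  # ['http://example.org/a']
```

Filtering by category before matching the keyword is what lets the stored categorization narrow a query: a search for "curry" within the IT category returns nothing, even though the word appears in the database.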