MERAL Myanmar Education Research and Learning Portal
Mining Web Content Outliers by using Term Weighting Technique and Rank Correlation Coefficient Approach
http://hdl.handle.net/20.500.12678/0000006277
Name / File | License | Actions |
---|---|---|
Mining Web Content Outliers by using Term Weighting Technique and Rank Correlation Coefficient Approach.pdf (1.3 Mb) | © 2017 ICAIT | |
Field | Value |
---|---|
Publication type | Conference paper |
Upload type | Publication |
Title | Mining Web Content Outliers by using Term Weighting Technique and Rank Correlation Coefficient Approach |
Language | en |
Publication date | 2017-11-02 |
Deposited date | 2020-11-19 |
Authors | Thinzar Tun; Khin Mo Mo Tun |
Description | The World Wide Web (WWW) holds a voluminous amount of information, much of it in redundant and irrelevant web pages. Outliers are data that differ significantly from the rest of the data. Web content mining is a subarea of web mining that extracts required and useful knowledge or information from web page content. Web content outlier mining concentrates on finding outliers, such as irrelevant and redundant pages, among web pages. The web contains unstructured and semi-structured documents, so web content mining algorithms need to handle both. The proposed system is based on big web data, and its objective is to obtain more accurate results. In this proposal, the Term Frequency Inverse Document Frequency (TF.IDF) technique, based on full-word matching with a domain dictionary, is used to remove irrelevant documents from the unstructured web documents based on the user’s input query. Removing outliers (irrelevant and redundant content) from the web not only reduces indexing space and time complexity but also improves the accuracy of search results. Documents that share very few words with the user’s input query are treated as web outliers. A mathematical approach called Spearman’s rank correlation coefficient is then used to remove the redundant web documents and to retrieve ranked relevant web documents. |
Keywords | outliers, web content mining, term frequency, correlation coefficient |
Conference acronym | ICAIT-2017 |
Conference date | 1-2 November, 2017 |
Conference title | 1st International Conference on Advanced Information Technologies |
Conference place | Yangon, Myanmar |
Conference session | Software Engineering and Web Mining |
Conference website | https://www.uit.edu.mm/icait-2017/ |
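
The two stages named in the description (TF.IDF filtering against the query, then Spearman's rank correlation to drop redundant documents) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which vectors are compared or which thresholds are used, so the function names, the smoothed IDF formula, the use of term-frequency vectors for the correlation step, and the `min_score` and `max_rho` values are all assumptions chosen for illustration.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    return [w for w in (raw.strip(PUNCT).lower() for raw in text.split()) if w]


def tfidf_scores(docs, query, dictionary):
    """Sum TF-IDF over query terms that exactly match a domain-dictionary word."""
    tokens = [tokenize(d) for d in docs]
    token_sets = [set(t) for t in tokens]
    terms = [t for t in tokenize(query) if t in dictionary]
    n = len(docs)
    df = {t: sum(1 for s in token_sets if t in s) for t in terms}
    scores = []
    for toks in tokens:
        counts = Counter(toks)
        score = 0.0
        for t in terms:
            if not toks or df[t] == 0:
                continue
            tf = counts[t] / len(toks)
            idf = math.log((1 + n) / (1 + df[t])) + 1  # smoothed IDF (assumption)
            score += tf * idf
        scores.append(score)
    return scores


def ranks(values):
    """Ranks starting at 1 for the smallest value; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(vx * vy)


def filter_and_rank(docs, query, dictionary, min_score=0.05, max_rho=0.95):
    """Return indices of relevant, non-redundant documents, best score first."""
    scores = tfidf_scores(docs, query, dictionary)
    # Stage 1: documents scoring below min_score are treated as outliers.
    relevant = [i for i, s in enumerate(scores) if s >= min_score]
    # Stage 2: compare term-frequency vectors over a shared vocabulary.
    vocab = sorted({t for i in relevant for t in tokenize(docs[i])})
    vectors = {}
    for i in relevant:
        counts = Counter(tokenize(docs[i]))
        vectors[i] = [counts[t] for t in vocab]
    kept = []
    for i in sorted(relevant, key=lambda i: -scores[i]):
        # Keep a document only if it is not near-identical to one already kept.
        if all(spearman(vectors[i], vectors[j]) < max_rho for j in kept):
            kept.append(i)
    return kept
```

Calling `filter_and_rank(docs, query, dictionary)` on a small corpus returns document indices in descending relevance order, with off-topic documents and near-duplicates omitted. Both thresholds would need tuning on real data.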