Mining Web Content Outliers by using Term Weighting Technique and Rank Correlation Coefficient Approach
http://hdl.handle.net/20.500.12678/0000006277
Name / File | License |
---|---|
Mining Web Content Outliers by using Term Weighting Technique and Rank Correlation Coefficient Approach.pdf (1.3 Mb) | © 2017 ICAIT |
Field | Value |
---|---|
Publication type | Conference paper |
Upload type | Publication |
Title | Mining Web Content Outliers by using Term Weighting Technique and Rank Correlation Coefficient Approach |
Language | en |
Publication date | 2017-11-02 |
Authors | Thinzar Tun; Khin Mo Mo Tun |
Description | The World Wide Web (WWW) holds a voluminous amount of information, including many redundant and irrelevant web pages. Outliers are data that differ significantly from the rest of the data. Web content mining is a subarea of web mining that extracts required and useful knowledge or information from web page content. Web content outlier mining concentrates on finding outliers, such as irrelevant and redundant pages, among web pages. The web contains unstructured and semi-structured documents, so algorithms for web content mining need to handle both. The proposed system is based on big web data, and its objective is to obtain more accurate results. In this proposal, the Term Frequency Inverse Document Frequency (TF.IDF) technique, based on full-word matching with a domain dictionary, is used to remove irrelevant documents from the unstructured web documents according to the user’s input query. Removing outliers (irrelevant and redundant content) from the web not only reduces indexing space and time complexity, but also improves the accuracy of search results. Documents that share very few words with the user’s input query are treated as web outliers. A mathematical approach, Spearman’s rank correlation coefficient, is then used to remove redundant web documents and to retrieve ranked relevant web documents. |
Keywords | outliers, web content mining, term frequency, correlation coefficient |
Conference | ICAIT-2017: 1st International Conference on Advanced Information Technologies |
Conference date | 1-2 November 2017 |
Place | Yangon, Myanmar |
Session | Software Engineering and Web Mining |
Conference website | https://www.uit.edu.mm/icait-2017/ |
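
The two-stage idea in the description (TF.IDF filtering of irrelevant documents, then Spearman's rank correlation to drop redundant ones) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the smoothed IDF variant, the thresholds `min_score` and `max_rho`, and the choice to compare documents by their dictionary term counts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into whole words (full-word matching)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_query_scores(docs, query, dictionary):
    """Score each document by the summed TF-IDF of query terms in the dictionary."""
    terms = [t for t in tokenize(query) if t in dictionary]
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    scores = []
    for c in counts:
        total = sum(c.values()) or 1
        score = 0.0
        for t in terms:
            df = sum(1 for other in counts if other[t] > 0)
            if df == 0:
                continue
            # Smoothed IDF; an assumed variant, the paper may use another.
            idf = math.log((1 + n) / (1 + df)) + 1.0
            score += (c[t] / total) * idf
        scores.append(score)
    return scores


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # undefined for constant vectors; treated as uncorrelated
    return cov / math.sqrt(vx * vy)


def filter_and_rank(docs, query, dictionary, min_score=0.01, max_rho=0.9):
    """Return indices of relevant, non-redundant documents, best score first."""
    scores = tfidf_query_scores(docs, query, dictionary)
    # Stage 1: documents with a very low query score are treated as outliers.
    relevant = [i for i, s in enumerate(scores) if s >= min_score]
    relevant.sort(key=lambda i: scores[i], reverse=True)
    # Stage 2: drop a document if its dictionary term counts correlate
    # too strongly (rho >= max_rho) with an already kept document.
    vocab = sorted(dictionary)
    vectors = {}
    for i in relevant:
        c = Counter(tokenize(docs[i]))
        vectors[i] = [c[t] for t in vocab]
    kept = []
    for i in relevant:
        if all(spearman(vectors[i], vectors[j]) < max_rho for j in kept):
            kept.append(i)
    return kept
```

Calling `filter_and_rank(docs, query, dictionary)` with a list of page texts, a query string, and a set of domain words returns document indices ordered by query score. Both thresholds are placeholders that would need tuning on real data.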