Clustering XML Document Based On Path Similarities Using Structure Only

Mon, Ei Ei; Tun, Khin Nwe Ni

MERAL Myanmar Education Research and Learning Portal

http://hdl.handle.net/20.500.12678/0000004226

File
		59027.pdf (100 Kb)

Publication type
		Article
Upload type
		Publication
Title
	Title	Clustering XML Document Based On Path Similarities Using Structure Only
	Language	en
Publication date		2009-12-30
Authors
		Mon, Ei Ei
		Tun, Khin Nwe Ni
Description
		We propose a methodology for clustering XML documents on the basis of their structural similarities. This research combines common XPath extraction with K-means clustering to improve efficiency for collections of XML documents with many different structures. Common XPaths are used to find similarities among the paths of large numbers of XML documents, and the K-means clustering algorithm is used to produce accurate clusters. The method proceeds in three steps. The first step performs frequent structure mining, using the FP-growth method, to find similarities among the structures of a large number of XML documents. The second step builds a feature vector matrix from the extracted paths; based on the collected set of common path vectors, we compute the structural similarity between the XML documents. The last step applies the K-means clustering algorithm to create clusters following the idea of path-based clustering, which groups documents according to their common XPaths, i.e. their frequent structures. The quality of the clustering can be measured by the dissimilarity of document structures. Experimental evaluation on both synthetic and real data shows the effectiveness of our approach.
Keywords
		common XPath, K-means clustering, XML Document Clustering, Data Mining, Frequent Structure Mining
Identifier		http://onlineresource.ucsy.edu.mm/handle/123456789/1918
Journal articles
		Fourth Local Conference on Parallel and Soft Computing
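
The three steps named in the description (path extraction with frequent structure mining, a path-based feature matrix, and K-means) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy documents, the function names, the `min_support` threshold, and the fixed initial centroids are all assumptions made for illustration, and a plain support count over whole paths stands in for the FP-growth step.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def root_to_node_paths(xml_text):
    """Return the set of tag paths (structure only, text ignored) in one document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def common_paths(path_sets, min_support):
    """Keep paths that occur in at least min_support documents.

    Simplification (assumption): a plain support count over whole paths,
    standing in for the FP-growth mining step named in the description.
    """
    counts = Counter(p for paths in path_sets for p in paths)
    return sorted(p for p, c in counts.items() if c >= min_support)


def feature_matrix(path_sets, vocabulary):
    """One binary row per document: 1.0 if the document contains the path."""
    return [[1.0 if p in paths else 0.0 for p in vocabulary] for paths in path_sets]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, initial_centroids, iterations=20):
    """Plain Lloyd-style K-means; returns one cluster label per row."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(row, centroids[j]))
            for row in rows
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy documents: two "book"-like and two "person"-like structures.
DOCS = [
    "<book><title/><author><name/></author></book>",
    "<book><title/><author><name/></author><year/></book>",
    "<person><name/><address><city/></address></person>",
    "<person><name/><address><city/><zip/></address></person>",
]

path_sets = [root_to_node_paths(d) for d in DOCS]
vocabulary = common_paths(path_sets, min_support=2)
matrix = feature_matrix(path_sets, vocabulary)
# Deterministic seeding for the sketch: first and third rows as initial centroids.
labels = kmeans(matrix, [matrix[0], matrix[2]])
print(vocabulary)
print(labels)
```

With these toy inputs the two book-like documents share one label and the two person-like documents share the other. Paths that appear in only one document, such as `/book/year`, fall below the support threshold and do not become features, which mirrors the idea of clustering on frequent structures only.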