MERAL Myanmar Education Research and Learning Portal
Systematic Selection of Initial Centroid for K-Means Document Clustering System
http://hdl.handle.net/20.500.12678/0000003117
Name / File | License | Actions |
---|---|---|
Systematic Selection of Initial Centroid for K-Means Document Clustering System.pdf (251 Kb) | | |
Publication type | Conference paper |
---|---|
Upload type | Publication |
Title | Systematic Selection of Initial Centroid for K-Means Document Clustering System |
Language | en |
Publication date | 2016-12-29 |
Authors | Tin Thu Zar Win; Moe Moe Aye |
Description | As the number of electronic documents generated from sources worldwide increases, it becomes hard to organize, analyze, and present these documents efficiently by hand. Document clustering is a traditional data mining technique and an unsupervised learning paradigm. Fast, high-quality document clustering algorithms play an important role in helping users navigate, summarize, and organize information effectively. K-Means is the most commonly used partitional clustering algorithm because it is easy to implement and among the most efficient in terms of execution time. Its major problem, however, is that it is sensitive to the selection of initial centroids and may converge to a local optimum. The algorithm chooses the initial cluster centres arbitrarily, so it does not guarantee good clustering results; different initial cluster centres often lead to different clusterings and thus to unstable results. To overcome this problem, this paper proposes the Systematic Selection of Initial Centroid for K-Means (SSIC K-Means) approach to improve clustering quality. Unlike traditional K-Means, the proposed SSIC K-Means method selects maximum-distance points as initial centroids instead of random points, which yields more compact and stable clustering results. Experimental results are reported as F-measures on the standard 20 Newsgroups dataset. The evaluations show that the proposed solution outperforms the other initialization methods and can be applied to other standard datasets. |
Keywords | Document clustering, Data mining, K-Means, Initial centroid, SSIC K-Means |
Identifier | 10.5281/zenodo.3268434 |
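
The description above says only that SSIC K-Means starts from maximum-distance points rather than random ones; the record does not give the exact selection procedure. As a rough illustration of that general idea, the sketch below uses farthest-first seeding, which is an assumption and not necessarily the authors' method. The function names (`max_distance_init`, `kmeans`), the choice of the first seed, and the use of Euclidean distance are all hypothetical; a document clustering system would more likely operate on TF-IDF vectors with a cosine-based measure.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_distance_init(points, k):
    """Farthest-first seeding (assumed stand-in for 'maximum distance' centroids)."""
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # Assumption: the first seed is the point farthest from the overall mean.
    centroids = [max(points, key=lambda p: euclidean(p, mean))]
    while len(centroids) < k:
        # Next seed: the point whose nearest already-chosen seed is farthest away.
        nxt = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(nxt)
    return [list(c) for c in centroids]


def kmeans(points, k, max_iter=100):
    """Standard K-Means iterations, started from the deterministic seeds above."""
    centroids = max_distance_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centroids = kmeans(docs, 2)
    print(labels, centroids)
```

Because the seeds are chosen deterministically, repeated runs on the same data give the same clustering, which is the stability property the description contrasts with random initialization.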