{"created":"2020-08-30T13:56:28.136341+00:00","id":3117,"links":{},"metadata":{"_buckets":{"deposit":"8e653531-25ce-4ec3-944f-28a3de73ffd3"},"_deposit":{"id":"3117","owners":[],"pid":{"revision_id":0,"type":"recid","value":"3117"},"status":"published"},"_oai":{"id":"oai:meral.edu.mm:recid/3117","sets":["1582963413512:1596119372420"]},"communities":["ytu"],"item_1583103067471":{"attribute_name":"Title","attribute_value_mlt":[{"subitem_1551255647225":"Systematic Selection of Initial Centroid for K-Means Document Clustering System","subitem_1551255648112":"en"}]},"item_1583103085720":{"attribute_name":"Description","attribute_value_mlt":[{"interim":"
As the number of electronic documents generated
\nfrom worldwide sources increases, it is hard to manually
\norganize, analyze and present these documents efficiently.
\nDocument clustering is one of the traditional data mining
\ntechniques and an unsupervised learning paradigm. Fast and
\nhigh quality document clustering algorithms play an
\nimportant role in helping users to effectively navigate,
\nsummarize and organize the information. The K-Means algorithm
\nis the most commonly used partitional clustering algorithm
\nbecause it is easy to implement and is the most efficient
\none in terms of execution time. However, the major problem
\nwith this algorithm is that it is sensitive to the selection of
\ninitial centroids and may converge to a local optimum. The
\nalgorithm chooses the initial cluster centres arbitrarily, so it does
\nnot always guarantee good clustering results. Different initial
\ncluster centres often lead to different clusterings and thus
\nprovide unstable clustering results. To overcome this problem,
\nthe Systematic Selection of Initial Centroid for K-Means (SSIC
\nK-Means) approach is proposed in this paper to improve the
\nquality of clustering. Unlike the traditional K-Means
\nclustering, the proposed SSIC K-Means method can generate
\nthe most compact and stable clustering results by using
\nmaximum-distance initial centroid points instead of random
\ninitial centroid points. In this paper, experimental results are
\nreported as F-measures on the standard 20 Newsgroups
\ndataset. The evaluations demonstrate that the proposed
\nsolution outperforms the other initialization methods and can
\nbe applied to various other standard datasets.