MERAL Myanmar Education Research and Learning Portal

  1. Yangon Technological University
  2. Department of Computer Engineering and Information Technology

Systematic Selection of Initial Centroid for K-Means Document Clustering System

http://hdl.handle.net/20.500.12678/0000003117
Systematic Selection of Initial Centroid for K-Means Document Clustering System.pdf (251 KB)
Publication type
Conference paper
Upload type
Publication
Title
Title Systematic Selection of Initial Centroid for K-Means Document Clustering System
Language en
Publication date 2016-12-29
Authors
Tin Thu Zar Win
Moe Moe Aye
Description
As the number of electronic documents generated from sources worldwide increases, it becomes hard to organize, analyze and present these documents efficiently by hand. Document clustering is one of the traditional data mining techniques and an unsupervised learning paradigm. Fast, high-quality document clustering algorithms play an important role in helping users to navigate, summarize and organize information effectively. The K-Means algorithm is the most commonly used partitional clustering algorithm because it is easy to implement and is the most efficient one in terms of execution time. However, the major problem with this algorithm is that it is sensitive to the selection of the initial centroids and may converge to local optima. Because the algorithm chooses the initial cluster centres arbitrarily, it does not always guarantee good clustering results: different initial cluster centres often lead to different clusterings and thus to unstable results. To overcome this problem, this paper proposes the Systematic Selection of Initial Centroid for K-Means (SSIC K-Means) approach to improve clustering quality. Unlike traditional K-Means clustering, the proposed SSIC K-Means method generates the most compact and stable clustering results by using maximum-distance initial centroid points instead of random initial centroid points. Experimental results are presented as F-measures on the standard 20 Newsgroups datasets. The evaluations demonstrate that the proposed solution outperforms the other initialization methods and can be applied to various other standard datasets.
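
The abstract names the idea (maximum-distance initial centroids instead of random ones) but not the exact SSIC procedure, so the following Python sketch rests on stated assumptions. It seeds K-Means with a deterministic maximin (farthest-first) rule: the first centroid is the point farthest from the data mean, and each further centroid is the point whose distance to its nearest already-chosen centroid is largest. This is one common reading of "maximum distance" seeding and may differ from the authors' method. Distances are Euclidean on small dense vectors; real document clustering would typically use TF-IDF vectors and often cosine distance. Evaluation uses the widely used clustering F-measure, F = Σ_i (n_i/n) · max_j F(i, j), where F(i, j) is the harmonic mean of precision n_ij/n_j and recall n_ij/n_i for class i and cluster j; the paper may use a variant. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def max_distance_init(points, k):
    """Deterministic maximin (farthest-first) seeding.

    Assumption: start from the point farthest from the global mean, then
    repeatedly add the point whose distance to its nearest chosen centroid is
    largest. Illustrative only; not necessarily the paper's SSIC procedure.
    """
    centre = mean(points)
    chosen = [max(range(len(points)), key=lambda i: dist(points[i], centre))]
    while len(chosen) < k:
        chosen.append(max(
            range(len(points)),
            key=lambda i: min(dist(points[i], points[c]) for c in chosen),
        ))
    return [list(points[i]) for i in chosen]


def random_init(points, k, seed):
    """Baseline: k distinct data points drawn at random as initial centroids."""
    rng = random.Random(seed)
    return [list(p) for p in rng.sample(points, k)]


def kmeans(points, centroids, max_iter=100):
    """Lloyd's K-Means from the given centroids (updated in place); returns labels."""
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return labels


def clustering_f_measure(classes, labels):
    """Overall F = sum_i (n_i / n) * max_j F(i, j) over classes i and clusters j."""
    n = len(classes)
    class_sizes = Counter(classes)
    cluster_sizes = Counter(labels)
    joint = Counter(zip(classes, labels))
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = joint[(c, k)]
            if n_ck:
                precision, recall = n_ck / n_k, n_ck / n_c
                best = max(best, 2 * precision * recall / (precision + recall))
        total += n_c / n * best
    return total


# Hypothetical toy data: three well-separated groups standing in for document vectors.
TOY_POINTS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
TOY_CLASSES = ["a"] * 3 + ["b"] * 3 + ["c"] * 3

if __name__ == "__main__":
    labels = kmeans(TOY_POINTS, max_distance_init(TOY_POINTS, 3))
    print(f"max-distance init: F = {clustering_f_measure(TOY_CLASSES, labels):.3f}")
    for seed in range(5):
        labels = kmeans(TOY_POINTS, random_init(TOY_POINTS, 3, seed))
        print(f"random init (seed {seed}): F = {clustering_f_measure(TOY_CLASSES, labels):.3f}")
```

Running the script prints the F-measure for the maximum-distance seeding and for several random seedings. Because the maximin rule is deterministic, it yields the same initial centroids on every run, whereas random seeding can place two initial centroids in the same group and converge to a poorer local optimum, which is the instability the abstract describes.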
Keywords
Document clustering, Data mining, K-Means, Initial centroid, SSIC K-Means
Identifier 10.5281/zenodo.3268434