
MERAL Myanmar Education Research and Learning Portal


  1. Yangon Technological University
  2. Department of Computer Engineering and Information Technology

Myanmar OOV Words Extraction with Maximal Substrings and its Application to Document Clustering

http://hdl.handle.net/20.500.12678/0000003090
Myanmar OOV Words Extraction with Maximal Substrings and its Application to Document Clustering.pdf (8807 Kb)
Publication type
Poster
Upload type
Poster
Title
Title Myanmar OOV Words Extraction with Maximal Substrings and its Application to Document Clustering
Language en
Publication date 2017-08-17
Authors
Yuzana Win
Tomonari Masada
Description
This paper proposes a method for extracting out-of-vocabulary (OOV) words from Myanmar text using maximal substrings. The method aims to extract OOV words that can be added to the Myanmar dictionary; its output is a set of new compound words that the dictionary does not yet contain. First, we extract maximal substrings from Myanmar news articles. A maximal substring is a substring whose number of occurrences is reduced by any of its extensions. Second, we post-process the maximal substrings, because they contain noisy characters. In post-processing, we reduce the number of maximal substrings and remove those whose prefixes and suffixes are meaningless characters. We keep only the substrings that consist of words from the existing dictionary. The remaining substrings are candidates for new compound words that can be added to the existing Myanmar dictionary. We evaluate the method from both subjective and quantitative perspectives. For the subjective evaluation, we compare the new compound words extracted by our method with those extracted by a word bigram method; an evaluation using a pooling procedure finds that our method performs better. For the quantitative evaluation, we use the extracted compound words as additional features in K-means clustering. The experimental results show that the document clusters obtained with our method are better than those obtained with the word bigram method in precision, recall, and F-score.
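The two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it counts substrings by brute force (practical extraction would more likely use a suffix array or similar index), it works on plain characters rather than Myanmar syllables, and the function names, the `min_count` and `max_len` parameters, and the rule "segmentable into at least two dictionary words" are assumptions made here for illustration.

```python
from collections import Counter


def substring_counts(text, max_len):
    """Count every substring of length 1..max_len (overlapping occurrences)."""
    counts = Counter()
    n = len(text)
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            counts[text[i:j]] += 1
    return counts


def maximal_substrings(text, min_count=2, max_len=10):
    """Return {substring: count} for substrings whose count drops under
    every one-character extension to the left or to the right.

    Checking one-character extensions is enough: a longer extension can
    never occur more often than the shorter one it contains.
    """
    # Count one character beyond max_len so extensions can be looked up.
    counts = substring_counts(text, max_len + 1)
    alphabet = set(text)
    result = {}
    for s, c in counts.items():
        if c < min_count or len(s) > max_len:
            continue
        if all(counts.get(ch + s, 0) < c and counts.get(s + ch, 0) < c
               for ch in alphabet):
            result[s] = c
    return result


def split_into_words(s, dictionary):
    """Return one segmentation of s into dictionary words, or None."""
    best = [None] * (len(s) + 1)
    best[0] = []
    for end in range(1, len(s) + 1):
        for start in range(end):
            if best[start] is not None and s[start:end] in dictionary:
                best[end] = best[start] + [s[start:end]]
                break
    return best[len(s)]


def compound_candidates(maximal, dictionary):
    """Keep maximal substrings that are not dictionary entries themselves
    but consist entirely of two or more dictionary words (assumed rule)."""
    out = {}
    for s in maximal:
        if s in dictionary:
            continue
        words = split_into_words(s, dictionary)
        if words is not None and len(words) >= 2:
            out[s] = words
    return out
```

On the toy string `"raincoat.raincoat!raincoat"` with the dictionary `{"rain", "coat"}`, `maximal_substrings` keeps `"a"` and `"raincoat"`, and `compound_candidates` then discards `"a"` and returns `"raincoat"` segmented as `["rain", "coat"]`. The resulting candidates could then be added as extra features to a document-term matrix before clustering.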
Keywords
OOV Words, Maximal Substrings, Compound Words, K-means Clustering, Document Clustering
Identifier 10.5281/zenodo.2619886
Conference papers
PACLING 2017
16-18 August 2017
15th International Conference of the Pacific Association for Computational Linguistics
Yangon, Myanmar

Versions

Ver.1 2020-08-30 13:52:29.749201