{"created":"2020-08-30T13:52:28.683292+00:00","id":3090,"links":{},"metadata":{"_buckets":{"deposit":"3244effb-6332-4069-929a-354f052f6228"},"_deposit":{"id":"3090","owners":[],"pid":{"revision_id":0,"type":"recid","value":"3090"},"status":"published"},"_oai":{"id":"oai:meral.edu.mm:recid/3090","sets":["1582963413512:1596119372420"]},"communities":["ytu"],"item_1583103067471":{"attribute_name":"Title","attribute_value_mlt":[{"subitem_1551255647225":"Myanmar OOV Words Extraction with Maximal Substrings and its Application to Document Clustering","subitem_1551255648112":"en"}]},"item_1583103085720":{"attribute_name":"Description","attribute_value_mlt":[{"interim":"

This paper proposes a method for extracting out-of-vocabulary (OOV) words from Myanmar text using maximal substrings. Our method aims to extract OOV words that can be added to the Myanmar dictionary. The output of our method is a set of new compound words that are not listed in the Myanmar dictionary. First, our method extracts maximal substrings from Myanmar news articles. Maximal substrings are defined as substrings for which every extension occurs fewer times than the substring itself. Second, we post-process the maximal substrings, because the raw maximal substrings contain noisy characters. In this post-processing step, we reduce the number of maximal substrings and remove those whose prefixes and suffixes are meaningless characters. We keep only the substrings that consist of words from the existing dictionary. As a result, we obtain substrings that serve as candidate new compound words for the existing Myanmar dictionary. We evaluate the method from both subjective and quantitative perspectives. From the subjective perspective, we compare the new compound words extracted by our method with those extracted by a word-bigram method. An evaluation based on a pooling procedure shows that our method outperforms the word-bigram method. From the quantitative perspective, we use the extracted compound words as additional features in K-means clustering. The experimental results show that the document clusters obtained with our method outperform those obtained with the word-bigram method in precision, recall, and F-score.

"}]},"item_1583103108160":{"attribute_name":"Keywords","attribute_value_mlt":[{"interim":"OOV Words"},{"interim":"Maximal Substrings"},{"interim":"Compound Words"},{"interim":"K-means Clustering"},{"interim":"Document Clustering"}]},"item_1583103120197":{"attribute_name":"Files","attribute_type":"file","attribute_value_mlt":[{"accessrole":"open_access","date":[{"dateType":"Available","dateValue":"2019-04-01"}],"displaytype":"preview","filename":"Myanmar OOV Words Extraction with Maximal Substrings and its Application to Document Clustering.pdf","filesize":[{"value":"8807 Kb"}],"format":"application/pdf","mimetype":"application/pdf","url":{"url":"https://meral.edu.mm/record/3090/files/Myanmar OOV Words Extraction with Maximal Substrings and its Application to Document Clustering.pdf"},"version_id":"df92c135-5bbb-4f3a-aa77-666c0a14f44a"}]},"item_1583103131163":{"attribute_name":"Journal articles","attribute_value_mlt":[{"subitem_issue":"","subitem_journal_title":"","subitem_pages":"","subitem_volume":""}]},"item_1583103147082":{"attribute_name":"Conference papers","attribute_value_mlt":[{"subitem_acronym":"PACLING 2017","subitem_c_date":"16-18 August 2017","subitem_conference_title":"15th International Conference of the Pacific Association for Computational Linguistics","subitem_part":"","subitem_place":"Yangon, Myanmar","subitem_session":"","subitem_website":""}]},"item_1583103211336":{"attribute_name":"Books/reports/chapters","attribute_value_mlt":[{"subitem_book_title":"","subitem_isbn":"","subitem_pages":"","subitem_place":"","subitem_publisher":""}]},"item_1583103233624":{"attribute_name":"Thesis/dissertations","attribute_value_mlt":[{"subitem_awarding_university":"","subitem_supervisor(s)":[{"subitem_supervisor":""}]}]},"item_1583105942107":{"attribute_name":"Authors","attribute_value_mlt":[{"subitem_authors":[{"subitem_authors_fullname":"Yuzana Win"},{"subitem_authors_fullname":"Tomonari Masada"}]}]},"item_1583108359239":{"attribute_name":"Upload type","attribute_value_mlt":[{"interim":"Poster"}]},"item_1583108428133":{"attribute_name":"Publication type","attribute_value_mlt":[{"interim":"Poster"}]},"item_1583159729339":{"attribute_name":"Publication date","attribute_value":"2017-08-17"},"item_1583159847033":{"attribute_name":"Identifier","attribute_value":"10.5281/zenodo.2619886"},"item_title":"Myanmar OOV Words Extraction with Maximal Substrings and its Application to Document Clustering","item_type_id":"21","owner":"1","path":["1596119372420"],"publish_date":"2019-04-01","publish_status":"0","recid":"3090","relation_version_is_last":true,"title":["Myanmar OOV Words Extraction with Maximal Substrings and its Application to Document Clustering"],"weko_creator_id":"1","weko_shared_id":-1},"updated":"2021-12-13T05:45:43.110797+00:00"}