MERAL Myanmar Education Research and Learning Portal

  1. University of Computer Studies, Yangon
  2. Faculty of Computer Science

Building Large Scale Text Corpus for Joint Word Segmentation and Part-of-Speech Tagging of Myanmar Language

http://hdl.handle.net/20.500.12678/0000004576
File Dim Lam Cing.pdf (177 Kb)
Publication type
Article
Upload type
Publication
Title
Title Building Large Scale Text Corpus for Joint Word Segmentation and Part-of-Speech Tagging of Myanmar Language
Language en_US
Publication date 2020-02-28
Authors
Dim Lam, Cing
Soe, Khin Mar
Description
In Natural Language Processing (NLP), word segmentation and Part-of-Speech (POS) tagging are fundamental tasks. POS information is also necessary for preprocessing in NLP applications such as machine translation (MT) and information retrieval (IR). Currently, many research efforts develop word segmentation and POS tagging separately, using different methods to achieve high performance and accuracy. Word segmentation and POS tagging are among the important steps in language processing. However, while numerous models exist for other languages, few works have been carried out for the Myanmar language. This paper describes the building of a Myanmar corpus for joint word segmentation and POS tagging of the Myanmar language. The corpus contains 51207 sentences and 839161 words and is annotated using 12 tags. To evaluate the accuracy of the corpus, an HMM model is trained on different data sizes and tested with both a closed test and an open test. An accuracy of 94% in the experiments indicates that the built corpus is effective.
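As a rough illustration of the evaluation setup the description mentions, the sketch below trains a first-order HMM on a joint-tagged corpus and decodes it with Viterbi. It is not the authors' implementation. The toy corpus uses English syllable-like units as stand-ins, and the `B-`/`I-` joint tag scheme, the add-one smoothing, and all function names are assumptions made for illustration; the paper's actual 12-tag set is not listed in this record.

```python
from collections import Counter
import math

# Hypothetical toy corpus (NOT from the paper): English syllable-like units
# stand in for Myanmar syllables. Each joint tag pairs a boundary marker
# (B = starts a word, I = continues a word) with a POS label, so a single
# tag sequence encodes both the segmentation and the POS tags.
TRAIN = [
    [("stu", "B-N"), ("dent", "I-N"), ("reads", "B-V"), ("books", "B-N")],
    [("teach", "B-N"), ("er", "I-N"), ("writes", "B-V"), ("books", "B-N")],
    [("stu", "B-N"), ("dent", "I-N"), ("writes", "B-V"),
     ("let", "B-N"), ("ters", "I-N")],
]
START = "<s>"  # pseudo-tag marking the start of a sentence


def train(corpus):
    """Count tag transitions and unit emissions for a first-order HMM."""
    trans, emit, out, tag_count = Counter(), Counter(), Counter(), Counter()
    vocab = set()
    for sent in corpus:
        prev = START
        for unit, tag in sent:
            trans[(prev, tag)] += 1
            out[prev] += 1
            emit[(tag, unit)] += 1
            tag_count[tag] += 1
            vocab.add(unit)
            prev = tag
    return {"trans": trans, "emit": emit, "out": out,
            "tag_count": tag_count, "tags": sorted(tag_count), "vocab": vocab}


def log_trans(m, prev, tag):
    # Add-one smoothing over the tag set (an assumption, not the paper's choice).
    return math.log((m["trans"][(prev, tag)] + 1)
                    / (m["out"][prev] + len(m["tags"])))


def log_emit(m, tag, unit):
    # Add-one smoothing with one extra slot reserved for unseen units.
    return math.log((m["emit"][(tag, unit)] + 1)
                    / (m["tag_count"][tag] + len(m["vocab"]) + 1))


def viterbi(m, units):
    """Return the most probable joint-tag sequence for a list of units."""
    if not units:
        return []
    tags = m["tags"]
    best = {t: log_trans(m, START, t) + log_emit(m, t, units[0]) for t in tags}
    back = []
    for unit in units[1:]:
        new_best, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[p] + log_trans(m, p, t))
            new_best[t] = best[prev] + log_trans(m, prev, t) + log_emit(m, t, unit)
            pointers[t] = prev
        back.append(pointers)
        best = new_best
    path = [max(best, key=best.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def to_words(units, tags):
    """Merge units into (word, POS) pairs using the B/I boundary markers."""
    words = []
    for unit, tag in zip(units, tags):
        boundary, pos = tag.split("-", 1)
        if boundary == "B" or not words:
            words.append([unit, pos])
        else:
            words[-1][0] += unit
    return [tuple(w) for w in words]


def accuracy(m, corpus):
    """Per-unit tag accuracy of the model on a gold-tagged corpus."""
    correct = total = 0
    for sent in corpus:
        units = [u for u, _ in sent]
        pred = viterbi(m, units)
        correct += sum(p == g for p, (_, g) in zip(pred, sent))
        total += len(sent)
    return correct / total
```

In this sketch, a closed test corresponds to calling `accuracy(model, TRAIN)` on the training sentences themselves, while an open test would pass sentences that were held out of training. On real data the two figures would generally differ, which is why reporting both is informative.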
Keywords
Natural Language Processing, POS, HMM, Corpus
Identifier 978-981-14-4787-7
Journal articles
Proceedings of the 10th International Workshop on Computer Science and Engineering
Versions

Ver.1 2020-09-01 15:07:06.631759