Developing Word-aligned Myanmar-English Parallel Corpus based on the IBM Models

Khin Thandar Nwet; Khin Mar Soe; Ni Lar Thein

MERAL Myanmar Education Research and Learning Portal

https://meral.edu.mm/records/6751

File		3-Developing Word-aligned Myanmar-English Parallel Corpus based on the IBM Models.pdf (231 KB)

Publication type
		Journal article
Upload type
		Publication
Title
	Title	Developing Word-aligned Myanmar-English Parallel Corpus based on the IBM Models
	Language	en
Publication date		2011-08-01
Authors
		Khin Thandar Nwet
		Khin Mar Soe
		Ni Lar Thein
Description
		Word alignment in bilingual corpora has been an active research topic in the machine translation community. A corpus is a body of collected text that is useful for Natural Language Processing (NLP). Parallel text alignment is the identification of corresponding sentences in a parallel text. Large collections of parallel text are a prerequisite for many areas of linguistic research. A parallel corpus helps in building statistical bilingual dictionaries, in supporting statistical machine translation, and in providing training data for word sense disambiguation and translation disambiguation. Today the world is a global network and people increasingly learn more than one language, so multilingual corpora are increasingly in demand. The main purpose of this system is therefore to construct a word-aligned parallel corpus for use in Myanmar-English machine translation. A key step is identifying correspondences between words in one language and words in the other. The proposed approach is based on the first three IBM models and the EM algorithm. The paper also shows that the approach can be improved by using a list of cognates and morphological analysis.
Keywords
		Word-aligned Parallel Corpus, IBM Models, EM Algorithm
Identifier		10.5120/3322-4566
Journal articles
		Developing Word-aligned Myanmar-English Parallel Corpus based on the IBM Models
		Pages 12-18
		Volume 27, No. 8
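
Illustration (not part of the deposited record): the description states that the approach builds on the first three IBM models and the EM algorithm. As a rough aid to readers, the following is a minimal sketch of EM training for IBM Model 1 only, the simplest of those models. It is not the authors' implementation. The function names `train_ibm_model1` and `align`, the `<NULL>` token, and the toy German-English sentence pairs (a common textbook stand-in, not Myanmar-English data) are all assumptions made for this example.

```python
from collections import defaultdict

NULL = "<NULL>"  # hypothetical empty-word token, standard in IBM Model 1


def train_ibm_model1(pairs, iterations=10):
    """EM training of lexical translation probabilities t[(f, e)] = P(f | e).

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    """
    src_vocab = {f for src, _ in pairs for f in src}
    # Start from a uniform distribution over source words.
    t = defaultdict(lambda: 1.0 / len(src_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of links from e
        for src, tgt in pairs:
            tgt_null = [NULL] + list(tgt)
            for f in src:
                # E-step: share one unit of probability mass for f
                # among all target words in the sentence.
                norm = sum(t[(f, e)] for e in tgt_null)
                for e in tgt_null:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize expected counts into probabilities.
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return t


def align(src, tgt, t):
    """Link each source position to its most probable target position."""
    links = []
    for i, f in enumerate(src):
        j = max(range(len(tgt)), key=lambda k: t[(f, tgt[k])])
        links.append((i, j))
    return links


# Toy stand-in corpus (assumed data, not from the paper).
pairs = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm_model1(pairs, iterations=10)
print(align(["das", "Haus"], ["the", "house"], t))
```

Models 2 and 3, which the description also mentions, add position-dependent alignment probabilities and fertility respectively; they are commonly initialized from Model 1 output like the table `t` above, but they are not shown here.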