MERAL Myanmar Education Research and Learning Portal
Item
Social Media Text Normalization
http://hdl.handle.net/20.500.12678/0000006329
Name / File | License | Actions |
---|---|---|
Social Media Text Normalization.pdf (1.5 Mb) | © 2018 ICAIT | |
Field | Value |
---|---|
Publication type | Conference paper |
Upload type | Publication |
Title | Social Media Text Normalization |
Language | en |
Publication date | 2018-11-02 |
Authors | Thet Thet Zin |
Description | In recent years, researchers have become interested in text normalization for social media because of the informal writing styles found in Twitter and other social media data. These informal texts often cause problems for natural language processing applications such as mining tasks and translation of social media data. Today, Facebook supports English translation of posts and status updates written in the Myanmar language. However, most of the translations do not convey the meaning of the Myanmar words. The complex syntactic structure of the Myanmar language, informal writing styles, slang words, and spelling mistakes are challenges in translating social media text. This paper proposes a text normalization method that can be deployed as a preprocessing step for opinion mining, machine translation, and other Natural Language Processing (NLP) applications that handle social media text. The work has three steps. First, candidate words for normalization are selected from the collected raw dataset: Out-Of-Vocabulary (OOV) words are extracted, and because not all OOV words need to be normalized, ill-formed words are detected from the OOV word list. Second, a slang word dictionary is generated. Third, text similarity methods are applied to the ill-formed words to normalize them. The method is evaluated on translation by applying normalization in the preprocessing step; Myanmar-English machine translation [14] is used for translation. The experimental results show that applying the proposed normalization improves translation, especially for social media text. |
Keywords | informal text, social media, normalization, Out-Of-Vocabulary word (OOV), translation |
Conference | ICAIT-2018: 2nd International Conference on Advanced Information Technologies |
Conference date | 1-2 November, 2018 |
Place | Yangon, Myanmar |
Session | Natural Language Processing |
Website | https://www.uit.edu.mm/icait-2018/ |
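
The three steps named in the description (OOV extraction, slang dictionary lookup, similarity-based normalization) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record does not say which lexicon, slang dictionary, or similarity measure the paper uses, so the toy English word lists, the `threshold` value, and the use of Python's standard `difflib.SequenceMatcher` as the similarity measure are stand-ins, not the author's method. Real Myanmar text would also need word segmentation, which is not shown.

```python
from difflib import SequenceMatcher

# Hypothetical toy resources (assumptions): the paper's actual lexicon,
# slang dictionary, and dataset are not given in this record.
VOCABULARY = {"see", "you", "tomorrow", "great", "tonight", "please"}
SLANG = {"u": "you", "pls": "please"}


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for the paper's
    unspecified text similarity methods."""
    return SequenceMatcher(None, a, b).ratio()


def normalize_token(token: str, threshold: float = 0.75) -> str:
    # Step 1: in-vocabulary words are not candidates for normalization.
    if token in VOCABULARY:
        return token
    # Step 2: known slang is replaced via the slang dictionary.
    if token in SLANG:
        return SLANG[token]
    # Step 3: treat an OOV word as ill-formed only if it is close to some
    # vocabulary word; otherwise leave it unchanged (for example, a name).
    # sorted() makes tie-breaking deterministic.
    best = max(sorted(VOCABULARY), key=lambda word: similarity(token, word))
    return best if similarity(token, best) >= threshold else token


def normalize(text: str) -> str:
    """Normalize whitespace-separated tokens of a lower-cased sentence."""
    return " ".join(normalize_token(t) for t in text.lower().split())


if __name__ == "__main__":
    print(normalize("see u tomorow pls"))
```

The threshold doubles as a rough ill-formed-word detector in this sketch: an OOV token with no sufficiently similar vocabulary entry is passed through untouched, which mirrors the description's point that not all OOV words need to be normalized.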