MERAL Myanmar Education Research and Learning Portal

  1. University of Information Technology
  2. International Conference on Advanced Information Technologies

Social Media Text Normalization

http://hdl.handle.net/20.500.12678/0000006329
File
Social Media Text Normalization.pdf (1.5 Mb)
License
© 2018 ICAIT
Publication type
Conference paper
Upload type
Publication
Title
Social Media Text Normalization
Language
en
Publication date
2018-11-02
Authors
Thet Thet Zin
Description
In recent years, researchers have become interested in text normalization for social media because of the informal writing styles found in Twitter and other social media data. These informal texts often cause problems for Natural Language Processing (NLP) applications, such as various mining tasks or translation of social media data. Today, Facebook supports English translation of posts and statuses written in the Myanmar language. However, most of these translations do not reflect the meaning of the Myanmar words. The complex syntactic structure of the Myanmar language, informal writing style, slang words, and spelling mistakes are challenges in social media text translation. This paper proposes a text normalization method that can be deployed as a preprocessing step for opinion mining, machine translation, and various other NLP applications that handle social media text. The work has three steps. First, candidate words for normalization are selected from the collected raw dataset: Out-Of-Vocabulary (OOV) words are extracted and, because not all OOV words need to be normalized, ill-formed words are detected from the OOV word list. Second, a slang word dictionary is generated. Third, text similarity methods are applied to the ill-formed words to normalize them. The method is evaluated on translation by applying normalization in the preprocessing step, using Myanmar-English machine translation [14]. The experimental results show that the proposed normalization improves translation, especially for social media text.
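The three steps described above can be illustrated with a minimal sketch. It is not the paper's implementation: the vocabulary, the slang dictionary, the English toy tokens, the 0.75 threshold, and the function names are all hypothetical, and the character-level ratio from Python's standard `difflib` stands in for whichever text similarity methods the author used. Leaving an OOV token unchanged when no vocabulary word is close enough is likewise only a rough stand-in for the ill-formed word detection step.

```python
from difflib import SequenceMatcher

# Toy resources, hypothetical and for illustration only.
VOCAB = ["great", "see", "thanks", "tomorrow", "you"]   # in-vocabulary words
SLANG = {"thx": "thanks", "u": "you"}                   # slang word dictionary


def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1] (difflib, an assumed choice)."""
    return SequenceMatcher(None, a, b).ratio()


def normalize_token(token: str, threshold: float = 0.75) -> str:
    """Normalize one token: vocabulary check, slang lookup, then similarity."""
    word = token.lower()
    if word in VOCAB:
        # Step 1: in-vocabulary words are not candidates for normalization.
        return word
    if word in SLANG:
        # Step 2: known slang is replaced from the dictionary.
        return SLANG[word]
    # Step 3: map the OOV word to its most similar vocabulary word,
    # but only if the match is close enough; otherwise keep it as-is.
    best = max(VOCAB, key=lambda v: similarity(word, v))
    if similarity(word, best) >= threshold:
        return best
    return word


def normalize(text: str) -> str:
    """Normalize a whitespace-tokenized text."""
    return " ".join(normalize_token(t) for t in text.split())
```

Calling `normalize("thx see u tomorow")` on this toy data replaces the two slang tokens from the dictionary and maps the misspelled token to its closest vocabulary word. A real pipeline for Myanmar text would also need word segmentation, which whitespace splitting does not provide.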
Keywords
informal text, social media, normalization, Out-Of-Vocabulary word (OOV), translation
Conference papers
ICAIT-2018
1-2 November, 2018
2nd International Conference on Advanced Information Technologies
Yangon, Myanmar
Natural Language Processing
https://www.uit.edu.mm/icait-2018/
Versions

Ver.1 2020-11-20 04:26:30.656272