MERAL Myanmar Education Research and Learning Portal
Font Script Identification Based on N-gram Text Categorization
http://hdl.handle.net/20.500.12678/0000004966
Name / File | License | Actions |
---|---|---|
psc2010paper (136).pdf (152 Kb) | | |
Publication type | ||||||
---|---|---|---|---|---|---|
Article | ||||||
Upload type | ||||||
Publication | ||||||
Title | ||||||
Title | Font Script Identification Based on N-gram Text Categorization | |||||
Language | en | |||||
Publication date | 2010-12-16 | |||||
Authors | ||||||
Than, Kyaw Myo | ||||||
Htay, Hla Hla | ||||||
Description | ||||||
In this paper, we propose a method for identifying font scripts of the Myanmar language. Because no nationwide standardized encoding scheme exists for Myanmar font scripts, knowledge written in the Myanmar language is scattered across internet pages. A font script identifier is essential for merging that scattered knowledge for NLP applications such as text categorization, information retrieval, and text summarization. Our proposed method uses N-gram-based text categorization. A piece of text for each of the 11 font scripts is taken for training. TF-IDF (Term Frequency-Inverse Document Frequency) weights of character N-grams for each font script are computed and stored as a profile for that particular font script. When a new text document is given for testing, its TF-IDF weights are computed and the cosine similarity between the test profile and each trained profile is measured. The font script with the highest similarity score is taken as the result. 100% accuracy is obtained when testing 11 different font scripts with the TF-IDF approach. Therefore, this method works well for Myanmar font script identification. | ||||||
Keywords | ||||||
Font, Font Script, Language Identification, Font Script Identification, N-gram, Text Categorization, TF-IDF Weights | ||||||
Identifier | http://onlineresource.ucsy.edu.mm/handle/123456789/865 | |||||
Journal articles | ||||||
Fifth Local Conference on Parallel and Soft Computing | ||||||
Conference papers | ||||||
Books/reports/chapters | ||||||
Thesis/dissertations |
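
The pipeline summarized in the Description (character N-gram TF-IDF profiles per font script, then cosine similarity against a test text) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not state the N-gram length, the exact TF-IDF formula, or the 11 font scripts used, so the bigram default, the smoothed IDF, the function names, and the ASCII toy samples with labels such as `font_a` are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Return all overlapping character n-grams of `text` (n=2 is an assumed default)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vector(counts, idf):
    """Weight relative n-gram frequencies by IDF; n-grams unseen in training are dropped."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {g: (c / total) * idf[g] for g, c in counts.items() if g in idf}


def train_profiles(samples, n=2):
    """Build one TF-IDF profile per font script from a dict of label -> training text."""
    counts = {label: Counter(char_ngrams(text, n)) for label, text in samples.items()}
    num_docs = len(counts)
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())
    # Smoothed IDF (an assumed variant) so n-grams shared by all profiles keep a nonzero weight.
    idf = {g: math.log((1 + num_docs) / (1 + d)) + 1.0 for g, d in doc_freq.items()}
    profiles = {label: tfidf_vector(c, idf) for label, c in counts.items()}
    return profiles, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(text, profiles, idf, n=2):
    """Return the label whose trained profile is most similar to the test text."""
    vec = tfidf_vector(Counter(char_ngrams(text, n)), idf)
    return max(profiles, key=lambda label: cosine(vec, profiles[label]))


# Hypothetical toy data: ASCII strings standing in for text in three font encodings.
TOY_SAMPLES = {
    "font_a": "abababababab abab",
    "font_b": "xyzxyzxyz xyzxyz",
    "font_c": "mnopmnop mnop",
}

profiles, idf = train_profiles(TOY_SAMPLES)
print(identify("ababab", profiles, idf))
```

With real data, each entry in the samples dictionary would hold text encoded in one Myanmar font script, and the test text would be compared against all trained profiles in the same way.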