Font Script Identification Based on N-gram Text Categorization

Than, Kyaw Myo; Htay, Hla Hla




http://hdl.handle.net/20.500.12678/0000004966

File		psc2010paper (136).pdf (152 Kb)

Publication type
		Article
Upload type
		Publication
Title
	Title	Font Script Identification Based on N-gram Text Categorization
	Language	en
Publication date		2010-12-16
Authors
		Than, Kyaw Myo
		Htay, Hla Hla
Description
		In this paper, we propose a method for identifying the font scripts of the Myanmar language. Because there is no nationwide standardized encoding scheme for Myanmar font scripts, knowledge written in the Myanmar language is scattered across internet pages. A font script identifier is essential for merging this scattered knowledge for NLP applications such as text categorization, information retrieval, and text summarization. Our proposed method uses N-gram-based text categorization. A piece of text for each of the 11 font scripts is taken for training. TF-IDF (Term Frequency-Inverse Document Frequency) weights of character N-grams for each font script are computed and stored as a profile for that particular font script. When a new text document is given for testing, its TF-IDF weights are computed and the cosine similarity between the test profile and each trained profile is measured. The font script with the highest similarity score is taken as the result. An accuracy of 100% is obtained when testing 11 different font scripts with the TF-IDF approach. Therefore, this method works well for Myanmar font script identification.
Keywords
		Font, Font Script, Language Identification, Font Script Identification, N-gram, Text Categorization, TF-IDF Weights
Identifier		http://onlineresource.ucsy.edu.mm/handle/123456789/865
Journal articles
		Fifth Local Conference on Parallel and Soft Computing
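
The pipeline outlined in the Description (character N-gram counts, TF-IDF weighting into one profile per font script, cosine similarity against a test text) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the N-gram range (`n_max=3`), the smoothed IDF formula, the function names, and the toy Latin-letter training strings with labels `encoding_a` and `encoding_b` are all assumptions, since the record gives none of these details.

```python
import math
from collections import Counter


def char_ngrams(text, n_max=3):
    """Count all character n-grams of length 1..n_max (n_max is an assumed setting)."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def build_profiles(training_texts, n_max=3):
    """Build one TF-IDF profile per label from a dict of label -> sample text."""
    tfs = {label: char_ngrams(text, n_max) for label, text in training_texts.items()}
    n_docs = len(tfs)
    # Document frequency: in how many training samples each n-gram occurs.
    df = Counter()
    for tf in tfs.values():
        df.update(tf.keys())
    # Smoothed IDF (an assumed variant) so shared n-grams keep a positive weight.
    idf = {g: math.log((1 + n_docs) / (1 + d)) + 1.0 for g, d in df.items()}
    profiles = {
        label: {g: count * idf[g] for g, count in tf.items()}
        for label, tf in tfs.items()
    }
    return profiles, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(text, profiles, idf, n_max=3):
    """Return the label whose profile is most similar to the test text."""
    tf = char_ngrams(text, n_max)
    # N-grams never seen in training have no IDF weight and are ignored.
    vec = {g: count * idf[g] for g, count in tf.items() if g in idf}
    return max(profiles, key=lambda label: cosine(vec, profiles[label]))


# Toy stand-ins for font-script samples; real input would be encoded Myanmar text.
training = {
    "encoding_a": "abcabcabcabd",
    "encoding_b": "xyzxyzxyzxyw",
}
profiles, idf = build_profiles(training)
print(identify("cabca", profiles, idf))
print(identify("zxy", profiles, idf))
```

Because different font encodings map the same visible glyphs to different code points, their character N-gram distributions differ sharply, which is why a simple profile comparison of this kind can separate them. If a test text shares no N-grams with any profile, every similarity is zero and this sketch falls back to the first label, so a real system would need an explicit "unknown" case.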


