Combination of Multiple Acoustic Models with Multi-scale Features for Myanmar Speech Recognition

Thandar Soe; Su Su Maung; Nyein Nyein Oo


http://hdl.handle.net/20.500.12678/0000003097

File
		Combination of Multiple Acoustic Models with Multi-scale Features for Myanmar Speech Recognition.pdf (334 Kb)

Publication type
		Journal article
Upload type
		Publication
Title
	Title	Combination of Multiple Acoustic Models with Multi-scale Features for Myanmar Speech Recognition
	Language	en
Publication date		2018-10-25
Deposited date		2019-06-27
Authors
		Thandar Soe
		Su Su Maung
		Nyein Nyein Oo
Description
		We propose an approach to building a robust automatic speech recognizer using deep convolutional neural networks (CNNs). Deep CNNs have achieved great success in acoustic modelling for automatic speech recognition owing to their ability to reduce spectral variations and model spectral correlations in the input features. In most CNN-based acoustic modelling, a fixed windowed feature patch corresponding to a target label (e.g., senone or phone) is used as input to the CNN. Because different target labels may correspond to different time scales, multiple acoustic models were trained with different acoustic feature scales. Because auxiliary information learned from different temporal scales can help classification, the multiple CNN acoustic models were combined using the Recognizer Output Voting Error Reduction (ROVER) algorithm for the final speech recognition experiments. The experiments were conducted on a Myanmar large vocabulary continuous speech recognition (LVCSR) task. Our results showed that integrating temporal multi-scale features in model training achieved a 4.32% relative word error rate (WER) reduction over the best individual system trained on a single temporal scale.
Keywords
		acoustic modelling, deep convolutional neural networks, multi-scale features, Myanmar speech recognition, ROVER combination
Identifier		10.5281/zenodo.3068568
Journal articles
		International Journal of Computer
		Vol. 28
		No. 1
		pp. 112-121
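
The description above mentions three ideas that a small sketch may make more concrete: feature patches taken at several temporal scales, voting across the outputs of several recognizers, and a relative WER reduction. The Python below is only an illustration under stated assumptions and is not the authors' implementation. The function names (`splice_frames`, `vote_aligned`, `relative_wer_reduction`) are hypothetical, the context sizes are arbitrary, and the voting step assumes the hypotheses have already been aligned slot by slot, whereas ROVER itself first builds that alignment and may also weigh confidence scores.

```python
from collections import Counter

import numpy as np


def splice_frames(features, context):
    """Stack each frame with `context` neighbours on each side.

    `features` has shape (num_frames, feat_dim); edge frames are padded by
    repeating the first and last frame. A larger `context` gives a wider
    temporal scale for the patch fed to an acoustic model.
    """
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    num_frames = features.shape[0]
    windows = [padded[i:i + num_frames] for i in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)


def vote_aligned(hypotheses):
    """Majority vote per slot over pre-aligned, equal-length word sequences.

    Simplification: the alignment is assumed to be given. An empty string
    marks a slot where a system produced no word.
    """
    voted = []
    for slot in zip(*hypotheses):
        word, _ = Counter(slot).most_common(1)[0]
        if word:
            voted.append(word)
    return voted


def relative_wer_reduction(baseline_wer, combined_wer):
    """Relative WER reduction in percent: 100 * (baseline - combined) / baseline."""
    return 100.0 * (baseline_wer - combined_wer) / baseline_wer
```

As a usage example, splicing a 4 x 3 feature matrix with `context=2` yields a 4 x 15 matrix, and calling `splice_frames` with several context values produces the differently scaled inputs that separate models could be trained on. The voting function then picks the most frequent word in each aligned slot across those models' outputs.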