MERAL Myanmar Education Research and Learning Portal


  1. Yangon Technological University
  2. Department of Computer Engineering and Information Technology

Combination of Multiple Acoustic Models with Multi-scale Features for Myanmar Speech Recognition

http://hdl.handle.net/20.500.12678/0000003097
Combination of Multiple Acoustic Models with Multi-scale Features for Myanmar Speech Recognition.pdf (334 Kb)
Publication type
Journal article
Upload type
Publication
Title
Combination of Multiple Acoustic Models with Multi-scale Features for Myanmar Speech Recognition
Language en
Publication date 2018-10-25
Authors
Thandar Soe
Su Su Maung
Nyein Nyein Oo
Description
We propose an approach to building a robust automatic speech recognizer using deep convolutional neural networks (CNNs). Deep CNNs have achieved great success in acoustic modelling for automatic speech recognition because of their ability to reduce spectral variations and to model spectral correlations in the input features. In most CNN-based acoustic modelling, a fixed windowed feature patch corresponding to a target label (e.g., a senone or phone) is used as the input to the CNN. Because different target labels may correspond to different time scales, we trained multiple acoustic models on different acoustic feature scales. Since auxiliary information learned from different temporal scales can help classification, the multiple CNN acoustic models were combined with the Recognizer Output Voting Error Reduction (ROVER) algorithm for the final speech recognition experiments. The experiments were conducted on a Myanmar large vocabulary continuous speech recognition (LVCSR) task. Our results showed that integrating temporal multi-scale features in model training achieved a 4.32% relative word error rate (WER) reduction over the best individual system trained on a single temporal scale.
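The abstract describes feeding each CNN a fixed window of feature frames centred on the frame being labelled, with different models using different window widths. A minimal NumPy sketch of building such patches at several temporal scales follows. The 40-dimensional features, the half-widths 5, 10 and 15, and edge padding by repeating the first and last frames are assumptions for illustration only, since the abstract does not state the paper's settings; `context_patches` is a hypothetical helper, not the authors' code.

```python
import numpy as np


def context_patches(features: np.ndarray, half_width: int) -> np.ndarray:
    """Return one (2 * half_width + 1, n_dims) patch per frame.

    features: (n_frames, n_dims) array of acoustic feature frames.
    Assumption: edge frames are padded by repeating the first/last frame.
    """
    padded = np.pad(features, ((half_width, half_width), (0, 0)), mode="edge")
    width = 2 * half_width + 1
    # Patch t spans original frames t - half_width .. t + half_width,
    # so its centre row is frame t itself.
    return np.stack([padded[t:t + width] for t in range(len(features))])


# Hypothetical input: 100 frames of 40-dimensional features.
rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 40)).astype(np.float32)

# One patch set per temporal scale; each would train a separate CNN.
scales = {k: context_patches(feats, k) for k in (5, 10, 15)}
# shapes: (100, 11, 40), (100, 21, 40), (100, 31, 40)
```

Every scale yields one patch per frame, so the per-frame target labels stay the same across models; only the amount of temporal context each CNN sees changes.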
Keywords
acoustic modelling, deep convolutional neural networks, multi-scale features, Myanmar speech recognition, ROVER combination
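The abstract names ROVER as the combination step but not its configuration. The sketch below is a simplified version of ROVER's two stages, assuming each system outputs a plain word sequence without confidence scores: hypotheses are aligned one after another into a word transition network by edit-distance dynamic programming, using the first system's output as the base, and each slot is then decided by frequency-of-occurrence voting, where an empty (NULL) arc can win and emit no word. The names `align` and `rover` and the tie-break toward the earliest system are illustrative choices, not the authors' implementation; the full ROVER method can also weight the vote with word confidence scores.

```python
from collections import Counter
from typing import List, Optional

NULL = None  # empty arc: this system put no word in the slot
Slot = List[Optional[str]]  # one entry per system aligned so far


def align(network: List[Slot], hyp: List[str], n_prev: int) -> List[Slot]:
    """Align a new hypothesis to the network by edit-distance DP.

    A word matches a slot at zero cost if any earlier system put
    that same word in the slot.
    """
    m, n = len(network), len(hyp)
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        cost[i][0] = i
    for j in range(1, n + 1):
        cost[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if hyp[j - 1] in network[i - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match/substitute
                             cost[i - 1][j] + 1,        # new system skips slot
                             cost[i][j - 1] + 1)        # new word gets new slot
    out: List[Slot] = []
    i, j = m, n
    while i > 0 or j > 0:  # backtrace, extending every slot by one entry
        if i > 0 and j > 0:
            sub = 0 if hyp[j - 1] in network[i - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                out.append(network[i - 1] + [hyp[j - 1]])
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            out.append(network[i - 1] + [NULL])
            i -= 1
        else:
            out.append([NULL] * n_prev + [hyp[j - 1]])
            j -= 1
    out.reverse()
    return out


def rover(hyps: List[List[str]]) -> List[str]:
    """Combine word sequences from several systems (at least one)."""
    network: List[Slot] = [[w] for w in hyps[0]]
    for k, hyp in enumerate(hyps[1:], start=1):
        network = align(network, hyp, n_prev=k)
    result: List[str] = []
    for slot in network:
        counts = Counter(slot)
        # Frequency-of-occurrence vote; ties go to the earliest system.
        best = max(counts, key=lambda w: (counts[w], -slot.index(w)))
        if best is not NULL:
            result.append(best)
    return result
```

For example, three systems that output `a b c`, `a x c` and `a b c d` combine to `a b c`: `x` is outvoted in the second slot, and the extra `d` loses to two NULL arcs.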
Identifier 10.5281/zenodo.3068568
Journal articles
International Journal of Computer, Vol. 28, No. 1, pp. 112-121