Continuous Speech Recognition System Based on Deep Convolutional  Neural Network for Myanmar

Yin Win Chit; Soe Soe Khaing; Yi Yi Myint

MERAL Myanmar Education Research and Learning Portal


http://hdl.handle.net/20.500.12678/0000007793

Name / File	License	Actions
ICSTSD_2018 revised paper.pdf (510 KB)

Publication type
		Conference paper
Upload type
		Publication
Title
	Title	Continuous Speech Recognition System Based on Deep Convolutional Neural Network for Myanmar
	Language	en
Publication date		2018-05-14
Authors
		Yin Win Chit
		Soe Soe Khaing
		Yi Yi Myint
Description
		Automatic Speech Recognition (ASR), which translates a speech signal into text, remains a challenge for continuous speech. Continuous speech recognition systems are developed in four separate steps: segmentation of the speech signal, feature extraction, classification, and recognition of the words. These steps can be modeled with various methods. Among them, this paper proposes a combined model of dynamic-threshold-based segmentation, Mel-Frequency Cepstral Coefficient (MFCC) feature extraction, and a Deep Convolutional Neural Network (DCNN). In particular, DCNN-AlexNet has been applied in image processing because it can perform as a highly accurate, effective, and powerful classifier. In the training and classification steps of this system, the advantages of DCNN in image processing are applied using MFCC feature images. The main purpose of this system is to transform the MFCC features of the speech signal into MFCC feature images with various frame sizes, which form the three layers of the DCNN input image. The three-layer 32*32*3 images are used as the input images of DCNN-AlexNet for the recognition step of the system.
Keywords
		Automatic Speech Recognition, Mel-Frequency Cepstral Coefficient, Deep Convolutional Neural Network, Word Error Rate
Conference papers
		ICSTSD
		2018-05-14
		Proc. of 1st Intl. Conf. on Science and Technology for Sustainable Development (ICSTSD 2018)
		UCSY, Myanmar
		www.ucsy.edu.mm
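
The Description above mentions turning MFCC features computed with various frame sizes into a three-layer 32*32*3 input image. The record does not say how the paper does this, so the following is only a minimal sketch of one plausible approach. Everything in it is an assumption made for illustration: the function names (`resize_nearest`, `normalize`, `mfcc_to_image`), the nearest-neighbour resizing, the min-max scaling, the use of 13 coefficients, and the random stand-in data used in place of real MFCC matrices.

```python
import numpy as np


def resize_nearest(matrix, size=32):
    """Resample a 2-D array to size x size by nearest-neighbour index picking."""
    rows = np.linspace(0, matrix.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, matrix.shape[1] - 1, size).round().astype(int)
    return matrix[np.ix_(rows, cols)]


def normalize(matrix):
    """Min-max scale values into [0, 1]; a constant matrix maps to all zeros."""
    lo, hi = matrix.min(), matrix.max()
    if hi == lo:
        return np.zeros_like(matrix, dtype=float)
    return (matrix - lo) / (hi - lo)


def mfcc_to_image(mfcc_per_frame_size, size=32):
    """Stack three MFCC matrices (one per frame size) into a size x size x 3 image.

    Each matrix is assumed to have shape (n_coefficients, n_frames).
    """
    if len(mfcc_per_frame_size) != 3:
        raise ValueError("expected exactly three MFCC matrices, one per layer")
    channels = [
        normalize(resize_nearest(np.asarray(m, dtype=float), size))
        for m in mfcc_per_frame_size
    ]
    return np.stack(channels, axis=-1)


# Stand-in data: random values in place of real MFCCs (assumption).
# Different frame sizes give different frame counts for the same word segment.
rng = np.random.default_rng(0)
fake_mfccs = [rng.normal(size=(13, n_frames)) for n_frames in (40, 60, 80)]

image = mfcc_to_image(fake_mfccs)
print(image.shape)  # (32, 32, 3)
```

In practice the three matrices would come from an MFCC extractor run on one segmented word with three different frame sizes, and the resulting array would be passed to the image classifier.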

Versions

Ver.1

2021-01-26 07:26:59.538970
