{"created":"2020-09-01T15:11:27.376124+00:00","id":4611,"links":{},"metadata":{"_buckets":{"deposit":"954a46bf-f4cc-43d0-a255-4aafb84215de"},"_deposit":{"id":"4611","owners":[],"pid":{"revision_id":0,"type":"recid","value":"4611"},"status":"published"},"_oai":{"id":"oai:meral.edu.mm:recid/4611","sets":["1582963302567:1597824322519"]},"communities":["ucsy"],"item_1583103067471":{"attribute_name":"Title","attribute_value_mlt":[{"subitem_1551255647225":"Enhancing Myanmar Text-to-Speech System by Using Linguistic Information on LSTM-RNN Based Speech Synthesis Model and Text Normalization","subitem_1551255648112":"en"}]},"item_1583103085720":{"attribute_name":"Description","attribute_value_mlt":[{"interim":"This thesis focuses on enhancing Myanmar Text-to-Speech (TTS) system to generate more natural synthetic speech for a given input text. Typical TTS systems have two main components, text analysis (front-end), and speech waveform generation (back-end). Both front-end and back-end parts are important for the intelligibility and naturalness of the TTS system. Therefore, this thesis is emphasized on both text analysis part and acoustic modelling part in Statistical Parametric Speech Synthesis (SPSS) system. Text analysis part consists of a number of natural language processing (NLP) steps and text normalization is the first and crucial phase among them. Myanmar text contains many non-standard words (NSWs) with numbers. Therefore, Myanmar number normalization designed for Myanmar TTS system is implemented by using Weighted Finite-State Transducers (WFSTs). For grapheme-to-phoneme (G2P) conversion in text analysis part, the first large Myanmar pronunciation dictionary is built, and the quality of that dictionary is confirmed by applying machine learning techniques such as sequence to sequence modelling. 
With the purpose of extracting contextual linguistic features which can promote the quality of the synthesized speech of Myanmar TTS system, phoneme features and a large Myanmar pronunciation dictionary with syllable information are prepared on a general speech synthesis architecture, Festival. After that, a proposed Myanmar question set is applied in extracting linguistic features which will be used in neural network based speech synthesis. Finally, word segmentation, WFST based number normalization, G2P conversion, and contextual labels extraction modules are integrated into text analysis part of Myanmar TTS system. The accuracy of acoustic model in SPSS is very important to achieve good quality synthetic speech. In this work, Hidden Markov Model based Myanmar speech synthesis is conducted with many contextual labels extracted from text analysis part and used as the baseline system. The state-of-the-art modelling techniques such as Deep Neural Network (DNN) and Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) have been applied in acoustic modelling of Myanmar speech synthesis to promote the naturalness of synthesized speech. The effectiveness of contextual linguistic features and tone information are explored in LSTM-RNN based Myanmar speech synthesis using the proposed Myanmar question set. Furthermore, the effect of applying word embedding and/or Part-of-Speech (POS) features as the additional input features in acoustic modelling of DNN and LSTM-RNN based systems are investigated in this work. The effect of word vector features can be seen clearly in DNN based system in both objective and subjective evaluations. However, in LSTM-RNN based systems, it can be observed that applying word embedding features can only give little improvement in subjective results and it cannot lead to any improvement in objective results. 
Therefore, it can be concluded that contextual linguistic features extracted from our text analysis part and the proposed question set are good enough for acoustic modelling of LSTM-RNN based Myanmar TTS system to generate the more natural synthesized speech for Myanmar language. According to the objective and subjective results, the hybrid system of DNN and LSTM-RNN (i.e., four feedforward hidden layers followed by two LSTM-RNN layers) is the most suitable network architecture for Myanmar speech synthesis."}]},"item_1583103108160":{"attribute_name":"Keywords","attribute_value":[]},"item_1583103120197":{"attribute_name":"Files","attribute_type":"file","attribute_value_mlt":[{"accessrole":"open_access","date":[{"dateType":"Available","dateValue":"2020-06-06"}],"displaytype":"preview","filename":"Enhancing Myanmar Text-To-Speech System (AyeMyaHlaing).pdf","filesize":[{"value":"5784 Kb"}],"format":"application/pdf","licensetype":"license_note","mimetype":"application/pdf","url":{"url":"https://meral.edu.mm/record/4611/files/Enhancing Myanmar Text-To-Speech System (AyeMyaHlaing).pdf"},"version_id":"462f76d8-7814-417b-b39f-e6f4a96d0c38"}]},"item_1583103131163":{"attribute_name":"Journal articles","attribute_value_mlt":[{"subitem_issue":"","subitem_journal_title":"University of Computer Studies, Yangon","subitem_pages":"","subitem_volume":""}]},"item_1583103147082":{"attribute_name":"Conference 
papers","attribute_value_mlt":[{"subitem_acronym":"","subitem_c_date":"","subitem_conference_title":"","subitem_part":"","subitem_place":"","subitem_session":"","subitem_website":""}]},"item_1583103211336":{"attribute_name":"Books/reports/chapters","attribute_value_mlt":[{"subitem_book_title":"","subitem_isbn":"","subitem_pages":"","subitem_place":"","subitem_publisher":""}]},"item_1583103233624":{"attribute_name":"Thesis/dissertations","attribute_value_mlt":[{"subitem_awarding_university":"","subitem_supervisor(s)":[{"subitem_supervisor":""}]}]},"item_1583105942107":{"attribute_name":"Authors","attribute_value_mlt":[{"subitem_authors":[{"subitem_authors_fullname":"Hlaing, Aye Mya"}]}]},"item_1583108359239":{"attribute_name":"Upload type","attribute_value_mlt":[{"interim":"Publication"}]},"item_1583108428133":{"attribute_name":"Publication type","attribute_value_mlt":[{"interim":"Book"}]},"item_1583159729339":{"attribute_name":"Publication date","attribute_value":"2020-06"},"item_1583159847033":{"attribute_name":"Identifier","attribute_value":"http://onlineresource.ucsy.edu.mm/handle/123456789/2529"},"item_title":"Enhancing Myanmar Text-to-Speech System by Using Linguistic Information on LSTM-RNN Based Speech Synthesis Model and Text Normalization","item_type_id":"21","owner":"1","path":["1597824322519"],"publish_date":"2020-06-06","publish_status":"0","recid":"4611","relation_version_is_last":true,"title":["Enhancing Myanmar Text-to-Speech System by Using Linguistic Information on LSTM-RNN Based Speech Synthesis Model and Text Normalization"],"weko_creator_id":"1","weko_shared_id":-1},"updated":"2021-12-13T03:31:27.414488+00:00"}