File name: DNN-HMM Based Multilingual Recognizer of Telephone Speech
File size: 2.01MB
File format: PDF
Last updated: 2022-02-11 07:58:46
asr dnn gmm speech recognition
This thesis deals with multilingual acoustic modeling based on a shared global phone inventory for five East European languages: Czech, Russian, Hungarian, Slovak and Polish, all of which are available within SpeechDat-E, a set of telephone speech databases. Because the SAMPA alphabet is used with unnormalized conventions to represent the phonetic content of the particular languages, and in several cases different symbols represent the same phone, a mapping to the general X-SAMPA phonetic alphabet was proposed as the first step. The impact of multilingual acoustic modeling was analyzed on a continuous speech recognition task. The analysis of acoustic modeling in the LVCSR task was performed for the GMM-HMM system and for the DNN-HMM approach. The LVCSR experiments were performed with language-specific acoustic models as well as with the multilingual system. The particular recognizers were implemented with the Kaldi toolkit. One goal of this thesis is to provide a tutorial-style description of Kaldi usage and to create a recipe for the SpeechDat databases. Depending on the language, the best WER obtained by the HMM recognizers ranged from 18% to 28%. DNN-HMM improved the results by about 4% WER on average. The multilingual HMM system reached WER values from 25% to 37%. The DNN approach had a significant impact on speech recognition accuracy for the multilingual system as well, reducing the WER by about 9% on average.
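All results above are reported as word error rate (WER). As a reminder of what that figure measures, here is a minimal sketch, assuming the standard definition: the word-level edit distance (substitutions, deletions and insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical illustrations; this is not the scoring procedure used in the thesis.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER of `hypothesis` against `reference`.

    WER = (substitutions + deletions + insertions) / number of reference words,
    computed as the word-level Levenshtein distance.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion
    # against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions are counted against the reference length, WER can exceed 100% when the hypothesis is much longer than the reference.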