triadainabox.blogg.se

The succession of articulated durations

We present the adaptation of the anatomy and articulation of a 3D vocal tract model to a new speaker using magnetic resonance imaging (MRI). We combined two MRI corpora of the speaker: a corpus of volumetric images of sustained phonemes and a corpus of midsagittal image sequences of dynamic utterances. The volumetric MRI corpus was used for the adaptation of vocalic and (neutral) consonantal target shapes. For each phoneme, the vocal tract parameters were adjusted manually for a close visual match between the MRI tracings and the model-derived outlines. The resulting acoustic match of the vowels, in terms of formant differences, was then examined and optimized. The dynamic MRI corpus was used to replicate the coarticulation of the speaker: we analyzed the MRI tracings of the consonants articulated in the contexts of the vowels /a:/, /i:/, and /u:/.

For the articulatory-to-F0 mapping, the conventional method trains on F0 values that have been interpolated through unvoiced frames; in this paper, only the F0 values at voiced frames are adopted for training. Experimental results on the test set of the MNGU0 database show that: (1) the velocity and acceleration of articulatory movements are quite effective for articulatory-to-F0 prediction; (2) acoustic features estimated from articulatory features with neural networks perform slightly better than the fusion of those acoustic features with the articulatory features; (3) LSTM models achieve better articulatory-to-F0 prediction than DNNs; and (4) the only-voiced training method outperforms the conventional method.
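The "only-voiced" training idea described above can be sketched as a masked loss: instead of interpolating F0 through unvoiced frames, those frames are simply excluded from the error computation. The following is a minimal illustrative sketch, not the paper's actual code; the `voiced_mse` helper, the array layout, and the toy values are assumptions.

```python
import numpy as np

def voiced_mse(f0_true, f0_pred, vuv):
    """Mean squared F0 error computed only over voiced frames (vuv == 1).

    Unvoiced frames, where F0 is undefined, are excluded from the loss
    rather than being filled in by interpolation.
    """
    mask = vuv.astype(bool)
    diff = f0_pred[mask] - f0_true[mask]
    return float(np.mean(diff ** 2))

# Toy example: 6 frames, frames 2-3 unvoiced (F0 stored as 0 there).
f0_true = np.array([120.0, 118.0, 0.0, 0.0, 121.0, 123.0])
f0_pred = np.array([122.0, 117.0, 50.0, 60.0, 120.0, 124.0])
vuv     = np.array([1, 1, 0, 0, 1, 1])

print(voiced_mse(f0_true, f0_pred, vuv))  # → 1.75
```

Note that the large prediction errors at the two unvoiced frames do not affect the loss at all; under conventional interpolation-based training they would.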


This paper explores several types of articulatory features to determine the most suitable one for F0 prediction using deep neural networks (DNNs) and long short-term memory (LSTM) networks. Here, articulatory-to-F0 prediction consists of two parts: articulatory-to-voiced/unvoiced (V/UV) flag classification, and articulatory-to-F0 mapping for the voiced frames.
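The two-part structure described here, combined with the velocity and acceleration features mentioned above, can be sketched frame-wise as follows. This is an illustrative sketch only: the delta-feature stacking and the callable classifier/regressor interfaces are assumptions, standing in for the trained DNN/LSTM models.

```python
import numpy as np

def add_deltas(feats):
    """Stack articulatory positions with their velocity and acceleration
    (first and second temporal differences). feats: (T, D) -> (T, 3*D)."""
    vel = np.gradient(feats, axis=0)
    acc = np.gradient(vel, axis=0)
    return np.concatenate([feats, vel, acc], axis=1)

def predict_f0(feats, vuv_classifier, f0_regressor):
    """Two-stage articulatory-to-F0 prediction:
    1) classify each frame as voiced/unvoiced,
    2) regress F0 only for the voiced frames (unvoiced frames get F0 = 0).
    """
    x = add_deltas(feats)
    voiced = vuv_classifier(x)           # (T,) boolean mask
    f0 = np.zeros(len(x))
    f0[voiced] = f0_regressor(x[voiced])
    return voiced, f0

# Toy stand-ins for the trained models (purely illustrative).
vuv_clf = lambda x: x[:, 0] > 0.0            # voiced iff first channel positive
f0_reg = lambda x: 100.0 + 10.0 * x[:, 0]    # linear map to Hz

feats = np.array([[0.5, 0.1], [-0.2, 0.0], [0.8, 0.3]])
voiced, f0 = predict_f0(feats, vuv_clf, f0_reg)
print(voiced, f0)  # → [ True False  True] [105.   0. 108.]
```

Splitting the task this way keeps the regression model from ever being asked to produce F0 for frames where it is undefined, which is exactly what motivates the only-voiced training scheme.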
