Special Issue Article
Estimation Of Phrase Boundaries For Tamil Speech Synthesizer
Given arbitrary text in a language, a text-to-speech (TTS) system is expected to produce high-quality speech. A major piece of language-specific information, in addition to the list of phonemes, is the location of phrase boundaries in a given sentence, marked by commas and other punctuation. Wherever a comma is present in the text, the synthesizer introduces a silence during parsing to represent it. This improves the quality and, in some cases, helps convey the proper meaning. In written Tamil text, however, phrase boundaries are not explicitly present, so the quality of the HMM-based synthesizer is found to be poor: the individual words in a sentence sound very good, but the sentence as a whole does not sound natural. For Tamil, estimating the phrase boundaries of a given sentence is still an open research issue. A system without phrase boundaries is built as the baseline. Then, without any analysis of the given text, silence is introduced arbitrarily after every word, every two words, or every three words. Although the naturalness of the synthesized speech improves, the phrase boundaries, realized as pauses, are placed arbitrarily, so the quality of many synthesized sentences is annoyingly low. An analysis is carried out on the word-terminal syllables occurring at phrase boundaries, and the 50 most frequently occurring word-terminal syllables are selected. Based on this analysis, another system is built that places a phrase boundary after each word that ends in one of these syllables. Predicting phrase boundaries from terminal syllables yields a significant improvement; however, certain phrase boundaries are missed because the words preceding them do not end in any of the selected terminal syllables. 
A final system is therefore developed in which phrase boundaries are first predicted from the word-terminal syllables and then, if the number of words in a phrase exceeds a threshold, a new phrase boundary is introduced at the midpoint of that phrase. This system produces high-quality speech with a mean opinion score (MOS) of 4.23.
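
The two-stage boundary prediction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the romanized placeholder endings, and the threshold value are all hypothetical: the abstract gives neither the 50 terminal syllables nor the threshold. The sketch also approximates "word ends in a terminal syllable" with a plain string-suffix check and assumes a single midpoint split per long phrase.

```python
from typing import List, Set


def split_on_terminal_syllables(words: List[str],
                                terminal_syllables: Set[str]) -> List[List[str]]:
    """Stage 1: close a phrase after every word ending in a listed syllable."""
    phrases: List[List[str]] = []
    current: List[str] = []
    for word in words:
        current.append(word)
        # Assumption: syllable match approximated by a string-suffix test.
        if any(word.endswith(s) for s in terminal_syllables):
            phrases.append(current)
            current = []
    if current:  # trailing words with no terminal syllable form the last phrase
        phrases.append(current)
    return phrases


def split_long_phrases(phrases: List[List[str]],
                       max_words: int) -> List[List[str]]:
    """Stage 2: split any phrase longer than max_words once, at its midpoint."""
    result: List[List[str]] = []
    for phrase in phrases:
        if len(phrase) > max_words:
            mid = len(phrase) // 2
            result.append(phrase[:mid])
            result.append(phrase[mid:])
        else:
            result.append(phrase)
    return result


def predict_phrases(words: List[str],
                    terminal_syllables: Set[str],
                    max_words: int) -> List[List[str]]:
    """Apply stage 1, then stage 2, and return the list of phrases."""
    stage1 = split_on_terminal_syllables(words, terminal_syllables)
    return split_long_phrases(stage1, max_words)


# Placeholder tokens and endings, not real Tamil data.
example_words = ["w1ka", "w2", "w3", "w4", "w5", "w6tu", "w7"]
example_syllables = {"ka", "tu"}   # hypothetical terminal syllables
example_threshold = 4              # hypothetical threshold
example_phrases = predict_phrases(example_words, example_syllables,
                                  example_threshold)
```

In a synthesizer front end, a pause would be inserted between consecutive phrases of the returned list; how the pause is realized depends on the TTS system and is outside this sketch.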