Demo: Greek Text to Speech

Fig. 1. Text-to-Speech synthesis system diagram

The text-to-speech (TTS) system in this demo is built on the Festival Speech Synthesis framework (Taylor et al., 1998). It uses corpus-based unit selection synthesis, a technique that relies on large speech databases. At run time, a cost function selects segments from the database, which vary in length, F0, and other parameters such as sentence type or position in the sentence, and the selected segments are concatenated to synthesize speech. The unit selection voice for the Greek language was created using the Vergina speech database (Lazaridis et al., 2010). The Vergina database consists of approximately 3,000 phonetically balanced sentences uttered by a female native Greek speaker, corresponding to approximately four hours of speech recorded in an audio studio. To select these sentences and achieve reasonable phonetic coverage, a text corpus of approximately 5 million words, collected from newspaper articles, periodicals, and paragraphs of literature, was processed.
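The run-time selection step described above can be illustrated with a minimal sketch. It is not Festival's implementation: the `Unit` fields, the cost weights, and the function names are assumptions chosen for illustration, and only F0 and duration are scored, whereas a real voice would also consider features such as sentence type and position in the sentence. Each candidate segment receives a target cost (distance from the desired prosody) and a join cost (F0 mismatch with its neighbour), and a dynamic-programming search returns the sequence with the lowest total cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A hypothetical database segment; real units carry many more features."""
    phone: str       # phonetic label of the segment
    f0: float        # mean fundamental frequency in Hz
    duration: float  # length in seconds


def target_cost(unit, want_f0, want_dur):
    # Distance between a candidate and the desired prosody (illustrative weights).
    return abs(unit.f0 - want_f0) / 100.0 + abs(unit.duration - want_dur) * 10.0


def join_cost(left, right):
    # Penalise F0 jumps at the concatenation point (illustrative weight).
    return abs(left.f0 - right.f0) / 100.0


def select_units(targets, database):
    """Return the cheapest unit sequence for targets given as (phone, f0, duration)."""
    if not targets:
        return []

    # One layer of candidate units per target position.
    layers = []
    for phone, _, _ in targets:
        candidates = [u for u in database if u.phone == phone]
        if not candidates:
            raise ValueError(f"no unit in the database for phone {phone!r}")
        layers.append(candidates)

    # costs[i][j]: cheapest total cost of any path ending in layers[i][j].
    # back[i][j]: index of the predecessor in layers[i - 1] on that path.
    _, f0, dur = targets[0]
    costs = [[target_cost(u, f0, dur) for u in layers[0]]]
    back = [[None] * len(layers[0])]

    for i in range(1, len(layers)):
        _, f0, dur = targets[i]
        prev_units, prev_costs = layers[i - 1], costs[i - 1]
        row_costs, row_back = [], []
        for unit in layers[i]:
            # Cost of reaching this unit from each possible predecessor.
            reach = [prev_costs[p] + join_cost(prev_units[p], unit)
                     for p in range(len(prev_units))]
            best = min(range(len(reach)), key=reach.__getitem__)
            row_costs.append(reach[best] + target_cost(unit, f0, dur))
            row_back.append(best)
        costs.append(row_costs)
        back.append(row_back)

    # Trace the cheapest path back from the last layer to the first.
    j = min(range(len(costs[-1])), key=costs[-1].__getitem__)
    path = []
    for i in range(len(layers) - 1, -1, -1):
        path.append(layers[i][j])
        j = back[i][j]
    path.reverse()
    return path
```

Calling `select_units([("k", 200.0, 0.08), ("a", 200.0, 0.12)], database)` with a list of `Unit` objects returns one unit per target. Because the join cost is part of the total, the search can prefer two units with similar F0 over units that each match their target slightly better but would produce an audible pitch jump when concatenated.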

For more information please contact: Nikos Fakotakis or Alexandros Lazaridis
Web based demo created by: Charalampos Tsimpouris

[1] Taylor, P., Black, A., Caley, R., 1998. The architecture of the Festival speech synthesis system. In: Proceedings of the Third ESCA Workshop in Speech Synthesis, pp. 147–151.
[2] Lazaridis, A., Kostoulas, T., Ganchev, T., Mporas, I., Fakotakis, N., 2010. VERGINA: A modern Greek speech database for speech synthesis. In: LREC 2010, Malta, May 19-21, 2010.