Audio and Speech Processing
The AI group has a long tradition in Automatic Speech Recognition, Text-To-Speech (TTS) synthesis, and speaker verification and identification, as well as in language modeling as a means of improving speech recognition performance.
Members of the AI group have developed numerous speech processing components and applications, including the following:
- Adaptive Framework for Real-Time Acoustic Surveillance of Potential Hazards, based on probabilistic structures (developed within the Prometheus project)
- Speech/Music Discriminator, HMM-based, using frequency-domain and wavelet-based features
- Automatic Recognizer of Urban Environmental Sound Events, based on hierarchical structures
- A Speech Annotation Toolbox (developed within the SpeechDat(II) and SpeechDat(Car) projects)
- Recording Tools for Speech Database creation
- A Modern Greek TTS system, based on the MBROLA concatenative algorithm
- A Modern Greek TTS system, based on the Klatt formant synthesizer
- A Modern Greek TTS system, based on unit selection, corpus-based
- Speaker Verification and Speaker Identification systems, based on neural networks
- Automatic Speech Recognition for the Greek, British English and German languages
- Spoken Language Recognition System, PPRLM-based
- Greek-Cypriot Dialect recognizer, PRLM-based
- Automatic Speech Segmentation Tools, HMM-based
- Speech-based Emotion Detection System, GMM-based
- Real-life Speech-based Affect Recognition System, based on acoustic and linguistic information
- An Environment for Building Interactive Natural Interfaces (in the framework of the GEMINI project)
- A Dialogue System for the Automation of Call Centre Services, automating the collection of data for car insurance companies (in the framework of the European Project ACCeSS)
- A Dialogue System for Telephone-based Services (in the framework of the European Project IDAS)
- A Spoken Dialogue Interaction System for a smart-home environment (in the framework of the INSPIRE project)
- The Phone-Call Router for the Department of Electrical and Computer Engineering at the University of Patras
- The Voice Portal for the University of Patras
Natural Language Processing
The AI group has developed natural language tools for Modern Greek covering a wide variety of applications.
In particular, the following tools/components are available:
- A grapheme-to-phoneme (and vice versa) converter for Modern Greek, based on the two-level morphology model.
- A morphological processor for Modern Greek based on the PC-KIMMO formalism, performing morphological analysis and synthesis over a lexicon of 30,000 lemmas.
- A unification-based syntactic analyzer for Modern Greek based on the PC-PATR formalism.
- A sentence and chunk boundary detector for unrestricted Modern Greek text.
- A stylistic analyzer for unrestricted Modern Greek text that categorizes texts in terms of genre and author.
- A business letter generator for Modern Greek that takes into account stylistic aspects (in the framework of the national project DIALOGOS).
- A semantic parser for the identification of temporal expressions in Modern Greek texts.
- Algorithms for the incremental construction of lexicons as Directed Acyclic Word Graphs (DAWGs) and algorithms for fast access to these lexicons.
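To illustrate the kind of algorithm the last item refers to, here is a minimal sketch of incremental DAWG construction using the widely used sorted-input scheme: words are inserted in alphabetical order, and every suffix that can no longer change is merged with an equivalent, already registered state. This is not the group's implementation. The names `Node`, `Dawg`, `insert`, `finish`, `node_count` and `build_dawg`, as well as the toy word list, are assumptions made for this example.

```python
class Node:
    """One DAWG state: outgoing edges plus an end-of-word flag."""

    def __init__(self):
        self.edges = {}      # character -> Node
        self.final = False

    def signature(self):
        # Two states are equivalent when they agree on finality and on every
        # (label, target) pair. Targets are already canonical at this point,
        # so comparing them by identity is sufficient.
        return (self.final,
                tuple(sorted((ch, id(n)) for ch, n in self.edges.items())))


class Dawg:
    def __init__(self):
        self.root = Node()
        self.register = {}    # signature -> canonical Node
        self.unchecked = []   # (parent, char, child) along the last word
        self.previous = None
        self.closed = False

    def insert(self, word):
        if self.closed:
            raise ValueError("cannot insert after finish()")
        if self.previous is not None and word <= self.previous:
            raise ValueError("words must be inserted in strictly sorted order")
        # Length of the prefix shared with the previously inserted word.
        common = 0
        for a, b in zip(word, self.previous or ""):
            if a != b:
                break
            common += 1
        # Everything below the shared prefix is final and can be merged.
        self._minimize(common)
        node = self.unchecked[-1][2] if self.unchecked else self.root
        for ch in word[common:]:
            child = Node()
            node.edges[ch] = child
            self.unchecked.append((node, ch, child))
            node = child
        node.final = True
        self.previous = word

    def finish(self):
        self._minimize(0)
        self.closed = True

    def _minimize(self, down_to):
        while len(self.unchecked) > down_to:
            parent, ch, child = self.unchecked.pop()
            # Reuse an equivalent registered state, or register this one.
            parent.edges[ch] = self.register.setdefault(child.signature(), child)

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.edges.get(ch)
            if node is None:
                return False
        return node.final

    def node_count(self):
        seen, stack = set(), [self.root]
        while stack:
            node = stack.pop()
            if id(node) not in seen:
                seen.add(id(node))
                stack.extend(node.edges.values())
        return len(seen)


def build_dawg(sorted_words):
    dawg = Dawg()
    for word in sorted_words:
        dawg.insert(word)
    dawg.finish()
    return dawg


if __name__ == "__main__":
    dawg = build_dawg(sorted(["tap", "taps", "top", "tops"]))
    print("tops" in dawg, "to" in dawg, dawg.node_count())
```

Lookup walks one edge per character, so access time depends on the word length rather than on the lexicon size. In the toy example the suffixes "ap(s)" and "op(s)" share their final states, which is where the space saving over a plain trie comes from.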
Speech and Language Resources
The AI group created (either on its own or in cooperation with other partners) a number of speech and language resources, among which are the following:
- SpeechDat(II)-FDB-5000-Greek – a speech recognition database with 5000 speakers (within the SpeechDat(II) project)
- SpeechDat(Car)-Greek – a speech recognition database (within the SpeechDat(Car) project)
- PolyCost Speaker Recognition database (within the COST 250 project)
- Orientel Cypriot Greek Speech database (within the Orientel project)
- MoveOn Motorcycle speech and noise database for police information support systems (within the MoveOn project)
- Prosodic database for text-to-speech synthesis for the Greek language
- Acted emotional speech database for the Greek language
- Greek speech database for corpus-based text-to-speech synthesis
- Real-world Affective Speech corpus (smart-home domain)
- PlayMancer Multimodal Affective corpus – video, speech, bio-signals (serious game domain; within the PlayMancer project)
- Prometheus database – A Multimodal Database of Heterogeneous Sensors for Human Behavior Analysis and Interpretation – microphone arrays, video cameras, infrared cameras, 3D cameras, IR movement detection sensors (within the Prometheus project)
- Various text corpora (with overall size over 50 Mwords)
- ESPRIT 860: Greek newspaper corpus with grammatical analysis of words
- ORTHO: Greek monolingual lexicon, compiled from several printed dictionaries
- COLLINS: Corpus and dictionary
- ONOMASTICA: Lexicon of Greek proper names
- IDAS: Surnames in phonetic transcription
- POLYGLOT: Speech samples, annotated
- LIP READING: 157 AVI files with lip movements during word pronunciation
- Korais lexicon, with over 80,000 lemmas
- Morphological analysers
- Syntactic parsers
- Lemmatizers (also language independent ones)
- Grapheme-to-phoneme and phoneme-to-grapheme converters
- A generic platform for semi-automatic generation of multilingual and multimodal interfaces
Past Research Activities
Optical Character Recognition
The AI group has developed tools for the preprocessing of document images and words as well as systems for character recognition. In more detail, the following tools are available:
- A skew estimation system for printed and handwritten documents.
- A shift correction system for printed and handwritten words.
- A handwritten character recognition system for Modern Greek.
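The source does not say which method the skew estimation system uses. As a generic illustration of the task only, the sketch below applies the classic projection-profile idea: try a range of candidate angles, de-rotate the ink pixels by each one, and keep the angle whose row histogram is most sharply peaked. The function names `estimate_skew` and `synthetic_page`, the angle range, the step size and the synthetic test page are all assumptions made for this example.

```python
import math
from collections import Counter


def estimate_skew(points, max_angle=15.0, step=0.5):
    """Return the candidate angle (degrees) giving the sharpest row profile.

    `points` is an iterable of (x, y) ink-pixel coordinates.
    """
    best_angle, best_score = 0.0, -1
    steps = int(round(max_angle / step))
    for i in range(-steps, steps + 1):
        angle = i * step
        t = math.radians(angle)
        sin_t, cos_t = math.sin(t), math.cos(t)
        # Row index of every ink pixel after rotating the page by -angle.
        rows = Counter(round(-x * sin_t + y * cos_t) for x, y in points)
        # Sum of squared row counts is largest when text lines align with rows.
        score = sum(c * c for c in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def synthetic_page(skew_degrees, lines=10, line_gap=20, width=200):
    """Hypothetical test input: horizontal dotted lines rotated by a known angle."""
    a = math.radians(skew_degrees)
    sin_a, cos_a = math.sin(a), math.cos(a)
    pts = []
    for k in range(lines):
        y = k * line_gap
        for x in range(0, width, 2):
            pts.append((x * cos_a - y * sin_a, x * sin_a + y * cos_a))
    return pts


if __name__ == "__main__":
    print(estimate_skew(synthetic_page(5.0)))
```

The resolution of the estimate is limited by `step`, and the exhaustive search is linear in the number of candidate angles. Handwritten text, where baselines are less regular, would likely need a more robust scoring function than this sketch uses.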
Authorship Recognition from Text Documents
The AI group has developed tools for authorship identification from text documents.