

Demo: Detection of Atypical Vocalic Reaction

In order to deal with the variety of scenarios recorded in the PROMETHEUS database, we adapted the system described in [1] towards detecting every type of atypical vocalic reaction present in the PROMETHEUS database (panic, screaming, anger, etc.) as well as the available sound events. In contrast to [1], here we do not consider sound events such as explosions and gunshots, since they do not occur in the PROMETHEUS database. In brief, the sound recognition system has a hierarchical structure comprising two stages:

  1. at the first stage, the incoming sound is classified as vocalic or non-vocalic, and
  2. at the second stage, a sound classified as vocalic is further processed to judge its abnormality.
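The two-stage decision above can be sketched as follows. This is a toy illustration only: a single diagonal-covariance Gaussian per class stands in for a full trained GMM, and the means and variances are made-up values, not the models from [1].

```python
import math

def diag_gaussian_loglik(x, mean, var):
    """Log-likelihood of feature vector x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def classify(x, stage1, stage2):
    """Stage 1: vocalic vs. non-vocalic; stage 2: abnormality of vocalic events."""
    label = max(stage1, key=lambda c: diag_gaussian_loglik(x, *stage1[c]))
    if label != "vocalic":
        return "non-vocalic"
    # Only events judged vocalic reach the second stage.
    return max(stage2, key=lambda c: diag_gaussian_loglik(x, *stage2[c]))

# Toy class models: (mean, variance) per feature dimension -- illustrative only.
s1 = {"vocalic": ([0.0, 0.0], [1.0, 1.0]), "non-vocalic": ([3.0, 3.0], [1.0, 1.0])}
s2 = {"normal speech": ([0.0, 0.0], [1.0, 1.0]), "atypical": ([-2.0, 2.0], [1.0, 1.0])}

print(classify([0.1, -0.2], s1, s2))
print(classify([3.2, 2.9], s1, s2))
```

In the real system each class score would come from a GMM trained on the feature sets described below, but the maximum-likelihood decision at each stage works the same way.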

Figure 1: Block diagram of the acoustic surveillance system.


Specifically, we trained diagonal GMMs for representing the involved audio categories. The following audio feature sets were used:

  1. the first thirteen coefficients of the MFCC vector (including the first one), appended with their first derivatives, and
  2. the TEO autocorrelation envelope area, pitch, pitch derivative and harmonicity-to-noise ratio.
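The first derivatives in feature set 1 are typically obtained with a regression (delta) filter over neighbouring frames. A minimal sketch, assuming plain lists of per-frame coefficients and a regression half-window of 2 (both assumptions for illustration):

```python
# Regression-based delta (first-derivative) features over MFCC frames.
def delta(frames, N=2):
    """frames: list of per-frame coefficient lists; N: regression half-window."""
    T = len(frames)
    K = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for k in range(K):
            # Edge frames are clamped to the sequence boundaries.
            num = sum(
                n * (frames[min(t + n, T - 1)][k] - frames[max(t - n, 0)][k])
                for n in range(1, N + 1)
            )
            row.append(num / denom)
        out.append(row)
    return out

# A linearly increasing one-dimensional "cepstral track":
mfcc = [[0.0], [1.0], [2.0], [3.0]]
print(delta(mfcc))
```

The delta rows are then appended to the static coefficients to form the final per-frame feature vector.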

Figure 2: DET curves for all PROMETHEUS scenarios and the respective equal error rates. The target class is abnormal vocalic events.

The first group was used to differentiate between vocalic and non-vocalic sound events. The second one was combined with the first for classifying normal speech versus atypical human expressions, since it captures the variations that intonation exhibits when speech is produced under abnormal circumstances. Details about the training of the system, as well as results under different SNR conditions, can be found in [1]. In brief, fifty percent of the data was used for training the corresponding statistical models, while the rest was employed for testing; the train/test split was made at random. The performance of the first stage (vocalic/non-vocalic discrimination) was 100% for all scenarios. The performance of the second stage is presented as DET curves in Figure 2; the respective equal error rates are 5.38%, 0.25% and 0.58%. As the figure shows, both the miss-detection and false-alarm probabilities are low, which demonstrates the good discriminative properties of the selected feature sets. We observe that the results on the outdoor recordings are better than those on the indoor data. This might be due to the larger number of atypical events in that scenario, including additional sound events such as fracture of material, dropping of objects, etc. The described database is useful for creating probabilistic models that represent typical and hazardous situations under real-world conditions.
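The equal error rate is the operating point on a DET curve where the miss and false-alarm probabilities coincide. A minimal threshold-sweep sketch; the score lists below are hypothetical detector outputs, not the PROMETHEUS results:

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Sweep a decision threshold and return the point where the miss rate
    and false-alarm rate are (approximately) equal."""
    best_gap, eer = None, None
    for thr in sorted(target_scores + nontarget_scores):
        miss = sum(s < thr for s in target_scores) / len(target_scores)
        fa = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - fa)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (miss + fa) / 2
    return eer

# Perfectly separated scores give a 0% EER; overlap raises it.
print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))
```

A full DET curve plots the same miss/false-alarm pairs over all thresholds, usually on normal-deviate axes.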

(Java applet in action)

For more information please contact: Nikos Fakotakis or Stavros Ntalampiras
Web based demo created by: Charalampos Tsimpouris

[1] Stavros Ntalampiras, Ilyas Potamitis, and Nikos Fakotakis, "An Adaptive Framework for Acoustic Monitoring of Potential Hazards", EURASIP Journal on Audio, Speech, and Music Processing, vol. 2009, Article ID 594103, 2009, doi:10.1155/2009/594103.
[2] Stavros Ntalampiras, Todor Ganchev, Ilyas Potamitis, and Nikos Fakotakis, "Heterogeneous Sensor Database in Support for Human Behavior Analysis in Unrestricted Environments: The Audio Part", LREC 2010, Malta, May 17-23, 2010.