Received date: January 08, 2016 Accepted date: January 18, 2016 Published date: January 25, 2016
Citation: Sabir B, Touri B, Moussetad M (2016) Algorithm of Acoustic Analysis of Communication Disorders within Moroccan Students. Commun Disord Deaf Stud Hearing Aids 4:149. doi: 10.4172/2375-4427.1000149
Copyright: © 2016 Sabir B, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Objective: Communication disorders negatively affect the academic path of students in higher education. Acoustic analysis is a leading objective tool for describing these disorders. However, the large number of acoustic parameters makes it difficult to differentiate pathological voices from healthy ones. The purpose of the present paper is to identify the relevant acoustic parameters that objectively differentiate pathological voices from healthy ones. Methods: Pathological and normal voice samples of /a/, /i/ and /u/ utterances from 400 students were recorded and analyzed acoustically with PRAAT software, and a set of acoustic parameters was extracted. A statistical analysis was performed to reduce the extracted parameters to the most relevant ones, in order to build a model that serves as the basis for objective diagnosis. Results: Mean amplitude, local absolute jitter, the bandwidth of the second formant and the noise-to-harmonics ratio are relevant acoustic parameters that distinguish pathological voices from healthy ones for the vowels /a/, /i/ and /u/. Thresholds of the acoustic parameters of pathological /a/, /i/ and /u/ were calculated. A training model was built and simulated in MATLAB, and the Hidden Markov Model and K-Nearest Neighbors classification methods were compared: the Hidden Markov Model reached a recognition rate of 95%, and K-Nearest Neighbors on the reduced acoustic parameters reached 97%. Conclusion: Using the identified parameters, pathological voices can be objectively detected among healthy ones for diagnostic purposes. As future work, the present approach is a first step toward identifying the acoustic parameters that characterize each voice disorder.
Communication disorders; Acoustic analysis; PRAAT; Classification methods
9.3% of hard science students in Morocco have a serious problem related to voice disorders. Understanding the acoustic features of speech makes it possible to discriminate objectively between normal and pathological voices in these disorders. The analysis of speech disorders remains essentially clinical, and instrumental measures are not widespread in clinical practice. The most widely used are acoustic and aerodynamic measures. The techniques doctors commonly use to diagnose vocal pathologies and analyze their symptoms, such as endoscopy, are invasive. However, pathologies can also be identified from certain parameters of the speech signal. Various conventional techniques (cepstrum, LPC, spectrogram) have been used to identify pathological voices by extracting voice parameters such as pitch, formants, jitter and shimmer. This paper proposes a model to differentiate normal voices from pathological voices.
A dataset is constructed by recording utterances of the vowels /a/, /i/ and /u/. The speech signal is then analyzed to extract acoustic parameters such as amplitude, intensity, formant frequencies, formant bandwidths, jitter and harmonics-to-noise ratio.
We aim to examine the ability of objective acoustic analysis methods to detect pathological voices in students with speech disorders. To reach this objective, a model is built to recognize pathological voices among healthy ones.
Written by Paul Boersma and David Weenink at the University of Amsterdam, Praat is a computer program with which you can analyze, synthesize, and manipulate speech. It is available for many different computer operating systems and can be downloaded for free from http://www.praat.org/ [5,6].
Amplitude is the size of the vibration, and it determines how loud the sound is. For our experiment, the vowels /a/, /i/ and /u/ were recorded at three loudness levels: high, neutral and low. The magnitude of the pressure variation is perceived as a change in loudness.
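The analyzed parameters include mean amplitude, energy and mean power, and Table 1 reports total energy and energy in air. The sketch below shows how these measures can be computed from a sampled pressure signal. It is a minimal Python illustration, not PRAAT's implementation. It assumes the samples are in pascal and approximates the energy integral by a plain sum multiplied by the sampling interval. It also uses ρc = 400 kg·m⁻²·s⁻¹ for the acoustic impedance of air, which is the value the Praat manual assumes for "energy in air". The Table 1 values are consistent with that conversion: for example, 0.053 Pa²·s / 400 ≈ 0.00013 J/m².

```python
import math

# Acoustic impedance of air (kg m^-2 s^-1) assumed for "energy in air" (see lead-in).
RHO_C = 400.0


def amplitude_measures(samples, fs):
    """Return (mean amplitude, energy, power, energy in air) of a sampled sound.

    samples: sound pressure values in pascal; fs: sampling frequency in Hz.
    """
    n = len(samples)
    dt = 1.0 / fs
    duration = n * dt
    mean_amplitude = sum(samples) / n           # Pa: average sample value (DC offset)
    energy = sum(x * x for x in samples) * dt   # Pa^2 s: sum approximating the integral of x(t)^2
    power = energy / duration                   # Pa^2: energy per second
    energy_in_air = energy / RHO_C              # J/m^2
    return mean_amplitude, energy, power, energy_in_air


# Usage: a 1 s, 0.1 Pa, 200 Hz sine sampled at 50 kHz.
fs = 50_000
sine = [0.1 * math.sin(2 * math.pi * 200 * k / fs) for k in range(fs)]
print(amplitude_measures(sine, fs))
```

For this sine, the mean amplitude is close to 0 and the power is A²/2 = 0.005 Pa². A negative mean amplitude, as reported for the pathological utterances, would show up here as a negative average sample value.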
Jitter (local, absolute)
The absolute local jitter (in seconds) is the mean absolute (nonnegative) difference between consecutive intervals:

$$\mathrm{jitter}_{\mathrm{local,abs}} = \frac{\sum_{i=2}^{N} \left| T_i - T_{i-1} \right|}{N - 1}$$

where T_i is the duration of the i-th interval and N is the number of intervals. An interval is the time between two consecutive points.
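Local absolute jitter can be computed directly from its definition: the mean of |T_i - T_(i-1)| over i = 2, ..., N. The sketch below does this and also derives the intervals from consecutive point (pulse) times. It is a hypothetical Python illustration. PRAAT additionally discards periods outside a shortest/longest period range and consecutive periods whose ratio exceeds a maximum period factor; this sketch omits that filtering.

```python
def intervals_from_points(points):
    """Intervals (s) between consecutive pulse times (s)."""
    return [b - a for a, b in zip(points, points[1:])]


def jitter_local_absolute(intervals):
    """Mean absolute difference (s) between consecutive intervals T_1..T_N.

    Simplified sketch: no period-range or maximum-period-factor filtering.
    """
    if len(intervals) < 2:
        raise ValueError("at least two intervals are needed")
    diffs = [abs(t - t_prev) for t_prev, t in zip(intervals, intervals[1:])]
    return sum(diffs) / len(diffs)  # len(diffs) == N - 1


# Usage: four pulses near 200 Hz give three intervals and two differences.
periods = intervals_from_points([0.0, 0.005, 0.0101, 0.015])
print(jitter_local_absolute(periods))  # about 0.00015 s
```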
Bandwidths (B1, B2, B3 and B4)
Bandwidth is a measure of the frequency band of a sound, especially of a resonance. It is determined at the half-power (3 dB down) points of the frequency response curve.
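The half-power definition can be illustrated on a sampled magnitude response. Locate the resonance peak, find where the curve first falls 3 dB below it on each side, and subtract the two crossing frequencies. This Python sketch illustrates the definition only and makes two assumptions: linear interpolation between samples and a single dominant peak. PRAAT derives formant bandwidths from its LPC-based formant analysis rather than from a search like this one.

```python
def half_power_bandwidth(freqs, mags_db):
    """-3 dB bandwidth (Hz) around the highest peak of a sampled magnitude response.

    freqs: increasing frequencies in Hz; mags_db: magnitudes in dB at those frequencies.
    """
    peak = max(range(len(mags_db)), key=lambda k: mags_db[k])
    threshold = mags_db[peak] - 3.0  # half power is about 3 dB below the peak

    def crossing(i, j):
        # Frequency where the line between sample i (at or below threshold)
        # and sample j (above threshold) meets the threshold.
        f_i, f_j = freqs[i], freqs[j]
        m_i, m_j = mags_db[i], mags_db[j]
        return f_i + (threshold - m_i) * (f_j - f_i) / (m_j - m_i)

    lo = peak
    while lo > 0 and mags_db[lo] > threshold:
        lo -= 1
    hi = peak
    while hi < len(mags_db) - 1 and mags_db[hi] > threshold:
        hi += 1
    if mags_db[lo] > threshold or mags_db[hi] > threshold:
        raise ValueError("response does not fall 3 dB below the peak on both sides")
    return crossing(hi, hi - 1) - crossing(lo, lo + 1)


# Usage: a peak at 200 Hz dropping 6 dB per 100 Hz crosses -3 dB at 150 Hz and 250 Hz.
print(half_power_bandwidth([0, 100, 200, 300, 400], [-12, -6, 0, -6, -12]))
```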
The voice recordings consist of pathological and healthy speech utterances recorded by 150 students (80 females, 70 males) aged 19 to 23 years. The database contains phonations of the vowels /a/, /i/ and /u/ at four loudness levels: neutral, low, high and low_high_low (combined high and low). The recordings are mono (single-channel) WAV files downsampled to 50 kHz (the Praat sampling frequency). Acoustic analysis was performed with the PRAAT software. The following 23 parameters were analyzed:
Mean amplitude, energy, mean power, mean pitch (F0), standard deviation, mean F1, mean F2, the bandwidths B1 (of F1), B2, B3 and B4, all components of jitter (5 components), all components of shimmer (6 components) and the mean noise-to-harmonics ratio (NHR).

The proposed method uses only the subset of these acoustic parameters that is relevant for distinguishing a pathological voice from healthy ones. After the analysis step, the reduced acoustic parameters are mean amplitude, the bandwidth B2 of the second formant F2, local absolute jitter and NHR, which are relevant for /a/, /i/ and /u/. A vector of these reduced acoustic parameters was constructed as the input to the classification methods. A model was built from the obtained data and simulated in MATLAB. The artificial neural network, K-Nearest Neighbors and Hidden Markov Model classification methods were then compared.
• Input audio of /a/, /i/ and /u/ utterances.
• Feature extraction: acquire all acoustic parameters Ai for pathological (Pi) and healthy (Hi) voices.
• Feature reduction: select the relevant parameters based on the ratio Ri between pathological and healthy values. In the proposed method, these are mean amplitude, bandwidth B2, local absolute jitter, NHR and HNR.
• $R_i = P_i / H_i$
• Training: train the classifiers on vectors built from the relevant acoustic parameters.
• Classification: apply HMM and KNN to classify voices as pathological or healthy with high accuracy.
• Severity: estimate the degree of severity of a pathological voice from the ratios Ri.
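The feature-reduction step can be sketched as follows: compute the ratios R_i = P_i / H_i, then choose the relevant parameters. The paper does not state its selection rule, so the rule here is hypothetical. A parameter is kept when, for every vowel, its ratio is either negative (a sign change, as for mean amplitude) or departs from 1 by at least a factor `min_factor` (2.0 by default, an arbitrary choice). The example data are five rows of Table 1. Each vowel's columns are read as a healthy/pathological pair, an ordering consistent with the jitter ratios of 7.0, 5.0 and 4.0 in Table 2.

```python
# Five rows of Table 1: per vowel, (pathological, healthy) parameter values.
TABLE1_ROWS = {
    "/a/": ({"jitter_rap": 0.42, "jitter_local_abs": 0.00007, "shimmer_local": 4.0, "b1": 360, "nhr": 0.040},
            {"jitter_rap": 0.14, "jitter_local_abs": 0.00001, "shimmer_local": 2.1, "b1": 139, "nhr": 0.006}),
    "/i/": ({"jitter_rap": 0.23, "jitter_local_abs": 0.00005, "shimmer_local": 2.1, "b1": 73, "nhr": 0.009},
            {"jitter_rap": 0.10, "jitter_local_abs": 0.00001, "shimmer_local": 1.8, "b1": 113, "nhr": 0.254}),
    "/u/": ({"jitter_rap": 0.19, "jitter_local_abs": 0.00004, "shimmer_local": 2.8, "b1": 190, "nhr": 0.007},
            {"jitter_rap": 0.16, "jitter_local_abs": 0.00001, "shimmer_local": 2.2, "b1": 134, "nhr": 0.003}),
}


def ratios(pathological, healthy):
    """R_i = P_i / H_i for every parameter present in both (skips H_i == 0)."""
    return {name: pathological[name] / healthy[name]
            for name in pathological
            if name in healthy and healthy[name] != 0}


def is_discriminant(r, min_factor):
    # Hypothetical rule: sign flip, or departure from 1 by min_factor in either direction.
    return r < 0 or r >= min_factor or r <= 1.0 / min_factor


def select_relevant(per_vowel_ratios, min_factor=2.0):
    """Parameters that are discriminant for every vowel."""
    names = set.intersection(*(set(r) for r in per_vowel_ratios.values()))
    return sorted(n for n in names
                  if all(is_discriminant(r[n], min_factor) for r in per_vowel_ratios.values()))


per_vowel = {vowel: ratios(p, h) for vowel, (p, h) in TABLE1_ROWS.items()}
print(select_relevant(per_vowel))
```

With these rows, the rule keeps local absolute jitter and NHR, two of the parameters the paper retains. Mean amplitude and B2 are not among the tabulated rows, so they cannot be checked this way.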
Acoustic parameters of /a/, /i/ and /u/ vowels
| Parameter | /a/ H | /a/ P | /i/ H | /i/ P | /u/ H | /u/ P |
|---|---|---|---|---|---|---|
| Total energy (Pa²·s) | 0.053 | 0.062 | 0.034 | 0.053 | 0.040 | 0.077 |
| Energy in air (J/m²) | 0.00013 | 0.00016 | 0.00008 | 0.00013 | 0.00010 | 0.00019 |
| Mean pitch F0 (Hz) | 231 | 146 | 273 | 156 | 277 | 148 |
| Minimum pitch (Hz) | 204 | 137 | 244 | 148 | 261 | 141 |
| Maximum pitch (Hz) | 249 | 158 | 298 | 168 | 241 | 159 |
| B1, bandwidth of F1 (Hz) | 139 | 360 | 113 | 73 | 134 | 190 |
| Jitter (rap) (%) | 0.14 | 0.42 | 0.10 | 0.23 | 0.16 | 0.19 |
| Jitter (local, absolute) (s) | 0.00001 | 0.00007 | 0.00001 | 0.00005 | 0.00001 | 0.00004 |
| Shimmer (local) (%) | 2.1 | 4.0 | 1.8 | 2.1 | 2.2 | 2.8 |
| Shimmer (apq3) (%) | 1.0 | 2.1 | 0.6 | 0.9 | 1.1 | 1.3 |
| Shimmer (dda) (%) | 3.0 | 6.4 | 1.9 | 2.6 | 3.3 | 4.0 |
| Shimmer (local, dB) | 0.19 | 0.35 | 0.17 | 0.18 | 0.22 | 0.25 |
| Mean noise-to-harmonics ratio | 0.006 | 0.040 | 0.254 | 0.009 | 0.003 | 0.007 |

Table 1: Acoustic parameters of /a/, /i/ and /u/ vowels for pathological and healthy voices (results extracted from PRAAT software; H: healthy, P: pathological).
Ratios of pathological to healthy acoustic parameters
Based on these ratios, the main parameters that distinguish pathological voices from healthy ones were extracted.
Mean amplitude, the bandwidth of the second formant (B2), local absolute jitter, mean NHR and HNR are the most relevant acoustic parameters for distinguishing a pathological voice from a healthy one.
Results of amplitude
For all pathological utterances (/a/, /i/, /u/), the mean amplitude is negative (< 0). The low_high_low loudness condition gives distinct results compared with the high, normal and neutral conditions.
| Parameter | Ratio P/H, /a/ | Ratio P/H, /i/ | Ratio P/H, /u/ |
|---|---|---|---|
| Mean amplitude (Pa) | -0.5 | -0.1 | -1.3 |
| Jitter (local, absolute) (s) | 7.0 | 5.0 | 4.0 |

Table 2: Pathological-to-healthy ratios of the relevant parameters.
Table 3: Amplitudes of /a/, /i/ and /u/ utterances for normal and pathological voices at high, low, neutral and low_high_low loudness.
Noise-to-harmonics ratio
As shown in Figure 1, for the /a/ utterance, the percentage of noise in the pathological signal differs most clearly from that of the healthy signal at high loudness.
This difference increases further if the low_high_low loudness condition is excluded.
Figure 1: % of noise of pathological and healthy utterances of /a/, /i/ and /u/.
Results of NHR
Figure 2: Mean NHR of pathological and healthy /a/.
The low_high_low loudness condition visibly affects the result. Nevertheless, the difference between pathological and healthy utterances is clear (Figure 2).
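NHR and the harmonics-to-noise ratio are commonly related through the normalized autocorrelation peak r of a voiced frame, following Boersma's autocorrelation method. In that view the harmonic fraction of the energy is r and the noise fraction is 1 - r, so HNR = 10·log10(r / (1 - r)) dB and NHR = (1 - r) / r. The Python sketch below assumes this relation and takes the mean NHR as a simple average over frames. PRAAT's voice report may choose its frames and compute its averages differently.

```python
import math


def hnr_db(r):
    """Harmonics-to-noise ratio (dB) from a normalized autocorrelation peak 0 < r < 1."""
    return 10.0 * math.log10(r / (1.0 - r))


def nhr(r):
    """Noise-to-harmonics ratio (linear): noise fraction over harmonic fraction."""
    return (1.0 - r) / r


def mean_nhr(frame_peaks):
    """Average NHR over voiced frames, one autocorrelation peak per frame (assumed scheme)."""
    return sum(nhr(r) for r in frame_peaks) / len(frame_peaks)


# Usage: a frame with r = 0.99 is strongly periodic (HNR near 20 dB, NHR near 0.01).
print(hnr_db(0.99), nhr(0.99))
```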
Results of Bandwidths
Figure 3: Bandwidths of four first formants of /i/.
Figure 4: Bandwidths of four first formants of /a/.
For the /i/ utterance, the most significant bandwidth is B1 at high loudness. B4, B3 and B2 are the most significant at low, neutral and low_high_low loudness, respectively (Figure 3).
For the /a/ utterance, B3 is the most significant at high and low_high_low loudness, while B4 and B2 are the most relevant at normal and neutral loudness (Figure 4).
Figure 5: Bandwidths of four first formants of /u/.
For the /u/ utterance, B4, B3, B3 and B2 are the most significant at high, low, neutral and low_high_low loudness, respectively. Based on the studied bandwidths, the second bandwidth (B2) is considered relevant for all utterances (Figure 5).
Jitter local absolute
Figure 6: Jitter local absolute for /a/ , /i/ and /u/ utterances of pathological and healthy voices.
For the /a/ utterance, significant differences appear at low and neutral loudness. For the /i/ and /u/ utterances, low, neutral and low_high_low loudness show significant differences. At high loudness, however, there is no significant variation between pathological and healthy values (Figure 6).
Classification with KNN
The four relevant acoustic parameters were extracted from the input audio signal, and classification was performed with the K-Nearest Neighbors algorithm (Figure 7):
Figure 7: Six classes (healthy and pathological /a/, /i/ and /u/).
A recognition rate of 97% was obtained.
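A minimal K-Nearest Neighbors classifier over the four-parameter vectors (mean amplitude, B2, local absolute jitter, NHR) might look like the sketch below. The choices are assumptions and do not reproduce the authors' MATLAB setup: Euclidean distance, k = 3, and z-score normalization. Normalization matters here because B2 is measured in hertz while jitter is on the order of 10⁻⁵ s, so raw distances would ignore jitter almost entirely.

```python
import math
from collections import Counter


def zscore_fit(rows):
    """Per-feature mean and standard deviation (a zero deviation is replaced by 1)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def zscore_apply(row, means, stds):
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k training vectors nearest to query (Euclidean, z-scored)."""
    means, stds = zscore_fit(train_x)
    q = zscore_apply(query, means, stds)
    dists = sorted(
        (math.dist(zscore_apply(x, means, stds), q), y) for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

In use, `train_x` would hold one [mean amplitude, B2, jitter, NHR] vector per recording, and `train_y` would hold one of the six class labels from Figure 7 (for example a hypothetical "a_pathological").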
Classification with HMM
The audio input provides the set of observation symbol sequences, which are tested against the trained HMM classifier. The probability of each observation sequence is calculated with the forward algorithm.
Figure 8: Log probability of the transitions states of HMM.
Obtained recognition rate: 95%.
Four states were used. The low_high_low utterances degrade HMM recognition, as shown by the peaks for a_lhl and i_lhl (Figure 8).
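The forward-algorithm scoring used for HMM classification can be sketched for a discrete HMM as below. Each step is rescaled to avoid numerical underflow on long sequences. The sketch makes three assumptions. The acoustic vectors have already been quantized into discrete symbol indices. One trained model (start, transition and emission probabilities) exists per class. The number of states, four in this study, is simply the length of `start`. Training, for example with Baum-Welch, is not shown.

```python
import math


def forward_log_prob(obs, start, trans, emit):
    """log P(obs | model) for a discrete HMM via the scaled forward algorithm.

    start[i]: initial probability of state i; trans[i][j]: P(state j | state i);
    emit[i][o]: P(symbol o | state i); obs: sequence of symbol indices.
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction step: sum over predecessor states, then emit o_t.
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)  # P(o_t | o_1..o_{t-1})
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob


def classify(obs, models):
    """Label of the model (start, trans, emit) with the highest log-likelihood."""
    return max(models, key=lambda label: forward_log_prob(obs, *models[label]))
```

Picking the class with the highest log-likelihood is the usual way to turn per-class HMM scores into a decision. Log probabilities like those plotted in Figure 8 are what `forward_log_prob` returns.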
A set of four acoustic parameters (mean amplitude, local absolute jitter, NHR and the bandwidth of the second formant) is sufficient to characterize pathological voices objectively. Future work will extend the proposed method to all phonemes in order to build a database of normal and pathological utterances. Such a database can serve as a starting point for objectively defining the symptoms of each voice disorder.