Auditory and Acoustic Features from Clue-Words Sets for Forensic Speaker Identification and its Correlation with Probability Scales

An experiment on speaker identification by semi-automatic measurement of parameters was carried out with the goal of correlating numerical data as well as descriptive data with probability scales. Fifteen sets of speech samples from 15 speakers, selected randomly from 100 actual crime cases and consisting of Hindi utterances, were subjected to spectrographic analysis for the purpose of a speaker identification test. Speaker-specific acoustic parameters, namely the first, second and third formant frequencies at a particular location (F1, F2 and F3), were measured for the sets of speech samples of all 15 speakers. In addition, the linguistic and phonetic features found by auditory analysis were noted for each of the 15 sets of speech samples. We developed software to calculate the similarity percentage for the numerical data measured in the acoustic analysis and for the numerical values assigned to auditory parameters in the auditory analysis, and to map the result onto one of the nine probability scales. Most existing methods use only acoustic features to obtain numerical results for the purposes of speaker identification.


Introduction
The rapid development and penetration of communication technology has surely helped humankind communicate better, more accessibly and more efficiently, but it is not without ill consequences. Information and Communication Technology (ICT) has also helped anti-social elements commit more organized and white-collar crimes, and in turn law enforcement agencies should be better equipped with advanced technology to counter such crimes. Speaker identification technology is one of the many tools on which law enforcement agencies can rely, and it is also a popular technique for monitoring and authenticating human subjects using their speech signal. However, the question often arises of how to include auditory parameters in numerical terms in the results of speaker identification [1,2]. Kersta [3] postulated that the spectrograms of a given individual are unique, individualistic and permanent. Kinston et al. [4] discussed various aspects of the use of statistics in criminalistics. Tosi et al. [5] reported in one of their experiments on the differences between inter-speaker and intra-speaker variability, which according to them stem mainly from anatomical differences in the vocal tract, but these differences were not correlated with or quantified by specific acoustic parameters. Koenig [6] in 1986 conducted a survey with the FBI to determine the error rate of spectrographic voice identification, comparing the spectral patterns of two voice samples by formant shaping; beginning, mean and end formant frequencies; timing; pitch; etc. for each individual word [7]. Meuwly et al. [8] used the likelihood ratio to express the worth of the evidence of a questioned recording, measuring how that recording scores against the suspected speaker's model compared with relevant non-suspected speaker models.
Kinoshita [9] investigated the possibility of performing forensic speaker identification on forensically realistic data with traditional acoustic parameters [10] and the Bayesian likelihood ratio, and concluded that the likelihood-ratio-based discriminant test was one of the effective ways of evaluating speech data. Aitken [11] used a Bayesian approach to combine objective and subjective probabilities in one formula. Singh [12] in 2005 studied isolated spoken words with similar vowel quality as syllabic nuclei, preceded by consonants with a similar place of articulation, as clue-words of forensic importance. As published by Singh [13] in 2007 in "An Introduction to Forensic Speaker Identification Procedure", the verbal probability scales are positive identification, identification with high probability, probable identification, possible identification, no opinion, possible elimination, probable elimination, elimination with high probability and positive elimination. Becker et al. [14] applied a UBM-GMM verification system to semi-automatically extracted formant features and noted two important advantages: first, the dimensionality of the feature vector space was small; second, there was a direct relationship between the features and the configuration of the vocal tract, not only as an average but also in the speaker-specific variations expressed in the entire distribution.
Nowadays, it is mostly numerical data obtained from acoustic features that are computed against the probability scales, which is of immense help to the courts of law. The scope of the present study is to combine the acoustic features with the auditory parameters so that a new paradigm may be established and law enforcement agencies need no longer consider subjective and objective data separately. In this paper, an experiment on speaker identification was conducted on 15 actual crime cases using the semi-automatic method of measuring acoustic parameters [15].

Experimental Methods
Sampling of speech material: A set of clue-words was collected from the database of clue-words of actual crime samples accessed from the State Forensic Laboratories of Haryana and Delhi (India). These sets of clue-words were collected without noting the particulars of the cases. Clue-words are similar words having the same vowel quality, selected from the questioned as well as the specimen speech samples from the set of verbatim/non-verbatim words. The set of clue-words of a speaker consists of different vowels, namely /ʌ/, /ɛ/, /ɑ/, /ӕ/, /a/, /u/, /i/, /ͻ/, /o/, /e/ and /ә/, preceded and succeeded by different consonants uttered at similar places of articulation. The sets of clue-words for an informant should be selected in such a manner that the vowels, as nuclei, are present in different words. The selected clue-words were used to study various acoustic, linguistic and phonetic features from the contextual text. The sets of clue-words from the questioned and specimen speech samples were taken for 15 randomly selected informants (accused/suspect/complainant) with Hindi/English/Punjabi utterances, a male-to-female ratio of 12:3 and ages ranging from 15 to 60 years, mostly from the northern parts of India. The questioned and specimen speech samples considered in this study were compared on a one-to-one basis in closed sets with the options of acceptance and rejection. These sets of speech samples were selected for speakers whose questioned recordings had been made through mobile networks, landlines and direct recording. The questioned and specimen speech samples were digitised at a sampling rate of 22050 Hz with 16-bit quantization in mono, signed format.

Experiment
The sets of clue-words of the 15 speakers, for the questioned as well as the specimen speech samples, were subjected to spectrographic analysis using the Computerized Speech Lab (CSL). The formant frequencies (F1, F2 and F3) were analyzed using LPC (Linear Predictive Coding) at a particular location of the vowel nuclei. The result of this analysis is tabulated for one speaker in Table 1. The linguistic and phonetic features are tabulated for one speaker in Table 2. The auditory analysis contributed the phonetic and linguistic features of the speaker-specific characteristics, and the spectrographic analysis provided the acoustic features of the speaker's utterances. These features were passed to software specifically developed for statistical evaluation and for correlating phonetic, linguistic and acoustic events with the probability scales used in the procedure for the speaker identification test [16]. The software was designed so that the auditory characteristics and acoustic features are combined in a statistical operation using the Bayes method: the formant frequencies (F1, F2 and F3), i.e. the acoustic features, are taken from the questioned and the specimen speech samples and compared for similarities only, a similarity percentage is calculated, and the same is done for the linguistic and phonetic features [17]. The numerical results obtained from the objective and the subjective data are then combined by giving each a different weightage, such that the objective data carry more weight than the subjective data, and the result is mapped onto one of the verbal probability scales according to the criteria. Weightage values assigned to the linguistic and phonetic features were also determined, and the percentage of similarity/dissimilarity based on those values was calculated using the software modules.
Then the combination of the objective and subjective probabilities of the acoustic and auditory parameters was calculated.
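The similarity and weighting steps described above can be illustrated with a minimal Python sketch. It is not the software used in the study. The text does not state how a formant pair is judged similar, which weightages were used, or how the Bayes method enters the calculation, so the following are all assumptions made for illustration: the 10% relative tolerance, the feature names, the default weight of 0.7 for the acoustic data, and the use of a plain weighted average in place of the Bayesian combination.

```python
from typing import Dict, Sequence, Tuple

Formants = Tuple[float, float, float]  # (F1, F2, F3) in Hz for one clue-word


def acoustic_similarity(questioned: Sequence[Formants],
                        specimen: Sequence[Formants],
                        tolerance: float = 0.10) -> float:
    """Percentage of formant measurements that agree within `tolerance`.

    `tolerance` is a hypothetical relative threshold, not a value from the study.
    """
    if len(questioned) != len(specimen):
        raise ValueError("clue-words must be paired one-to-one")
    matches = 0
    total = 0
    for q_word, s_word in zip(questioned, specimen):
        for f_q, f_s in zip(q_word, s_word):
            total += 1
            if abs(f_q - f_s) <= tolerance * f_s:
                matches += 1
    return 100.0 * matches / total if total else 0.0


def auditory_similarity(questioned: Dict[str, int],
                        specimen: Dict[str, int]) -> float:
    """Percentage of shared auditory features that received the same numerical value."""
    shared = questioned.keys() & specimen.keys()
    if not shared:
        return 0.0
    same = sum(1 for name in shared if questioned[name] == specimen[name])
    return 100.0 * same / len(shared)


def combined_similarity(acoustic_pct: float,
                        auditory_pct: float,
                        acoustic_weight: float = 0.7) -> float:
    """Weighted average in which the objective (acoustic) data weigh more.

    The weight 0.7 is an assumed value; a weighted average stands in for the
    Bayes-based combination, whose details the text does not give.
    """
    if not 0.5 < acoustic_weight <= 1.0:
        raise ValueError("acoustic data must carry more than half of the weight")
    return acoustic_weight * acoustic_pct + (1.0 - acoustic_weight) * auditory_pct


# Hypothetical measurements for two clue-words and four auditory features.
q_formants = [(500.0, 1500.0, 2500.0), (300.0, 2200.0, 3000.0)]
s_formants = [(510.0, 1480.0, 2900.0), (310.0, 2250.0, 2950.0)]
q_auditory = {"accent": 1, "nasality": 2, "tempo": 1, "intonation": 3}
s_auditory = {"accent": 1, "nasality": 2, "tempo": 2, "intonation": 3}

acoustic_pct = acoustic_similarity(q_formants, s_formants)
auditory_pct = auditory_similarity(q_auditory, s_auditory)
print(round(combined_similarity(acoustic_pct, auditory_pct), 2))
```

Keeping the acoustic weight above one half mirrors the statement that the objective data are given more weight than the subjective data; any actual weightages would have to come from the software itself.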

Results and Discussions
The results of all the percentages are tabulated in Table 3. A similarity percentage of more than 90% with more than 3 usable formants and more than 20 matching clue-words/word segments yields positive identification, whereas a similarity percentage of more than 80% with 2 or more usable formants and more than 15 matching clue-words/word segments yields probable identification. In the results tabulated in Table 3, 12 out of 15 cases yield probable identification and 3 out of 15 yield positive identification [19]. Figure 3 shows the output of the software for one speaker. The verbal probability scale is derived as per the criteria based on the number of clue-words, the number of formants and the percentage of matching of both acoustic and auditory features, as used in most methods of speaker identification tests by semi-automatic measurement of acoustic parameters [20,21]. The combination of the auditory features and the acoustic parameters is found to be promising and robust, in the sense that the correlation of statistical probability through a combination of objective and subjective probabilities provided the percentage of similarity. It may be possible to fix a threshold value for the percentage of similarity on the basis of the combined objective and subjective probabilities when this study is extended to the larger database.
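The two criteria stated in this section can be written as a small decision rule. The sketch below encodes only those two criteria; the function name `verbal_scale` and the choice to return `None` for every other case are assumptions, since the text does not give thresholds for the remaining seven points of the nine-point scale.

```python
from typing import Optional

# The nine verbal probability scales listed in the Introduction.
VERBAL_SCALES = (
    "positive identification",
    "identification with high probability",
    "probable identification",
    "possible identification",
    "no opinion",
    "possible elimination",
    "probable elimination",
    "elimination with high probability",
    "positive elimination",
)


def verbal_scale(similarity_pct: float,
                 usable_formants: int,
                 matching_clue_words: int) -> Optional[str]:
    """Map the combined result onto a verbal scale using the two stated criteria.

    Returns None when neither criterion is met, because criteria for the
    other scale points are not stated in the text.
    """
    if similarity_pct > 90 and usable_formants > 3 and matching_clue_words > 20:
        return "positive identification"
    if similarity_pct > 80 and usable_formants >= 2 and matching_clue_words > 15:
        return "probable identification"
    return None


# Hypothetical case: 85% similarity, 3 usable formants, 18 matching clue-words.
print(verbal_scale(85.0, 3, 18))
```

The stricter rule is checked first so that a case satisfying both criteria is reported at the stronger scale point.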

Conclusion
New methods such as automatic speaker identification and text-independent speaker identification have been developed in the recent past. However, these automatic methods have not achieved the desired acceptance by the scientific community for application in the criminal justice system. The semi-automatic measurement of acoustic parameters combined with auditory features continues to be an acceptable technique for speaker identification by experts. The decision on the verbal probability scale is drawn from the criteria of the number of clue-words, the number of formants and the percentage of matching.
The result of the present study is promising, and its importance lies in deriving a numerical probability by combining the objective with the subjective probabilities. Presenting this combined probability in the courts of law will be more feasible than presenting a verbal probability. Extending this study to all 100 speakers will help in arriving at a threshold value of probability for the verbal probability scales.