Improvement of the Speaker Verification System with Feature Level and Score Level Normalization Techniques
Kshirod Sarmah, Utpal Bhattacharjee
The performance of a text-independent speaker verification (SV) system degrades when the speaker model is trained in one environment and tested in another. The main causes are mismatches between training and testing conditions in the phonetic content of the speech utterances, the recording environment, the session, and the sensor. The robustness of an SV system can be improved at the feature level by applying techniques such as Voice Activity Detection (VAD), Cepstral Mean Normalization (CMN), and Cepstral Variance Normalization (CVN), and at the score level by applying score normalization techniques. In this paper we report experiments carried out on a recently collected speaker recognition database, the Arunachali Language Speech Database (ALS-DB). The database is evaluated with a Gaussian Mixture Model-Universal Background Model (GMM-UBM) system, using Mel-Frequency Cepstral Coefficients (MFCC) with their first- and second-order derivatives, together with prosodic features, as front-end feature vectors. The performance of the SV system improves when CVN is applied at the feature level and when the score normalization technique Test-normalization (T-Norm) is applied at the score level, and it improves substantially when both are applied together. We observe that combining MFCC with prosodic features improves the performance of the SV system by 7.08%, while T-Norm improves it by 3.22% and CVN by 3.90%.
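As a rough illustration of the two normalization steps named in the abstract, the following Python sketch shows per-utterance cepstral mean and variance normalization at the feature level and T-Norm at the score level. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `cmvn` and `t_norm`, the `eps` guard, the use of the population standard deviation, and the plain-list data layout are all hypothetical choices made for illustration.

```python
from statistics import fmean, pstdev


def cmvn(frames, eps=1e-10):
    """Per-utterance cepstral mean and variance normalization.

    frames: list of feature vectors, one list of floats per frame.
    Each coefficient is shifted to zero mean (CMN) and scaled to unit
    variance (CVN) over the whole utterance.
    """
    columns = list(zip(*frames))  # one tuple per cepstral coefficient
    means = [fmean(col) for col in columns]
    # eps (an assumed guard) avoids division by zero for constant coefficients
    stds = [max(pstdev(col), eps) for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(frame, means, stds)]
        for frame in frames
    ]


def t_norm(raw_score, cohort_scores, eps=1e-10):
    """Test normalization (T-Norm) of a single trial score.

    cohort_scores: scores of the same test utterance against a set of
    impostor (cohort) models. The raw score is standardized with the
    mean and standard deviation of those cohort scores.
    """
    mu = fmean(cohort_scores)
    sigma = max(pstdev(cohort_scores), eps)
    return (raw_score - mu) / sigma
```

In this sketch, `cmvn` would be applied to the MFCC frames of each utterance before GMM-UBM scoring, and `t_norm` to each resulting trial score, which mirrors the combined feature-level and score-level setting described above.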