Abstract
This paper presents a brief comparative study of speaker modeling techniques for robust and reliable speaker verification (SV). Many state-of-the-art speaker modeling techniques have been developed to improve text-independent speaker verification under different conditions. The performance of an SV system depends not only on the fusion of different feature vectors but also, to a large extent, on the fusion of different speaker modeling techniques. In this work, an automatic SV system is developed using Mel-Frequency Cepstral Coefficients (MFCC) combined with prosodic feature vectors. The baseline SV system is trained with Vector Quantization (VQ), Gaussian Mixture Model (GMM), GMM-Universal Background Model (GMM-UBM), Support Vector Machine (SVM) and Joint Factor Analysis (JFA) speaker models, both separately and in fusion, to analyze their performance. The results reported here were evaluated on a multilingual speech database, the Arunachali Language Speech Database (ALS-DB). The experiments show that JFA with GMM-UBM modeling gives the best performance, with an equal error rate (EER) of 4.76% and a minimum detection cost function (MinDCF) value of 0.0872. VQ performs worst among the techniques compared, with an EER of 11.08% and a MinDCF of 0.2010. SVM improves the verification rate by approximately 2.8% compared with GMM-UBM. We conclude that fusing generative and discriminative models substantially improves the performance of the SV system.
Keywords / Index Terms
Speaker Verification, MFCC, Prosodic, GMM-UBM, SVM, JFA
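The abstract reports results as EER and MinDCF values. As a rough illustration of how these two figures are commonly derived from verification scores, the following minimal sketch sweeps a decision threshold over target and impostor scores. The function names, the toy scores, and the cost parameters (c_miss = 10, c_fa = 1, p_target = 0.01) are assumptions made for this example; the abstract does not state which cost settings or scoring tools were used.

```python
def error_rates(target_scores, impostor_scores):
    """Return (threshold, P_miss, P_fa) triples; a trial is accepted when score >= threshold."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    thresholds.append(thresholds[-1] + 1.0)  # extra threshold that rejects every trial
    rates = []
    for t in thresholds:
        # Miss: a genuine (target) trial scored below the threshold.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: an impostor trial scored at or above the threshold.
        p_fa = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        rates.append((t, p_miss, p_fa))
    return rates


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER at the threshold where P_miss and P_fa are closest."""
    rates = error_rates(target_scores, impostor_scores)
    _, p_miss, p_fa = min(rates, key=lambda r: abs(r[1] - r[2]))
    return (p_miss + p_fa) / 2.0


def min_dcf(target_scores, impostor_scores, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Minimum detection cost over all thresholds (cost parameters are assumed, not from the paper)."""
    rates = error_rates(target_scores, impostor_scores)
    return min(
        c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
        for _, p_miss, p_fa in rates
    )


# Toy scores for illustration only; they are not taken from ALS-DB.
targets = [2.0, 1.5, 0.5, 3.0]
impostors = [0.0, 1.0, -1.0, 0.2]
eer = equal_error_rate(targets, impostors)
dcf = min_dcf(targets, impostors)
print(f"EER = {eer:.2%}, MinDCF = {dcf:.4f}")
```

A lower EER or MinDCF indicates a better system, which is the sense in which the abstract ranks the modeling techniques. With real score lists, interpolating between neighbouring thresholds gives a smoother EER estimate than the nearest-point approximation used here.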