
A CNN-Based Digraph Extraction Model for Enhanced Swahili Natural Language Processing

Tirus Muya Maina, Aaron Mogeni Oirere, Stephen Kahara

Section: Research Paper, Product Type: Journal Paper
Vol. 12, Issue 6, pp. 43-55, Dec. 2024


Published online on Dec 31, 2024


Copyright © Tirus Muya Maina, Aaron Mogeni Oirere, Stephen Kahara. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
 




How to Cite this Paper


IEEE Style Citation: Tirus Muya Maina, Aaron Mogeni Oirere, Stephen Kahara, “A CNN-Based Digraph Extraction Model for Enhanced Swahili Natural Language Processing,” International Journal of Scientific Research in Computer Science and Engineering, Vol.12, Issue.6, pp.43-55, 2024.

MLA Style Citation: Tirus Muya Maina, Aaron Mogeni Oirere, Stephen Kahara. "A CNN-Based Digraph Extraction Model for Enhanced Swahili Natural Language Processing." International Journal of Scientific Research in Computer Science and Engineering 12.6 (2024): 43-55.

APA Style Citation: Maina, T. M., Oirere, A. M., & Kahara, S. (2024). A CNN-based digraph extraction model for enhanced Swahili natural language processing. International Journal of Scientific Research in Computer Science and Engineering, 12(6), 43-55.

BibTeX Style Citation:
@article{Maina_2024,
author = {Tirus Muya Maina and Aaron Mogeni Oirere and Stephen Kahara},
title = {A {CNN}-Based Digraph Extraction Model for Enhanced {Swahili} Natural Language Processing},
journal = {International Journal of Scientific Research in Computer Science and Engineering},
issue_date = {12 2024},
volume = {12},
number = {6},
month = {12},
year = {2024},
issn = {2347-2693},
pages = {43--55},
url = {https://www.isroset.org/journal/IJSRCSE/full_paper_view.php?paper_id=3720},
publisher = {IJCSE, Indore, INDIA},
}

RIS Style Citation:
TY  - JOUR
UR  - https://www.isroset.org/journal/IJSRCSE/full_paper_view.php?paper_id=3720
TI  - A CNN-Based Digraph Extraction Model for Enhanced Swahili Natural Language Processing
T2  - International Journal of Scientific Research in Computer Science and Engineering
AU  - Maina, Tirus Muya
AU  - Oirere, Aaron Mogeni
AU  - Kahara, Stephen
PY  - 2024
DA  - 2024/12/31
PB  - IJCSE, Indore, INDIA
SP  - 43
EP  - 55
IS  - 6
VL  - 12
SN  - 2347-2693
ER  - 


Abstract:
Swahili, a prominent language in East Africa, is integral to the region's communication, commerce, and cultural exchange. Improving the accuracy of Swahili speech recognition systems is critical for accessibility, transcription, translation, text analysis, sentiment analysis, and support for individuals with disabilities. However, the distinctive linguistic features of Swahili, particularly its digraphs, pose significant challenges to existing speech recognition technologies. This study addresses these challenges by introducing a novel approach for extracting Swahili digraphs from speech data. The research involved the extraction of a specialized Swahili digraph dataset and the design of an advanced Digraph Extraction Model, which combines dense layers with a Convolutional Neural Network (CNN) architecture to improve the precision and efficiency of Natural Language Processing tasks related to Swahili. Following the Design Science Research Methodology, the study systematically designs, implements, and evaluates the digraph extraction model. The results demonstrate the model's robust performance across several metrics. The low Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) values indicate that the model's predictions closely align with the actual values. Furthermore, the R-squared value of 0.89 shows that the model accounts for a substantial proportion (about 89%) of the variance in the dataset. The low test loss suggests effective generalization to new, unseen data, supporting the model's reliability for practical applications. This research advances the field of Swahili speech recognition by enhancing accessibility and usability for Swahili speakers while supporting the preservation of the language's oral traditions.
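The evaluation above rests on three standard regression metrics: MAE is the mean of |y - ŷ|, RMSE is the square root of the mean of (y - ŷ)², and R² = 1 - SS_res / SS_tot. An R² of 0.89 therefore means the residual sum of squares is about 11% of the total sum of squares around the mean. The minimal sketch below computes all three from scratch; the function name `regression_metrics` and the toy values are illustrative assumptions, not the paper's code or data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for two equal-length, non-empty sequences."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    # MAE: average magnitude of the residuals.
    mae = sum(abs(r) for r in residuals) / n
    # RMSE: square root of the mean squared residual; weights large errors more.
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    # R^2 = 1 - SS_res / SS_tot: share of the target's variance explained.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return mae, rmse, r2


if __name__ == "__main__":
    # Invented toy values for illustration; not data from the paper.
    actual = [3.0, 5.0, 2.0, 7.0]
    predicted = [2.5, 5.0, 3.0, 6.0]
    mae, rmse, r2 = regression_metrics(actual, predicted)
    print(round(mae, 3), round(rmse, 3), round(r2, 3))  # → 0.625 0.75 0.847
```

RMSE is always at least as large as MAE, and the gap widens when a few predictions are far off, which is why reporting both says more than either alone.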
The innovations introduced, including the annotated Swahili digraph corpus and the advanced Swahili Digraph Extraction Model, provide a substantial foundation for future research and development in Swahili digraph recognition technology. The model's potential for successful deployment in real-world scenarios offers promising implications for improving Swahili language processing across various applications.
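The abstract describes the Digraph Extraction Model only as combining dense layers with a CNN. To show how such a stack can process speech features, here is a pure-Python forward pass through a 1-D convolution, ReLU, global max pooling, a dense layer and softmax. Everything specific is an assumption, not the authors' design: the class `TinyDigraphCNN`, the 13-dimensional MFCC-like input frames, the layer sizes, the pooling choice, the label set, and the random (untrained) weights. Because the paper reports regression metrics, its actual output head may also differ from the softmax classifier used here.

```python
import math
import random


def conv1d(x, weights, bias):
    """Valid 1-D convolution over time.

    x: T frames, each with C channels; weights: F filters x K taps x C channels;
    bias: F values. Returns (T - K + 1) frames, each with F feature values.
    """
    k = len(weights[0])
    if len(x) < k:
        raise ValueError("input has fewer frames than the kernel width")
    out = []
    for t in range(len(x) - k + 1):
        frame = []
        for f, kernel in enumerate(weights):
            acc = bias[f]
            for j in range(k):
                for c, w in enumerate(kernel[j]):
                    acc += w * x[t + j][c]
            frame.append(acc)
        out.append(frame)
    return out


def relu(rows):
    return [[max(0.0, v) for v in row] for row in rows]


def global_max_pool(rows):
    """Keep each filter's strongest response across all time steps."""
    return [max(column) for column in zip(*rows)]


def dense(vec, weights, bias):
    """Fully connected layer; weights has shape n_out x n_in."""
    return [b + sum(w * v for w, v in zip(row, vec)) for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init(shape, rng):
    """Nested lists of small random weights with the given shape."""
    if len(shape) == 1:
        return [rng.uniform(-0.1, 0.1) for _ in range(shape[0])]
    return [init(shape[1:], rng) for _ in range(shape[0])]


class TinyDigraphCNN:
    """Hypothetical conv + dense classifier; weights are random, not trained."""

    def __init__(self, n_channels, n_filters, kernel, n_classes, seed=0):
        rng = random.Random(seed)
        self.conv_w = init((n_filters, kernel, n_channels), rng)
        self.conv_b = [0.0] * n_filters
        self.fc_w = init((n_classes, n_filters), rng)
        self.fc_b = [0.0] * n_classes

    def predict_proba(self, frames):
        hidden = relu(conv1d(frames, self.conv_w, self.conv_b))
        pooled = global_max_pool(hidden)
        return softmax(dense(pooled, self.fc_w, self.fc_b))


if __name__ == "__main__":
    labels = ["ch", "dh", "gh", "kh", "ng'", "ny", "sh", "th"]  # hypothetical classes
    model = TinyDigraphCNN(n_channels=13, n_filters=8, kernel=3, n_classes=len(labels))
    rng = random.Random(1)
    # 20 fake 13-dimensional feature frames standing in for MFCCs of one segment.
    frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(20)]
    probs = model.predict_proba(frames)
    print(labels[probs.index(max(probs))], round(sum(probs), 6))
```

Global max pooling makes the dense layer's input size independent of the clip length, which is one common way to feed variable-length speech segments into a fixed-size dense classifier.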

Keywords / Index Terms:
Swahili Digraph Recognition, Digraph Extraction Model, Dense Neural Networks, Convolutional Neural Networks (CNN), Natural Language Processing (NLP), Swahili Corpus
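The keywords focus on Swahili digraph recognition. The study works from speech, but the units themselves are easiest to see in writing, where two letters stand for one sound. The sketch below splits a written Swahili word into graphemes by greedy longest match, keeping digraphs as single units. The inventory `DIGRAPHS` (ch, dh, gh, kh, ny, sh, th, plus the apostrophe form ng') is an assumption drawn from standard Swahili orthography, not the paper's annotation scheme; some analyses also treat prenasalized clusters such as mb or nd as units, which this sketch does not. The helper names `segment` and `digraphs_in` are hypothetical.

```python
# Assumed digraph inventory from standard Swahili orthography; ng' is written
# with an apostrophe and is tried first because it is the longest unit.
DIGRAPHS = ("ng'", "ch", "dh", "gh", "kh", "ny", "sh", "th")


def segment(word):
    """Split a Swahili word into graphemes, keeping digraphs as single units.

    Greedy longest match: at each position take the longest listed digraph
    that starts there, otherwise a single character.
    """
    units = sorted(DIGRAPHS, key=len, reverse=True)
    # Lowercase and normalise the typographic apostrophe (U+2019) to ASCII.
    word = word.lower().replace("\u2019", "'")
    graphemes, i = [], 0
    while i < len(word):
        for unit in units:
            if word.startswith(unit, i):
                graphemes.append(unit)
                i += len(unit)
                break
        else:
            graphemes.append(word[i])
            i += 1
    return graphemes


def digraphs_in(word):
    """Return only the digraph units found in the word, in order."""
    return [g for g in segment(word) if g in DIGRAPHS]


if __name__ == "__main__":
    for w in ["shule", "ng'ombe", "nyumba", "thelathini", "ngoma"]:
        print(w, segment(w), digraphs_in(w))
```

Greedy matching ignores syllable and morpheme boundaries, so a word in which the two letters of a listed digraph belong to different syllables would be mis-segmented; a speech-based extractor faces the additional task of locating these units in audio rather than text.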

