The global proliferation of digital communication highlights a critical gap in language technologies for digitally under-represented languages, particularly Kiswahili, a language spoken by over 100 million people. While significant advancements have been made in natural language processing (NLP) for high-resource languages such as English, creating robust computational systems for low-resource linguistic contexts remains a persistent challenge. This study addresses that challenge by presenting a novel, end-to-end Kiswahili audio processing pipeline that unifies three core capabilities: real-time speech recognition, sentiment analysis, and text summarization. The system’s novelty lies in its strategic use of state-of-the-art, pre-trained machine learning models, including Wav2vec2, DistilBERT, and T5, demonstrating a viable approach to bridging the digital communication gap for Kiswahili in real-world applications. Our methodology involved a rigorous evaluation of the integrated system using the Mozilla Common Voice corpus. The results revealed key insights and promising performance metrics. The speech recognition component, the foundation of the pipeline, achieved a Word Error Rate (WER) of 0.3329 with the Wav2vec2 model, demonstrating its capacity for accurate transcription in a low-resource setting. This is a significant finding, as it suggests that models fine-tuned for such environments can overcome the challenges of data scarcity and linguistic diversity. The summarization component also performed strongly, yielding a ROUGE-L score of 0.6622, which indicates robust semantic and structural alignment with reference texts. Although the sentiment analysis data showed a notable imbalance, with a predominance of negative samples, the model achieved 60% accuracy, demonstrating its potential for further refinement. These findings underscore both the potential and the inherent limitations of applying pre-trained models to a low-resource language like Kiswahili. They provide a compelling proof of concept for the technical feasibility of Kiswahili audio processing and emphasize the critical need for continued investment in dataset expansion and model optimization. The study concludes that this work lays the groundwork for continued research and the subsequent development of advanced NLP tools tailored to Kiswahili-speaking populations, ultimately aiming to improve access to education, healthcare, and information services and to foster greater digital inclusion throughout East Africa.
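To illustrate how such a pipeline could be assembled from off-the-shelf components, the sketch below chains pre-trained Hugging Face models for the three stages named in the abstract and computes the two reported metrics (WER for transcription, ROUGE-L for summarization). This is a minimal, hedged illustration rather than the authors' implementation: the Wav2vec2 checkpoint is a publicly available Swahili fine-tune, while the sentiment and summarization checkpoints are multilingual stand-ins assumed here for demonstration, since the paper's fine-tuned DistilBERT and T5 models are not published with the abstract.

```python
# Minimal illustrative sketch of a Kiswahili audio-to-insight pipeline built from
# pre-trained Hugging Face models. Checkpoint names are assumptions chosen for
# demonstration and are not the authors' exact fine-tuned models.
import librosa
from jiwer import wer                      # word error rate, the metric reported for ASR
from rouge_score import rouge_scorer       # ROUGE-L, the metric reported for summarization
from transformers import pipeline

# 1) Speech recognition: a publicly available Swahili Wav2vec2 fine-tune (assumed checkpoint).
asr = pipeline(
    "automatic-speech-recognition",
    model="eddiegulay/wav2vec2-large-xlsr-mvc-swahili",
)

# 2) Sentiment analysis: a multilingual sentiment model used as a stand-in; the study
#    would use a DistilBERT variant fine-tuned on labelled Kiswahili data.
sentiment = pipeline("text-classification", model="nlptown/bert-base-multilingual-uncased-sentiment")

# 3) Summarization: a multilingual T5 summarizer covering Swahili, assumed as a stand-in
#    for the paper's T5 component.
summarizer = pipeline("summarization", model="csebuetnlp/mT5_multilingual_XLSum")

rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=False)


def process_clip(wav_path, reference_transcript=None, reference_summary=None):
    """Transcribe a Kiswahili clip, then score sentiment and summarize the transcript."""
    speech, _ = librosa.load(wav_path, sr=16_000)   # Wav2vec2 expects 16 kHz mono audio
    transcript = asr(speech)["text"]
    result = {
        "transcript": transcript,
        "sentiment": sentiment(transcript)[0],
        "summary": summarizer(transcript, max_length=60, min_length=10)[0]["summary_text"],
    }
    if reference_transcript is not None:
        # WER against a reference transcript, e.g. a validated Common Voice sentence.
        result["wer"] = wer(reference_transcript, transcript)
    if reference_summary is not None:
        result["rouge_l"] = rouge.score(reference_summary, result["summary"])["rougeL"].fmeasure
    return result
```

The sketch only shows how the three pre-trained components chain together on a single clip; a real-time deployment of the kind described in the paper would additionally need streaming audio capture, batching, and cached model loading.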
Published in | American Journal of Artificial Intelligence (Volume 9, Issue 2) |
DOI | 10.11648/j.ajai.20250902.18 |
Page(s) | 167-185 |
Creative Commons | This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright | Copyright © The Author(s), 2025. Published by Science Publishing Group |
Keywords | Automatic Speech Recognition, Natural Language Processing, Kiswahili |
APA Style
Obote, K., Kikwai, B., Senagi, K., Njiiri, J., Olukuru, J., et al. (2025). Bridging Swahili Communication Gaps: Real-Time Audio-to-Text Sentiment Analysis via Pre-trained NLP. American Journal of Artificial Intelligence, 9(2), 167-185. https://doi.org/10.11648/j.ajai.20250902.18
ACS Style
Obote, K.; Kikwai, B.; Senagi, K.; Njiiri, J.; Olukuru, J., et al. Bridging Swahili Communication Gaps: Real-Time Audio-to-Text Sentiment Analysis via Pre-trained NLP. Am. J. Artif. Intell. 2025, 9(2), 167-185. doi: 10.11648/j.ajai.20250902.18
@article{10.11648/j.ajai.20250902.18,
  author  = {Kevin Obote and Benjamin Kikwai and Kennedy Senagi and Joyce Njiiri and John Olukuru and Joseph Sevilla},
  title   = {Bridging Swahili Communication Gaps: Real-Time Audio-to-Text Sentiment Analysis via Pre-trained NLP},
  journal = {American Journal of Artificial Intelligence},
  volume  = {9},
  number  = {2},
  pages   = {167-185},
  year    = {2025},
  doi     = {10.11648/j.ajai.20250902.18},
  url     = {https://doi.org/10.11648/j.ajai.20250902.18},
  eprint  = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajai.20250902.18}
}
TY - JOUR
T1 - Bridging Swahili Communication Gaps: Real-Time Audio-to-Text Sentiment Analysis via Pre-trained NLP
AU - Kevin Obote
AU - Benjamin Kikwai
AU - Kennedy Senagi
AU - Joyce Njiiri
AU - John Olukuru
AU - Joseph Sevilla
Y1 - 2025/09/25
PY - 2025
N1 - https://doi.org/10.11648/j.ajai.20250902.18
DO - 10.11648/j.ajai.20250902.18
T2 - American Journal of Artificial Intelligence
JF - American Journal of Artificial Intelligence
JO - American Journal of Artificial Intelligence
SP - 167
EP - 185
VL - 9
IS - 2
PB - Science Publishing Group
SN - 2639-9733
UR - https://doi.org/10.11648/j.ajai.20250902.18
ER -