Machine Learning for Hate Text Speech Detection: A Comprehensive Review of Techniques, Dataset and Challenges

Usman Idris Ismail; Suleiman Salihu Jauro; Nuhu Abdulalim Muhammad; Saadatu Ali Jijji; Joshua C Shawulu; Abdullahi Adam Galadima

doi:10.9734/ajrcos/2026/v19i2832

Published: 2026-03-10

Page: 204-218

Issue: 2026 - Volume 19 [Issue 2]

Usman Idris Ismail *

Department of Computer Science, Federal University Kashere, Nigeria.

Suleiman Salihu Jauro

Department of Computer Science, Gombe State University, Nigeria.

Nuhu Abdulalim Muhammad

Department of Computer Science, Federal University Kashere, Nigeria.

Saadatu Ali Jijji

Department of Computer Science, Federal University Kashere, Nigeria.

Joshua C Shawulu

Department of Computer Science, Federal University Kashere, Nigeria.

Abdullahi Adam Galadima

Department of Cyber Security, Federal University of Technology, Owerri, Nigeria.

*Author to whom correspondence should be addressed.

Abstract

Conventional moderation practices, which rely on human reviewers to identify and remove harmful content, are often labor-intensive, subjective, and unable to cope with the massive volume of user-generated data produced daily. Text-based hate speech has become a pervasive challenge across digital platforms, prompting extensive research into automated detection methods capable of identifying harmful and abusive content at scale. This review provides a comprehensive synthesis of machine learning approaches for hate speech detection, examining the linguistic characteristics of hateful expressions, the evolution of datasets, and the progression of modelling techniques from traditional machine learning to deep learning and transformer-based architectures. The analysis highlights the complexity of hate speech as a sociolinguistic phenomenon, particularly in its implicit, coded, and context-dependent forms, which remain difficult for automated systems to detect reliably. Significant limitations in existing datasets, including annotation inconsistency, class imbalance, domain specificity, and limited multilingual coverage, further constrain model performance and generalization. Across the literature, challenges related to bias and inadequate evaluation practices persist. By synthesizing current trends and identifying gaps, this review outlines key research directions focused on contextual modelling, multilingual and cross-cultural resources, implicit hate detection, fairness-aware algorithms, and adaptive learning strategies. The findings underscore the need for interdisciplinary, ethically grounded approaches to develop robust and socially responsible hate speech detection systems capable of supporting safer online environments. Overall, hate speech detection remains an evolving field that requires ongoing refinement of datasets, models, and evaluation practices. By addressing current gaps and embracing innovative approaches, future systems can better support the creation of safer and more inclusive digital environments.

Keywords: Hate speech, machine learning, dataset annotation, implicit hate speech, abusive language

How to Cite

Ismail, Usman Idris, Suleiman Salihu Jauro, Nuhu Abdulalim Muhammad, Saadatu Ali Jijji, Joshua C Shawulu, and Abdullahi Adam Galadima. 2026. “Machine Learning for Hate Text Speech Detection: A Comprehensive Review of Techniques, Dataset and Challenges”. Asian Journal of Research in Computer Science 19 (2):204-18. https://doi.org/10.9734/ajrcos/2026/v19i2832.
