A Hybridized Feature Extraction Model for Offline Yorùbá Document Recognition
Asian Journal of Research in Computer Science, Volume 15, Issue 4, Pages 42-59
DOI: 10.9734/ajrcos/2023/v15i4329
Abstract
Document recognition converts handwritten and printed documents into digital equivalents, making them easier to access and store. This study combined feature extraction techniques for recognizing Yorùbá documents in an effort to preserve the cultural values and heritage of the Yorùbá people. Ten Yorùbá documents were acquired from Kwara State University’s Library, and ten indigenous literate writers produced handwritten versions of the documents. These were digitized using an HP Scanjet300 scanner and pre-processed. The pre-processed images served as input to three feature extractors: Local Binary Pattern (LBP), Speeded-Up Robust Features (SURF) and Histogram of Oriented Gradients (HOG). The combined feature vectors were passed to a Genetic Algorithm (GA) for feature selection, and the reduced feature vector was fed into a Support Vector Machine (SVM). Ten-fold cross-validation was used to train seven models: LBP-GA, SURF-GA, HOG-GA, LBP-SURF-GA, HOG-SURF-GA, LBP-HOG-GA and LBP-HOG-SURF-GA. LBP-HOG-SURF-GA for Yorùbá printed text gave 90.0% precision, 90.3% accuracy and a 15.5% false positive rate (FPR). LBP-HOG-SURF-GA for handwritten Yorùbá documents gave 80.9% precision, 82.6% accuracy and 20.4% FPR. LBP-HOG-SURF-GA for CEDAR gave 98.0% precision, 98.4% accuracy and 2.6% FPR. LBP-HOG-SURF-GA for MNIST gave 99% precision, 99.5% accuracy, 99.0% and 1.1% FPR. The results of the hybridized feature extraction (LBP-HOG-SURF) demonstrated that the proposed approach improves significantly on the various classification metrics.
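The precision, accuracy and FPR figures reported above can be read against their standard confusion-matrix definitions. The sketch below is illustrative only: the `ConfusionCounts` class, the function names and the counts are hypothetical and do not come from the study, and it assumes a binary (one-vs-rest) view of each class with non-zero denominators. How the authors aggregated these metrics across character classes is not stated in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical, not from the paper)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def precision(c: ConfusionCounts) -> float:
    # Share of predicted positives that are correct: TP / (TP + FP)
    return c.tp / (c.tp + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    # Share of all predictions that are correct: (TP + TN) / total
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def false_positive_rate(c: ConfusionCounts) -> float:
    # Share of actual negatives wrongly labelled positive: FP / (FP + TN)
    return c.fp / (c.fp + c.tn)


# Made-up counts, chosen only to keep the arithmetic easy to follow.
counts = ConfusionCounts(tp=90, fp=10, tn=85, fn=15)
print(f"precision = {precision(counts):.3f}")            # 90 / 100
print(f"accuracy  = {accuracy(counts):.3f}")             # 175 / 200
print(f"FPR       = {false_positive_rate(counts):.3f}")  # 10 / 95
```

Under these definitions a lower FPR is better, which is why the 1.1% and 2.6% values on MNIST and CEDAR indicate fewer false alarms than the 15.5% and 20.4% values on the Yorùbá data.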
Keywords
- Yorùbá
- feature extraction
- document recognition
- feature selection
- language processing
- machine learning