Application Research on Semantic Analysis Using Latent Dirichlet Allocation and Collapsed Gibbs Sampling for Topic Discovery

Yetunde Esther Ogunwale

Department of Computer Science, University of Ilesa, Ilesa, Osun, Nigeria.

Micheal Olalekan Ajinaja *

Department of Computer Science, Federal Polytechnic Ile-Oluji, Ondo, Nigeria.

*Author to whom correspondence should be addressed.


Abstract

Topic discovery is the process of identifying the main topics present in a collection of documents. It is a crucial step in text mining, digital humanities, and information retrieval, because it allows meaningful information to be extracted from large volumes of unstructured text data. The most widely used algorithm for topic discovery is Latent Dirichlet Allocation (LDA). LDA assumes that the words in each document are generated by a small number of underlying topics, and the algorithm learns these topics from the text data automatically. One of the main problems of LDA is that the extracted topics are of poor quality when a document does not coherently belong to a single topic. Gibbs sampling, by contrast, operates word by word: it updates the topic assignment of a single word at a time, which allows it to be used on documents that cover a variety of topics. This paper presents application research on semantic analysis using Latent Dirichlet Allocation and collapsed Gibbs sampling for topic discovery.

Keywords: Application, semantic similarity, topic modelling, LDA, collapsed Gibbs sampling
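
To illustrate the word-by-word update described in the abstract, the following is a minimal sketch of a collapsed Gibbs sampler for LDA. It is not the authors' implementation: the function name `collapsed_gibbs_lda`, the toy corpus, and the hyperparameter values `alpha` and `beta` are assumptions chosen only for illustration, and words are assumed to be already mapped to integer ids.

```python
import random

def collapsed_gibbs_lda(docs, num_topics, vocab_size,
                        alpha=0.1, beta=0.01, iters=200, seed=0):
    """Illustrative collapsed Gibbs sampler (hypothetical helper).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns per-word topic assignments and the two count tables.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # words in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # times word w is assigned to topic k
    topic_total = [0] * num_topics                               # total words assigned to topic k
    assignments = []

    # Random initial topic for every word occurrence.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current word's assignment from the counts.
                k_old = assignments[d][i]
                doc_topic[d][k_old] -= 1
                topic_word[k_old][w] -= 1
                topic_total[k_old] -= 1

                # Unnormalised conditional probability of each topic for this word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]

                # Resample the topic of this single word and restore the counts.
                k_new = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k_new
                doc_topic[d][k_new] += 1
                topic_word[k_new][w] += 1
                topic_total[k_new] += 1

    return assignments, doc_topic, topic_word

# Toy corpus (assumed): three short documents over a six-word vocabulary.
docs = [[0, 1, 0, 2], [3, 4, 3, 5], [0, 1, 4, 3]]
assignments, doc_topic, topic_word = collapsed_gibbs_lda(docs, num_topics=2, vocab_size=6)
```

Because each update touches only one word's assignment, a single document can end up with its words spread across several topics, which is the property the abstract highlights for mixed-topic documents.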


How to Cite

Ogunwale, Yetunde Esther, and Micheal Olalekan Ajinaja. 2023. “Application Research on Semantic Analysis Using Latent Dirichlet Allocation and Collapsed Gibbs Sampling for Topic Discovery.” Asian Journal of Research in Computer Science 16 (4): 445-52. https://doi.org/10.9734/ajrcos/2023/v16i4404.
