Application Research on Semantic Analysis Using Latent Dirichlet Allocation and Collapsed Gibbs Sampling for Topic Discovery
Yetunde Esther Ogunwale
Department of Computer Science, University of Ilesa, Ilesa, Osun, Nigeria.
Micheal Olalekan Ajinaja *
Department of Computer Science, Federal Polytechnic Ile-Oluji, Ondo, Nigeria.
*Author to whom correspondence should be addressed.
Abstract
Topic discovery is the process of identifying the main topics present in a collection of documents. It is a crucial step in text mining, digital humanities, and information retrieval because it allows meaningful information to be extracted from large volumes of unstructured text. The most widely used model for topic discovery is Latent Dirichlet Allocation (LDA). LDA assumes that the words in each document are generated by a small number of underlying topics and learns those topics from the text automatically. One of the main problems with LDA is that the extracted topics can be of poor quality when a document does not belong coherently to a single topic. Collapsed Gibbs sampling, by contrast, operates on a word-by-word basis: it modifies the topic assignment of a single word at a time, which allows it to be used on documents that cover a variety of topics. This paper presents an applied study of semantic analysis for topic discovery using Latent Dirichlet Allocation with collapsed Gibbs sampling.
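The word-by-word behaviour of collapsed Gibbs sampling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `collapsed_gibbs_lda`, the symmetric priors `alpha` and `beta`, and the input format (each document as a list of integer word ids) are illustrative assumptions. For each token, the sketch removes the token's current topic from the count tables, draws a new topic from the standard full conditional p(z_i = k | rest) ∝ (n_dk + alpha)(n_kw + beta) / (n_k + V*beta), and then adds the token back under its new topic.

```python
import random


def collapsed_gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
                        iterations=200, seed=0):
    """Hypothetical minimal collapsed Gibbs sampler for LDA (illustrative sketch).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns per-token topic assignments z, document-topic estimates theta,
    and topic-word estimates phi.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_dk = [[0] * K for _ in docs]       # tokens in document d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]   # times word w is assigned to topic k
    n_k = [0] * K                        # total tokens assigned to topic k
    z = []                               # current topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this single word's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional over topics; the document-length
                # denominator is constant in k and therefore omitted.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the new assignment and restore the counts.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # theta and phi were integrated out ("collapsed"); estimate them from counts.
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
           for k in range(K)]
    return z, theta, phi
```

With a toy corpus such as `docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4]]`, calling `collapsed_gibbs_lda(docs, num_topics=2, vocab_size=6)` returns the token assignments `z`, the per-document topic proportions `theta`, and the per-topic word distributions `phi`. Because the sampler never represents theta and phi explicitly while sampling, it only estimates them from the final counts.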
Keywords: Application, semantic similarity, topic modelling, LDA, collapsed Gibbs sampling