Semantic Search for Data on a Given Topic in Social Networks: A Comparative Study of Keyword-based and BERT-based Methods
Yerassyl Ussen *
Astana IT University, Kazakhstan.
Zuleykha Anvarovna
Astana IT University, Kazakhstan.
*Author to whom correspondence should be addressed.
Abstract
Semantic search has emerged as a powerful alternative to traditional keyword-based retrieval, particularly in the context of unstructured social media data. This study presents a comparative analysis of a semantic search system based on Sentence-BERT (SBERT) and a conventional keyword-based pipeline implemented with Elasticsearch, using a large Reddit dataset as a case study. The primary contribution lies in integrating state-of-the-art semantic modeling with scalable search infrastructure and empirically evaluating its effectiveness on real-world social media content. The experimental workflow comprises six stages: dataset selection, preprocessing, embedding generation, indexing, query processing, and performance evaluation. Results show that the SBERT-based semantic search system consistently outperforms the keyword-based approach across all evaluated metrics, particularly in capturing user intent, handling informal language, and retrieving semantically relevant content despite lexical variations. Nonetheless, the semantic approach incurs higher computational costs and occasionally overgeneralizes.
Keywords: Sentence-BERT, semantic modeling, social media, conversational language