Adaptive Hybrid Data Preprocessing for Homogeneous Healthcare Data Integration and Ontology Construction
Kranthi Kumar R*
Department of CSE, Faculty of Computer Science and Engineering, JNT University Hyderabad, India.
B. Padmaja Rani
Department of CSE, Faculty of Computer Science and Engineering, JNTUH College of Engineering, JNT University Hyderabad, India.
*Author to whom correspondence should be addressed.
Abstract
Healthcare data, whether collected directly in a hospital environment (e.g., electronic health records and laboratory tests) or collated into an organized database for research and analytics, requires thorough preprocessing to ensure trustworthy analysis and reliable semantic modeling. Inconsistent and heterogeneous data remain major obstacles to building effective ontologies, which are essential for semantic data integration. This paper presents an adaptive hybrid data preprocessing technique (AHPD) tailored for homogeneous data environments, aiming to enhance ontology construction. By integrating and customizing existing data cleaning methods, the approach dynamically addresses dataset-specific inconsistencies. AHPD is a modular pipeline that combines statistical, rule-based, and semantic methods to clean, normalize, and harmonize the typically structured datasets obtained from the various units of a hospital. Its functions include handling missing data dynamically, detecting inconsistencies, correcting inaccuracies, handling inter-dataset dependencies, and aligning data to a normalized schema, yielding data of reliable quality for analysis and semantic applications. The cleaned data files are then transformed into OWL-based ontologies, which support inference and reasoning for intelligent querying. The AHPD-enhanced ontology was evaluated by executing SPARQL queries that represent relevant clinical events and dependencies, achieving high precision, recall, and F-measure. The results indicate that AHPD improves data quality, enabling practical ontology construction and semantically informed smart applications that support healthcare data integration and intelligent retrieval of health knowledge.
Keywords: Adaptive hybrid data preprocessing, homogeneous data integration, healthcare data cleaning, ontology construction, semantic data modeling, SPARQL query evaluation