Automated Data Cleaning in Large Databases Using Machine Learning Methods
Hajar Maseeh Yasin *
IT Department, College of Informatics, Akre University for Applied Sciences, Iraq.
Aso Kareem Khorsheed
IT Department, College of Informatics, Akre University for Applied Sciences, Iraq.
*Author to whom correspondence should be addressed.
Abstract
As the volume and complexity of data grow, effective data cleaning is needed to ensure the accuracy and reliability of datasets used in machine learning and big data analytics. Traditional manual cleaning methods are often inefficient and error-prone, which compromises data quality. This paper explores automated techniques based on machine learning, in particular the integration of supervised and unsupervised learning algorithms, to make data preparation more efficient. The study shows that these methods can significantly improve data quality, reduce preparation time, and support better decision-making. Ultimately, it emphasizes the importance of robust data cleaning frameworks for harnessing big data's potential and improving model performance across applications.
Keywords: Data cleaning, machine learning, big data, data quality, automation, supervised learning, unsupervised learning, efficiency, decision-making, data integration