3ConFA: A Robust Feature Aggregation Framework for High-dimensional Data Optimization
Clive Asuai*
Citizen Finance (CiFi) Network, Canada.
Akazue Maureen
Department of Computer Science, Delta State University, Abraka, Nigeria.
Abel Edje
Department of Computer Science, Delta State University, Abraka, Nigeria.
Mayor Andrew
Department of Statistics, Delta State Polytechnic, Otefe-Oghara, Nigeria.
Peace Oguoguo Ezzeh
Department of Computer Science, Federal College of Education (Technical), Asaba, Nigeria.
Houssem Hosni
University of La Rochelle, France.
Ibrahim Khan
Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, USA.
*Author to whom correspondence should be addressed.
Abstract
High-dimensional datasets pose significant challenges in machine learning, including overfitting, increased computational complexity, and reduced model interpretability. To address these challenges, this paper introduces the Three Conditions for Feature Aggregation (3ConFA) framework, an ensemble-based feature selection approach that seeks to reduce dimensionality without degrading predictive performance. The 3ConFA framework integrates three techniques: the Chi-square (χ²) test, Information Gain (IG), and Decision Tree-based Recursive Feature Elimination (DT-RFE). It applies stringent conditions so that only the most relevant features are selected. A feature is retained if and only if it satisfies all three conditions: (1) IG score ≥ mean IG threshold, (2) χ² score ≥ mean χ² threshold, and (3) DT-RFE importance score = 1.
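The three-condition retention rule can be illustrated with a minimal sketch. It assumes the per-feature IG, χ², and DT-RFE values have already been computed elsewhere, and it assumes that each "mean threshold" is the arithmetic mean of that score over all candidate features. The function name `select_features_3confa` and its parameters are hypothetical, chosen for illustration only. One plausible reading of condition (3), also an assumption, is a rank of 1 as reported by scikit-learn's `RFE.ranking_`, where selected features receive rank 1.

```python
from statistics import mean


def select_features_3confa(ig_scores, chi2_scores, rfe_scores):
    """Return indices of features that satisfy all three conditions.

    Hypothetical helper: the inputs are precomputed per-feature values,
    aligned by position. Thresholds are assumed to be the mean of each
    score over all features.
    """
    if not (len(ig_scores) == len(chi2_scores) == len(rfe_scores)):
        raise ValueError("all score lists must have the same length")
    if not ig_scores:
        return []

    ig_threshold = mean(ig_scores)      # condition (1) threshold
    chi2_threshold = mean(chi2_scores)  # condition (2) threshold

    selected = []
    for index, (ig, chi2, rfe) in enumerate(
        zip(ig_scores, chi2_scores, rfe_scores)
    ):
        # A feature is kept only if every condition holds (logical AND).
        if ig >= ig_threshold and chi2 >= chi2_threshold and rfe == 1:
            selected.append(index)
    return selected


# Illustrative values only, not taken from the paper's experiments.
example_ig = [0.40, 0.05, 0.30, 0.10]
example_chi2 = [12.0, 1.0, 2.0, 9.0]
example_rfe = [1, 1, 1, 2]
print(select_features_3confa(example_ig, example_chi2, example_rfe))
```

Because the conditions are combined with a logical AND, a feature that scores highly on one criterion but falls below the mean on another is discarded. This conjunction is what makes the aggregation stringent.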
Experimental evaluation on ten benchmark datasets demonstrates the effectiveness of the framework, achieving feature reductions of up to 98.75% (Dexter dataset) while improving classification accuracy (e.g., Madelon: 78% → 85%). Precision, recall, and F1-score improved consistently after feature selection, indicating that 3ConFA enhances model generalization without sacrificing critical information. The framework’s adaptability across diverse datasets highlights its potential for applications in fraud detection, healthcare, and image classification.
This study contributes a structured, condition-driven feature aggregation approach that outperforms traditional filter and wrapper methods. Future work may explore adaptive thresholding and integration with deep learning models.
Keywords: Feature selection, dimensionality reduction, ensemble learning, machine learning, 3ConFA framework, chi-square, information gain, DT-RFE