Rethinking Data Architectures in the Face of Information Diversity and Exponential Growth
Maksim Romanchuk *
Westchester Medical Centre, USA.
*Author to whom correspondence should be addressed.
Abstract
Subject: This article analyzes how the exponential growth of data volume (up to petabytes and exabytes) and data variety, the defining traits of Big Data, affects data management architectures and methodologies.
Aims: The objective is to identify the challenges in processing and integrating large volumes of heterogeneous data and to conduct a comparative analysis of modern approaches.
Methodology: The study applies systematization, generalization, and comparative analysis to architectures (NoSQL, Data Lake, Hadoop, Spark, Flink) and methodologies (Agile, DevOps, Data Governance, Data Mesh, Data Fabric).
Results: The article examines the interplay between data growth, architectures, and methodologies in Big Data management. The results indicate that traditional relational database management systems (DBMS) exhibit significant limitations in horizontal scalability and in processing unstructured data, whereas NoSQL solutions (document, columnar, and others) offer the schema flexibility and scalability that Big Data requires. Distributed processing systems such as Spark and Flink can deliver orders-of-magnitude higher performance on analytical and streaming tasks than traditional approaches. The study underscores the close interconnection between architecture selection (e.g., a Data Lake for flexibility) and methodology adaptation (e.g., DataOps for speed, Data Governance for quality control) in achieving effective data integration and management. The scope of application includes the design of data management systems and the selection of suitable technology combinations for analytics (e.g., ELT instead of ETL in Data Lakes). The systematic comparison of key technologies and frameworks addresses a gap in the literature, which often treats these elements separately, and real-world case studies add practical guidance for practitioners. The article also synthesizes selection criteria for effective Big Data systems. It concludes that successfully handling Big Data requires an integrated approach that combines horizontally scalable architectures, modern processing tools, and flexible yet governed methodologies.
Keywords: Big Data, NoSQL, data management methodologies, data governance, data mesh, data fabric