Recent Advances in RL for Self-adaptive Software Systems: A Systematic Review
Joshua Jonah Vincent *
Department of Informatics, Faculty of Engineering and Architecture, Afro-American University of Central Africa, La Paz, Oyala, Equatorial Guinea.
*Author to whom correspondence should be addressed.
Abstract
Self-adaptive software systems (SASS) are becoming increasingly important in dynamic environments such as cloud computing, the Internet of Things (IoT), and cyber-physical systems, where conventional rule-based adaptation mechanisms frequently fail to maintain optimal performance in the face of uncertainty and change. Reinforcement learning (RL) is a promising remedy that allows systems to learn and adapt on their own through trial-and-error interaction. This systematic review examines developments in RL for dynamic optimisation of SASS from 2021 to early 2025; after a thorough screening of 1,248 papers, 68 quantitative studies were selected for analysis. The review identifies value-based, policy gradient, multi-agent, and hybrid/meta-learning approaches as the main RL methodologies and examines how they are applied in fields such as cybersecurity, cloud resource management, and autonomous systems. The findings indicate that cloud systems reduced average cost by 34.7% using solutions based on proximal policy optimisation (PPO), cybersecurity systems improved attack detection speed by 22.1% and reduced false positive rates by 18.3%, and autonomous systems in IoT and swarm robotics reduced energy consumption by 40% and adaptation latency by 27.5%. Policy gradient methods (41%) dominate continuous control tasks, with PPO used in 27% of studies. Value-based approaches (32%) dominate discrete action domains, with deep Q-network (DQN) variants used in 78% of cloud resource allocation studies. Multi-agent RL accounts for 18% of studies, with multi-agent deep deterministic policy gradient (MADDPG, 62%) and QMIX (38%) being the most used. Serverless computing cut cold-start times by 35%, data centre optimisation lowered power usage effectiveness (PUE) by 15%, and RL-driven intrusion detection systems identified zero-day threats with 92% accuracy.
Reward design difficulties were reported in 63% of experiments, sample inefficiency meant that 1.2M episodes were required for convergence, and real-world multi-agent reinforcement learning (MARL) deployments performed 23% worse than models. The meta-analysis reported an effect of 95% for cost reduction, latency improvement, and adaptation speed. Practical adoption is limited because few studies use standardised benchmarks or address safety and interpretability. To close the gap between research and practical implementation, the article concludes by outlining open research questions and advocating formal verification, transfer learning, and hybrid learning.
Keywords: Reinforcement learning, self-adaptive software systems, dynamic optimisation, runtime adaptation, machine learning, deep reinforcement learning, multi-agent reinforcement learning, cloud computing, multi-agent systems