From Entropy to Compression: Surveying Information-Theoretic Signals for Early Malware Detection

Kabeya Tshiseba Cedric *

Pedagogic National University, DRC and Catholic University of Congo, DRC.

Dionga Ndibu Ornella

Pedagogic National University, DRC.

Lubongo Muembe Georgine

Pedagogic National University, DRC.

Gloire Alonda Madomba

Mbandaka University, DRC.

Simplice Eale Botuli

Mbandaka University, DRC.

Joel Mangoma Joel

Pedagogic National University, DRC.

Kevin Mongoy Bonyolo

Pedagogic National University, DRC.

*Author to whom correspondence should be addressed.


Abstract

Malware continues to evolve in ways that reduce the effectiveness of signature matching and evade analysis environments, creating demand for early detection methods that can flag suspicious binaries before their behavior is observed. This article surveys information-theoretic approaches to early malware detection, focusing on the progression from classical uncertainty measures to algorithmic notions of complexity. We first discuss Shannon entropy as a lightweight indicator of packing, encryption, and obfuscation, and explain how entropy computed over whole files, executable sections, or sliding windows can localize anomalous regions in Portable Executable (PE) binaries. We then examine algorithmic complexity through the lens of Kolmogorov complexity, which is uncomputable in general, and outline practical approximations using general-purpose compression. Compression-based measures estimate structural regularities in binaries that frequency statistics alone do not capture. Building on this idea, we present normalized compression distance (NCD) as a featureless similarity measure that enables clustering and nearest-neighbor-style detection without handcrafted features. The survey highlights how entropy, compressibility, and compression-based similarity can be combined into hybrid pipelines that support triage, prioritization, and explainable inspection, and it notes key limitations. High entropy is not unique to malware: it also arises in legitimate packed installers, multimedia resources, and encrypted payloads, so it produces false alarms when used in isolation. Compression-based methods can be computationally demanding and are sensitive to file size, compressor choice, and adversarial manipulation. By synthesizing these techniques and their practical considerations, this article provides guidance for designing robust early-warning detectors and for integrating information-theoretic signals with complementary static and learning-based components in operational settings.

Keywords: Malware, obfuscation techniques, algorithmic complexity, Shannon entropy, Kolmogorov complexity
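
As an illustration of the three signals named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It computes byte-level Shannon entropy, entropy over sliding windows, and the normalized compression distance in its standard form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed length. The function names, the choice of zlib as the compressor, and the window and step sizes are all assumptions made for this example.

```python
import math
import os
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def windowed_entropy(data: bytes, window: int = 256, step: int = 128):
    """Entropy of each sliding window, returned as (offset, entropy) pairs.

    The window and step sizes are illustrative defaults, not recommended values.
    """
    last = max(len(data) - window, 0)
    return [
        (off, shannon_entropy(data[off:off + window]))
        for off in range(0, last + 1, step)
    ]


def compressed_size(data: bytes) -> int:
    """Compressed length under zlib, used as a rough stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance; lower values suggest more shared structure."""
    cx, cy = compressed_size(x), compressed_size(y)
    return (compressed_size(x + y) - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Synthetic stand-ins for a binary: a zero-filled region followed by random bytes.
    sample = b"\x00" * 1024 + os.urandom(1024)
    for offset, h in windowed_entropy(sample):
        print(f"offset {offset:5d}: {h:.2f} bits/byte")
    print("NCD(sample, sample) =", round(ncd(sample, sample), 3))
```

On the synthetic sample, the windowed scan should show entropy rising sharply at the boundary between the zero-filled and random regions, which is the localization effect the abstract describes. Because zlib uses a 32 KiB matching window, NCD values computed with it become unreliable for large inputs, one concrete instance of the sensitivity to file size and compressor choice noted above.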


How to Cite

Cedric, Kabeya Tshiseba, Dionga Ndibu Ornella, Lubongo Muembe Georgine, Gloire Alonda Madomba, Simplice Eale Botuli, Joel Mangoma Joel, and Kevin Mongoy Bonyolo. 2026. “From Entropy to Compression: Surveying Information-Theoretic Signals for Early Malware Detection”. Asian Journal of Research in Computer Science 19 (3):24-36. https://doi.org/10.9734/ajrcos/2026/v19i3834.
