EDITE Doctoral Thesis Topics

Advanced Malware Analysis

Topic proposed by
Thesis supervisor:
Doctoral student: Emanuele COZZI
Research unit: UMR 7102, EURECOM Research Laboratory

Field: Information and communication sciences and technologies


Objective

Security companies collect millions of malware samples every day. This big-data aspect is new to malware analysis, but it is certainly here to stay. On top of traditional samples, the upcoming Internet of Things (IoT) revolution will inevitably increase both the amount and the diversity of the collected artifacts. However, despite its promises, big data collection has so far brought our field more challenges than advantages, mainly resulting in a burden for researchers and malware analysts. On the one hand, more samples mean less time to analyze each of them and larger infrastructures to store the files and execute them in dynamic analysis sandboxes. On the other hand, security companies are clearly struggling to sift through this increasing amount of data in the attempt to extract actionable intelligence to better protect their customers and improve their services. As a result, while there is a clear global trend towards collecting more and more data, most of this data just sits on some server, taking up terabytes of storage space without being used, exploited, or often even properly understood by the company that collects it.

On top of this poor understanding of big malware datasets, new advanced techniques are making the analysis of individual samples more complex and more time-consuming. For instance, ROP-only malware, disk-less samples, and advanced obfuscations are reducing our ability to automatically process and understand new malicious files.

The goal of this thesis is to harness the information stored in large malware datasets to improve sample analysis, provide intelligence information, detect correlations, or simply study the trends and evolution of the different techniques used by malware writers. In this challenging context, this dissertation will also explore new techniques to extend current static and dynamic analysis approaches to novel and sophisticated malware samples. This can involve heavily obfuscated and packed binaries or new forms of malicious code, and will rely on existing large-scale malware collection systems to provide the data required to conduct experiments.


Research Overview

The first objective of this thesis is to advance the state of the art in binary and malware analysis. For example, recent efforts have been made to better understand packed samples [7], to better analyze their behavior [3, 8], and to reverse new forms of advanced rootkits [5]. However, these works have only scratched the surface of the techniques we need to analyze complex malware samples, both in a fully automated fashion and as tools to support manual reverse engineering.

A second objective of this thesis is the investigation of new forms of malware, starting from malware running on other operating systems or platforms. Only recently have researchers started looking at more "exotic" forms of malware [4], and there is still a lot to explore in this area. As a starting point, we plan to develop an open source infrastructure to analyze Linux-based malware samples. Internet routers and IoT devices are rapidly becoming prime targets for malicious code, ranging from simple botnets to more sophisticated targeted attacks. Unfortunately, the security industry is still largely unprepared for this threat. Most of the tools and the knowledge about the behavior and the characteristics of malware derive from a decade of research on Windows binaries. Linux malware, however, has its own unique set of characteristics, including the widespread use of static linking, a broad set of CPU architectures, its own packing ecosystem, and completely different techniques to achieve persistence and process infection. This task includes the development of dedicated tools, as well as their application to tens of thousands of Linux malware samples, with the goal of extracting and measuring the prevalence of different techniques and the characteristics of this rapidly growing form of malware. As a result, this part of the project would produce not only a usable platform, but also a valuable knowledge base about the behavior and key indicators of Linux malware, which can be extremely useful for malware analysts, for improving the detection of these samples, and for guiding incident response on infected devices.

Finally, part of the research in this area will also focus on the problem of cyber-attribution [1, 2], proposing new techniques to identify reused components and detect malware samples likely developed by the same group. As pointed out by Graziano et al. [6], the current malware collection infrastructure is very efficient, but the sheer number of samples analyzed every day in dynamic analysis sandboxes makes it impossible to tell the interesting malware apart from the surrounding noise of less relevant samples.
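Two of the Linux-specific characteristics mentioned above, static linking and the broad set of CPU architectures, can be read directly from the ELF header of a sample. The following is a minimal sketch of such a triage step, not a description of the infrastructure proposed here. The function name `triage_elf`, the returned field names, and the small architecture table are illustrative assumptions, and treating "no PT_INTERP segment" as "statically linked" is only a heuristic (a static PIE or a deliberately malformed sample may not fit it).

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader

# Small, illustrative subset of e_machine values from the ELF specification.
MACHINES = {2: "SPARC", 3: "x86", 8: "MIPS", 20: "PowerPC",
            40: "ARM", 62: "x86-64", 183: "AArch64"}


def triage_elf(data: bytes) -> dict:
    """Extract word size, endianness, CPU architecture and a linking guess."""
    if len(data) < 52 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                     # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    if is64 and len(data) < 64:
        raise ValueError("truncated ELF64 header")
    endian = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little, 2 = big

    # e_type and e_machine directly follow the 16-byte e_ident array.
    _e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)

    # The program header table location depends on the ELF class.
    if is64:
        (e_phoff,) = struct.unpack_from(endian + "Q", data, 32)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 54)
    else:
        (e_phoff,) = struct.unpack_from(endian + "I", data, 28)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 42)

    # p_type is the first 32-bit field of every program header entry.
    has_interp = False
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        if off + 4 > len(data):
            break  # truncated or malformed table: stop instead of failing
        (p_type,) = struct.unpack_from(endian + "I", data, off)
        if p_type == PT_INTERP:
            has_interp = True

    return {
        "bits": 64 if is64 else 32,
        "endianness": "little" if endian == "<" else "big",
        "machine": MACHINES.get(e_machine, f"unknown ({e_machine})"),
        "statically_linked": not has_interp,  # heuristic, see note above
    }
```

Applied to the raw bytes of each file in a collection (for example `triage_elf(open(path, "rb").read())`, where `path` is a hypothetical sample path), such a function would give a first breakdown by architecture and linking style before any sample is sent to a sandbox.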

References

[1] S. Alrabaee, N. Saleem, S. Preda, L. Wang, and M. Debbabi. Oba2: An onion approach to binary code authorship attribution. Digital Investigation, 11:S94–S103, 2014.
[2] S. Alrabaee, P. Shirani, M. Debbabi, and L. Wang. On the feasibility of malware authorship attribution. In International Symposium on Foundations and Practice of Security, pages 256–272. Springer, 2016.
[3] G. Bonfante, J. Fernandez, J.-Y. Marion, B. Rouxel, F. Sabatier, and A. Thierry. CoDisasm: Medium Scale Concatic Disassembly of Self-Modifying Binaries with Overlapping Instructions. In 22nd ACM Conference on Computer and Communications Security, Denver, United States, Oct. 2015.
[4] M. F. Botacin, P. L. de Geus, and A. R. A. Grégio. The other guys: automated analysis of marginalized malware. Journal of Computer Virology and Hacking Techniques, pages 1–12, 2017.
[5] M. Graziano, D. Balzarotti, and A. Zidouemba. ROPMEMU: A Framework for the Analysis of Complex Code-Reuse Attacks. In Proceedings of the ACM Symposium on Information, Computer and Communications Security (ASIACCS), ASIACCS '16, June 2016.
[6] M. Graziano, D. Canali, L. Bilge, A. Lanzi, and D. Balzarotti. Needles in a Haystack: Mining Information from Public Dynamic Analysis Sandboxes for Malware Intelligence. In Proceedings of the 24th USENIX Security Symposium (USENIX Security), August 2015.
[7] X. Ugarte-Pedrero, D. Balzarotti, I. Santos, and P. G. Bringas. [SoK] Deep Packer Inspection: A Longitudinal Study of the Complexity of Run-Time Packers. In Proceedings of the IEEE Symposium on Security and Privacy. IEEE Computer Society, May 2015.
[8] X. Ugarte-Pedrero, D. Balzarotti, I. Santos, and P. G. Bringas. RAMBO: Run-time packer Analysis with Multiple Branch Observation. July 2016.