Malware analysis is one of my favorite subjects. It’s as broad as it is deep because there is no end to the amount of bad stuff available on the internet. Today, let’s start with the basics and work our way up from there.
As you read this blog, please bear in mind that this is a beginner’s introduction to malware categorization, analysis, and possible motivations for doing so. Because it’s such a vast subject, there will definitely be some aspects that I miss simply because there isn’t enough space to discuss them all. Entire books have been written on this subject. My goal here is to give a glimpse into this discipline and maybe pique your curiosity.
Malware analysis is broadly defined as the study or process of determining the functionality, origin, and potential impact of a given malware sample. In other words, you suspect that you may have found something malicious, but you need to know more about it, including:
- Is the sample actually malicious and/or an artifact of malicious activity?
- Who delivered this malware?
- What is the end-goal of this malware?
- How is the malware delivered?
- What are the characteristics of the malware itself?
- What actions can the malware perform?
- What are artifacts that the malware writes to the filesystem?
- What IP addresses and domains does it contact for command and control (C2)?
- Have any other hosts in the network encountered this malware (or its variants)?
- Have other organizations encountered this malware?
Most organizations and SOCs will want answers to these questions, and malware analysis can help provide them.
Warning: As you may well know, malware is dangerous. Ask any organization that has been subjected to wormable malware, ransomware attacks, or malicious Microsoft Office documents. Most of the time when analyzing malware, it’s likely you won’t know its capabilities until your analysis is complete, and even then it’s still possible to miss something. Never analyze malware on a flat or unsegmented production network. Your malware analysis systems should never be able to connect to anything on your production network. Never practice malware analysis without some measure of network segmentation as well as network security measures in place. Be ready to rip out the network cable or power cord at a moment’s notice if you suspect something is going horribly wrong. In my opinion, malware analysis is fascinating and can be fun, but it’s also very easy to make mistakes if you don’t follow proper care and safety measures.
Types of Malware Analysis
In general, there are three types of malware analysis: triage, dynamic analysis, and static analysis. Analysis is usually done in a virtual machine. For malware with anti-debugging and/or anti-sandboxing properties, though, a physical system is sometimes used instead. In most cases, the analysis system either has no network connectivity at all or is allowed limited connectivity so that it can acquire additional malware that the initial sample attempts to download.
Malware triage usually comes first and seeks to answer basic questions about a file sample:
- How was the sample obtained?
- What is the name of the sample?
- What is the file size and file extension?
- What are the file hashes for this file?
- Is the sample obfuscated? If so, how?
- If the malware is a binary (e.g., Linux ELF, shared object (.so), Windows PE executable (.exe), shared library (.dll), etc.), what functionality does it import? Is it cryptographically signed? Does it define a PDB path?
- Has this sample been seen before?
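Several of the questions above (name, size, extension, and hashes) can be answered with a few lines of scripting. Here is a minimal sketch in Python using only the standard library. The function name `triage_facts` and the shape of the returned dictionary are my own choices for illustration, not part of any particular tool.

```python
import hashlib
from pathlib import Path


def triage_facts(path):
    """Collect basic identifying facts about a file without executing it."""
    p = Path(path)
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    sha256 = hashlib.sha256()
    # Read in chunks so large samples do not have to fit in memory.
    with p.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            for h in (md5, sha1, sha256):
                h.update(chunk)
    return {
        "name": p.name,
        "extension": p.suffix.lower(),
        "size_bytes": p.stat().st_size,
        "md5": md5.hexdigest(),
        "sha1": sha1.hexdigest(),
        "sha256": sha256.hexdigest(),
    }
```

The resulting hashes are what you would typically search for in your own logs or in third-party resources to find out whether the sample has been seen before. Note that the file is only read, never run.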
In most cases, getting answers to the vast majority of these questions doesn’t require executing the malware itself, but it may require a variety of tools, scripts, or third-party resources. To learn more about how to answer these questions while triaging a malware sample, check out this post and screencast tutorial.
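As one example of such a script, a common first hint about obfuscation is the byte entropy of the file. The sketch below computes Shannon entropy in bits per byte. Values close to 8 are often associated with compressed or encrypted content, but treat that as a hint rather than proof, since plenty of benign files are compressed too. The function name `shannon_entropy` is my own.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Count how often each byte value occurs.
    counts = Counter(data)
    # H = -sum(p * log2(p)) over the observed byte values.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

You would call this on the raw bytes of the sample, for example `shannon_entropy(open(path, "rb").read())`, which again only reads the file and does not execute it.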
Dynamic analysis is best described as asking, “If we attempt to run this file, what are the results of doing so?” What does the malware attempt to do? What files does it read? What does it write to the disk? What files does it attempt to modify? What IP addresses, domains, and/or URLs does it reach out to for additional payloads or instructions? What does it check to see if it’s being run in an analysis environment (e.g., anti-debugging, anti-VM, etc.)?
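To make the "what does it write or modify?" question concrete, here is a deliberately simple sketch of a before-and-after comparison of a directory tree. Real analysis setups usually rely on dedicated monitoring tools rather than something this crude, so take it as an illustration of the idea only. The function names `snapshot` and `diff_snapshots` are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map every file under root to the SHA-256 of its contents."""
    state = {}
    for p in Path(root).rglob("*"):
        if p.is_file():
            digest = hashlib.sha256(p.read_bytes()).hexdigest()
            state[str(p.relative_to(root))] = digest
    return state


def diff_snapshots(before, after):
    """Report files created, deleted, or modified between two snapshots."""
    common = before.keys() & after.keys()
    return {
        "created": sorted(set(after) - set(before)),
        "deleted": sorted(set(before) - set(after)),
        "modified": sorted(k for k in common if before[k] != after[k]),
    }
```

The idea is to take one snapshot of a directory of interest inside the isolated analysis system before the sample runs, take another afterwards, and compare the two. It only sees the final state, so files that are created and then deleted during the run will not show up.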
There are two strategies to use when performing dynamic analysis: sandboxing and debugging.
Specially configured physical systems or virtual machines designed to run malware and observe its behavior while minimizing the risk and actual impact of the malware itself are often referred to as sandbox environments. Running a malicious payload to see what it does in a sandbox is sometimes referred to as “detonating the payload” or “sandboxing.”
Sometimes the sandbox environments are automated. That is, the malware is loaded into the physical system or VM and run automatically, all of the actions it takes are recorded and saved to a report, and then the system or virtual machine is reverted to its state before the malware was run so that a new malicious payload can be analyzed. Other times, researchers will manually configure a virtual machine or physical system, manually detonate the malicious payload, and then observe the results via manual analysis and digital forensics.
Depending on the type of malware, manually triggering the payload may be required in order to work around anti-debugging, anti-sandbox, and/or anti-virtual machine protections built into the malicious sample. A manual sandbox environment could be as simple as a physical system connected to a network segment with plenty of network monitoring and limited network connectivity. It could also be a virtual machine with a handful of malware analysis tools loaded and a snapshot configured to revert the VM to its original state after analyzing a payload. Popular automated sandbox platforms include Cuckoo, CAPE, LiSa, and detux, while enterprise-tier sandboxes include ANY.RUN, Joe Sandbox, and Hybrid Analysis, among others.