Malware Analysis Part 1: How does it work?
Malware analysis is one of my favorite subjects. It’s as broad as it is deep because there is no end to the amount of bad stuff available on the internet. Today, let’s start with the basics and work our way up from there.
As you read this blog, please bear in mind that this is a beginner’s introduction to malware categorization, analysis, and possible motivations for doing so. Because it’s such a vast subject, there will definitely be some aspects that I miss due to not having enough space to discuss it all. Entire books have been written on this subject. My goal here is to give a glimpse into this discipline–and maybe pique your curiosity.
Malware analysis is broadly defined as the study or process of determining the functionality, origin, and potential impact of a given malware sample. In other words, you suspect that you may have found something malicious, but you need to know more about it, including:
- Is the sample actually malicious and/or an artifact of malicious activity?
- Who delivered this malware?
- What is the end-goal of this malware?
- How is the malware delivered?
- What are the characteristics of the malware itself?
- What actions can the malware perform?
- What are artifacts that the malware writes to the filesystem?
- What IP addresses and domains does it contact for command and control (C2)?
- Have any other hosts in the network encountered this malware (or its variants)?
- Have other organizations encountered this malware?
These are all questions that most organizations and SOCs will want answered, and malware analysis can help provide those answers.
Warning: As you may well know, malware is dangerous. Ask any organization that has been hit by wormable malware, ransomware, or malicious Microsoft Office documents. When analyzing malware, you likely won't know its full capabilities until your analysis is complete, and even then it's still possible to miss something. Never analyze malware on a flat or unsegmented production network, and your malware analysis systems should never be able to connect to anything on your production network. Never practice malware analysis without network segmentation and other network security controls in place. Be ready to rip out the network cable or power cord at a moment's notice if you suspect something is going horribly wrong. In my opinion, malware analysis is fascinating and can be fun, but it's also very easy to make mistakes if you don't follow proper care and safety measures.
Types of Malware Analysis
In general, there are three types of malware analysis: triage, dynamic analysis, and static analysis. Malware analysis is usually performed in a virtual machine, though for malware with anti-debugging and/or anti-sandboxing properties, a physical system is sometimes used instead. In most cases, the analysis system is either given no network connectivity at all, or is allowed limited connectivity so that the initial sample can retrieve any additional malware it attempts to download.
Malware triage usually comes first and seeks to answer basic questions about a file sample:
- How was the sample obtained?
- What is the name of the sample?
- What is the file size and file extension?
- What are the file hashes for this file?
- Is the sample obfuscated? If so, how?
- If the malware is a binary (e.g., a Linux ELF executable, shared object (.so), Windows PE executable (.exe), or dynamic-link library (.dll)), what functionality does it import? Is it cryptographically signed? Does it define a PDB path?
- Has this sample been seen before?
In most cases, answering the vast majority of these questions doesn't require executing the malware itself, but it may involve a variety of tools, scripts, or third-party resources. To learn more about how to answer these questions while triaging a malware sample, check out this post and screencast tutorial.
Dynamic Analysis is best described as “if we attempt to run this file, what are the results of doing so?” What does the malware attempt to do? What files does it read? What does it write to the disk? What files does it attempt to modify? What IP addresses, domains, and/or URLs does it reach out to for additional payloads or instruction? What things does it check to see if it’s being run in an analysis environment (e.g., anti-debugging, anti-VM, etc.)?
There are two strategies to use when performing dynamic analysis: sandboxing and debugging.
Specially configured physical systems or virtual machines designed to run malware and observe its behavior while minimizing the risk and actual impact of the malware itself are often referred to as sandbox environments. Running a malicious payload to see what it does in a sandbox is sometimes referred to as “detonating the payload” or “sandboxing.”
Sometimes the sandbox environments are automated–that is, the malware is loaded into the physical system or VM, automatically run, all of the actions it takes are recorded and saved to a report, and then the system or virtual machine is reverted back to a state before the malware was run in order to run a new malicious payload. Other times, researchers will manually configure a virtual machine or physical system, manually detonate the malicious payload, and then observe the results via manual analysis and digital forensics.
Depending on the type of malware, manually triggering the payload may be required in order to work around anti-debugging, anti-sandbox, and/or anti-virtual machine protections built into the malicious sample. A manual sandbox environment could be as simple as a physical system connected to a network segment with a load of network monitoring, and limited network connectivity. It could be a virtual machine with a handful of malware analysis tools loaded and a snapshot configured to revert the VM back to its original state after analyzing a payload. Popular automated sandbox platforms include Cuckoo, CAPE, LiSa, detux, etc., while enterprise-tier sandboxes include ANY.RUN, Joe Sandbox, and Hybrid Analysis among others.
Here’s part of the report from an ANY.RUN sandbox run for what looks like a macro-enabled Microsoft Word document. This part of the report shows the processes triggered by running the malicious sample.
In addition to sandboxing malicious files, another form of dynamic analysis involves the use of debuggers. A debugger is an application that is typically used to troubleshoot programs and/or determine the root cause of stability issues or other bugs–hence their name, debuggers.
There are a wide variety of system debuggers for a wide variety of operating systems. Using debugging tools for malware analysis often requires advanced systems-level knowledge. This is because you need to be able to:
- observe the correct portions of program memory,
- understand the results of running the debugger,
- know when to modify execution and change CPU instructions associated with an action the malware would or would not take,
- know when to set breakpoints to stop execution, and
- know when to step through execution one CPU instruction at a time.
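Native debuggers like GDB or OllyDbg single-step at the CPU-instruction level, which is hard to show compactly in a blog post. As a rough analogy, Python's trace hook lets you "single-step" a function line by line and record exactly which branches executed, the same way an analyst steps a sample to see which code paths it actually takes. The `trace_lines` helper below is an invented illustration, not part of any debugger's API:

```python
import sys

def trace_lines(func, *args):
    """Run `func` under a trace hook, recording the offset of each line
    executed -- a rough analogy to single-stepping in a debugger."""
    executed = []

    def tracer(frame, event, arg):
        # Only record line events inside the function we're stepping.
        if event == "line" and frame.f_code is func.__code__:
            executed.append(frame.f_lineno - func.__code__.co_firstlineno)
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # Always detach the hook, even on a crash.
    return result, executed
```

Comparing the recorded lines against the function's source immediately shows which branch of a conditional fired, the same question an analyst answers at a breakpoint.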
For an example of this in action, take a look at this reverse engineering tutorial that utilizes OllyDbg.
As mentioned above, there are a variety of debuggers for different platforms: for example, WinDbg and OllyDbg for Windows, LLDB for macOS and Linux, and GDB for Linux, among others.
This is a screen capture from an Infosec Institute tutorial on completing a reverse engineering exercise using OllyDbg on Windows. System debuggers are extremely complex and require a great deal of systems understanding and practice in order to know where and when to stop and/or modify program execution to learn more about a given malware sample.
Bear in mind that the debuggers I’ve referred to above are for operating systems and binary executables, and that different programming languages also feature debuggers and development environments that may be useful for malware analysis.
Static Analysis is essentially any analysis of a malicious file that can be performed without executing it. Technically, many triage tools and analysis techniques fall within this category. More often, however, static analysis refers to analyzing the assembly code or CPU instructions of a compiled program and attempting to interpret what actions the program will take.
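One of the most common first steps in static analysis is pulling printable strings out of a binary, since URLs, IP addresses, registry keys, and mutex names often surface this way. Here's a minimal sketch of the classic `strings` approach (the function name and threshold are my own choices):

```python
import re

def extract_strings(data, min_len=5):
    """Pull runs of printable ASCII out of raw bytes, like the classic
    `strings` utility -- a quick first-pass static analysis step."""
    # Match min_len or more consecutive printable ASCII bytes (0x20-0x7e).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

Heavily packed or encrypted samples will yield few useful strings until they're unpacked, which is itself a useful triage signal (and ties back to the entropy check from the triage phase).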
Sometimes, static analysis suites will also feature decompilers or partner programs that will try to reverse the compiled executable back to a sort of pseudo code–what the decompiler thinks the source code may have looked like when it was written. This sometimes provides reverse engineers with further insights into functionality in the code that only activates under certain circumstances, or it might help reveal instances of dead code–code in a malicious program that doesn’t actually do anything but is placed in the program in order to waste time and/or frustrate the analyst as they attempt to uncover more details about the sample they are analyzing.
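To make the dead code idea concrete, here's a toy example (invented for illustration, shown as Python-style pseudocode rather than real decompiler output) of the kind of logic a decompiler might surface: an opaque predicate that is always false guards a branch that can never run. Recognizing always-false conditions like this lets an analyst prune whole blocks of decoy code quickly:

```python
def checkin(host):
    """Toy sample logic containing a dead branch behind an opaque predicate."""
    # (0x5A ^ 0x5A) is always 0, so the branch below can never execute.
    # Malware authors scatter conditions like this to waste analyst time;
    # spotting the always-false predicate lets you skip the whole block.
    if (0x5A ^ 0x5A) > 0:
        return "decoy: " + host[::-1]  # dead code -- never reached
    return "beacon to " + host
```

In real samples the predicate is usually far less obvious, which is exactly why dead code succeeds at slowing analysis down.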
No Fine Lines Between Disassemblers and Debuggers
A lot of static analysis suites really blur the lines between static and dynamic analysis, with some of them featuring debuggers or integrations with debuggers to run the program while it’s being analyzed. Oftentimes, these suites of tools are referred to as software reverse engineering suites. Examples include radare2/Cutter, Ghidra, Hopper, Binary Ninja, and IDA (available as freeware, or as the wildly expensive Pro edition).
While many security analysts and researchers have divided opinions on the NSA, most malware analysts agree that Ghidra sports a huge number of powerful features, including a decompiler for producing pseudocode from a binary being analyzed. Similar features in commercial software reverse engineering suites can be very pricey. Picture source: BleepingComputer
There are a number of reasons why you would want to analyze malware, including:
- To see what actions it performs on a host
- To determine what files it may attempt to drop or modify on a host
- To observe what it attempts to communicate with in order to download additional files or accept orders (command and control)
This blog post covered a lot of the hows of malware analysis, but not very many of the whys. Stay tuned for Part 2, where we’ll go over some of the ways malware analysis is used to help inform security posture and make enterprises more secure.
About Hurricane Labs
Hurricane Labs is a dynamic Managed Services Provider that unlocks the potential of Splunk and security for diverse enterprises across the United States. With a dedicated, Splunk-focused team and an emphasis on humanity and collaboration, we provide the skills, resources, and results to help make our customers’ lives easier.
For more information, visit www.hurricanelabs.com and follow us on Twitter @hurricanelabs.