Malware Triage: Dissecting Threats to Your Security

Published on: October 28th, 2021

Malware analysis is an incredibly broad topic. Given the sheer number of operating systems, system architectures, scripting languages, and services out there, the ways to deliver malware and define its behavior are nearly limitless. This means any conversation about analyzing and mitigating malware must also, necessarily, be wide-ranging.

In most cases, however, we can break down malware analysis into three different disciplines:

  • triage
  • dynamic analysis
  • static analysis

In this blog post, we are going to focus on the purpose of malware triage, the questions you can answer during your analysis, as well as some of the tools and techniques you can use to learn more about the various types of malware. For more guidance, see my screencast below as I work through triaging a malware sample.

But first, a warning:

Always analyze malware on dedicated malware analysis systems and/or virtual machines. These systems should be segmented and separated away from the rest of your systems, with no way to connect to systems on your production networks. Any malicious artifacts that you are attempting to analyze should be treated with care at all times, even if you believe they aren’t particularly dangerous.

The purpose of triage

Malware triage attempts to answer basic questions about a file sample:

  • How was the sample obtained?
  • What is the name of the sample?
  • What is the file size and file extension?
  • What are the file hashes for this file?
  • Is the sample obfuscated? If so, how?
  • If the malware is a binary (e.g. a Linux ELF executable or shared object (.so), or a Windows PE executable (.exe) or shared library (.dll)), what functionality does it import? Is it cryptographically signed? Does it define a PDB path?
  • Has this sample been seen before?

Most of the time, getting these answers won’t require executing the malware, but you may need a variety of tools, scripts, or third-party resources. We’ll go ahead and walk through how to answer each of these questions.

How was the sample obtained?

Occasionally, you might get lucky and get information on how the sample was obtained. Maybe it was an executable dropped by a malicious Word document attached to a phishing email with the subject line xxx from email address yyy, or it was a payload from a remote execution campaign, or it was an implant dropped from a backdoor or webshell.

If you don’t get lucky, however, you might need to investigate to figure out where the malware came from and how it got delivered. This is where data sources such as proxy logs, DNS logs, firewall logs, process auditing (e.g. Sysmon or Windows Event ID 4688), EDR, etc. come into play.

File Characteristics (file metadata)

Other information, such as the file name, size, and so on, is easily gathered using native Windows and/or Linux/Unix tools. On the Windows side, it’s usually as simple as right-clicking on the file and checking out its properties. On Unix/Linux operating systems, you can use the file and/or stat commands to find out more about a file.

The file properties menu in Windows can be used to discover a wide variety of metadata about a file. In some cases, if the file is a binary, it will tell users whether or not it’s cryptographically signed and what authority was responsible for signing the file. This is extremely useful for malware identification.

Meanwhile, Linux/Unix users get access to the file and stat commands. The file command reads the first few bytes of a file, known as the “file magic,” and compares them against a database of known signatures (such as “/etc/magic”) to help identify the type of file, whereas the stat command pulls raw filesystem information about a file: its size, ownership, timestamps, etc.
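To make the file and stat idea concrete, here is a minimal Python sketch of the same two steps: peek at the first bytes for a known signature, then pull filesystem metadata. The MAGIC_SIGNATURES table, guess_type, and describe are names I made up for illustration, and the signature list is a tiny hand-picked subset, nowhere near what the real file command knows about.

```python
import os
import stat
from datetime import datetime, timezone

# Illustrative subset of well-known magic numbers (an assumption for this
# sketch); the real `file` command consults a far larger magic database.
MAGIC_SIGNATURES = {
    b"MZ": "Windows PE/DOS executable",
    b"\x7fELF": "Linux ELF binary",
    b"PK\x03\x04": "ZIP archive (also used by Office documents and JARs)",
    b"%PDF": "PDF document",
    b"\x1f\x8b": "gzip compressed data",
}


def guess_type(path):
    """Return a rough file type based on the first bytes of the file."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    for magic, description in MAGIC_SIGNATURES.items():
        if header.startswith(magic):
            return description
    return "unknown"


def describe(path):
    """Collect stat-style metadata; the file is read, never executed."""
    info = os.stat(path)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "mode": stat.filemode(info.st_mode),
        "modified_utc": modified.isoformat(),
        "type_guess": guess_type(path),
    }
```

Calling describe() on a sample returns a small dictionary you can paste into your triage notes. Note that the type guess looks only at content, so a renamed file extension won’t fool it.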

File Hashes

Obtaining the hash of a file, known as hashing, involves using one of a variety of cryptographic one-way functions, such as MD5, SHA1, SHA256, SHA384, SHA512, etc. The resulting output is a string of letters and numbers that can be used to “fingerprint” and identify that file. Operating system tools such as the Windows PowerShell cmdlet Get-FileHash, or the Linux suite of file hashing tools such as md5sum, shasum, sha1sum, sha224sum, sha256sum, sha384sum, sha512sum, etc. can be used to hash files.

In the image above, I created a file named file.txt with the text “seems legit.” I then used a variety of hashing tools on my Linux machine to generate an MD5, SHA1, SHA256, SHA384, and SHA512 hash of the file. All of these hashing algorithms represent different ways to fingerprint the same file. Each hashing algorithm is supposed to generate a unique hash for each object it hashes. Occasionally this is not the case: two different inputs produce the same hash, which is called a hash collision. Practical collision attacks have been demonstrated against MD5 and SHA1, so these days most professionals recommend using SHA256 or better for hashing files and data.
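If you’d rather script this than run five separate commands, the same fingerprints can be produced in one pass with Python’s standard hashlib module. This is a minimal sketch; hash_file and ALGORITHMS are names I picked for the example.

```python
import hashlib

ALGORITHMS = ("md5", "sha1", "sha256", "sha384", "sha512")


def hash_file(path, algorithms=ALGORITHMS, chunk_size=65536):
    """Return {algorithm: hex digest} for the file at `path`."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # Read in chunks so large samples do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Because only the file’s bytes are fed to the hash functions, two copies of the same content stored under different names produce identical digests.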

On Windows, PowerShell has the Get-FileHash cmdlet, which supports a variety of hashing algorithms. My understanding of looping in PowerShell isn’t quite as strong as it is with the bash shell on Linux, so here I just demonstrated running Get-FileHash multiple times, specifying a different hashing algorithm each time for “file.txt” containing the string “ayy.”

These are just some of the more well-known and well-established hashing algorithms. There are other hashing algorithms that are somewhat unique to malware analysis, such as ssdeep, peHash, and imphash, which are designed for special use cases (for example, ssdeep is a fuzzy hash for finding similar files, and imphash is computed from a PE file’s import table).

File Obfuscation

Detecting file obfuscation is simultaneously pretty easy and pretty difficult. If the sample you’re analyzing is a script of some sort (e.g. bash/shell script, JavaScript/JScript, Perl, PHP, PowerShell, VBScript, WMI, etc.), can the functionality of the script be easily ascertained, or does it look like gibberish? Is it a script that you would expect a user to run?

A lot of threat hunters refer to analysis of this nature as “passing the sniff test,” referring to the habit of testing food that is close to its expiration date by smelling it to see if it has spoiled. There is a ton of research out there on how to detect obfuscation on the Windows command line, in PowerShell, in JavaScript, and in a ton of other scripting languages. In addition to write-ups, there are a wide variety of tools that can be used to triage files and determine whether or not they are obfuscated. For example, CyberChef is a malware analyst’s best friend.

CyberChef

Originally a GCHQ project, CyberChef is a self-contained web page that users can download and run locally that hosts a variety of tools that can be used to encode and decode files. 

For example, if you receive a sample that appears to be encoded, and you know that it’s, say, Base64 encoded, URL encoded, gzipped, or otherwise obfuscated, CyberChef has “recipes” users can apply to their data to decode it. Do you know that a sample is obfuscated, but you don’t know exactly how? CyberChef can also try to auto-detect the obfuscation for you and decode it automatically.

If you’d like to see a small demonstration of how CyberChef works, I demoed using it in a previous blog post, Threat Hunting with Splunk: Part 3. I used CyberChef to decode encoded (Base64) PowerShell commands, decompress gzipped content, and remove null bytes from the output. I then fed the shellcode to a debugger/emulator to see what it did. Pretty useful, eh?
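The three CyberChef steps mentioned here (Base64 decode, gunzip, strip null bytes) are also easy to reproduce with Python’s standard library when you want to script them. This is only a rough sketch of those steps, not the recipe from the earlier post, and the function names are mine. It relies on the fact that PowerShell’s -EncodedCommand expects Base64 over UTF-16LE text, which is also where those null bytes tend to come from.

```python
import base64
import gzip


def decode_powershell_command(encoded):
    """Decode a PowerShell -EncodedCommand string (Base64 of UTF-16LE text)."""
    return base64.b64decode(encoded).decode("utf-16-le")


def decode_gzip_base64(encoded):
    """Base64-decode the input, then decompress the gzip stream inside it."""
    return gzip.decompress(base64.b64decode(encoded))


def strip_null_bytes(data):
    """Drop null bytes, e.g. from UTF-16 text viewed as single-byte text."""
    return data.replace(b"\x00", b"")
```

As with CyberChef, decoding is not executing: these functions only transform bytes, so they are safe to run against a sample’s contents on your analysis system.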

YARA

In addition to CyberChef, consider scanning your sample with YARA. YARA is a multi-platform pattern matching engine designed to identify and classify malware based on patterns configured in rule files. If it helps, think of it as Snort or Suricata, except for individual files. If a malware sample matches a pattern in a rule file, it triggers an alert. That means it’s effective for malicious script files and dropped artifacts, as well as malicious executables (Windows PE, Linux ELF, macOS Mach-O, etc.).

Some might draw the parallel that YARA is an endpoint protection tool. After all, it matches patterns based on rules and those are basically AV signatures, right? But that’s really not the case because YARA doesn’t offer a real-time protection or quarantine component. All it does is manually scan and pattern match.

Strings

While I’ve mainly focused on obfuscation in plaintext script files, what about binary files? What can be done to detect anomalies or obfuscation in binaries? For starters, there is a very simple utility available for most operating systems called strings. This command scans a file and prints the runs of printable (by default, ASCII) characters it finds. If the strings command doesn’t turn up a whole lot, there’s a very good chance that the executable is packed, XOR encoded, or otherwise obfuscated. There are many tools out there that can be used to try to determine what sort of obfuscation is in use and decode it. For example, FireEye’s FLOSS tool and, again, CyberChef.
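Under the hood, the basic idea behind strings is simple enough to sketch in a few lines of Python: find runs of printable ASCII bytes that are at least a minimum length (GNU strings defaults to four characters). This is an approximation for illustration, not a reimplementation of the real utility, and extract_strings is just a name I picked.

```python
import re


def extract_strings(data, min_length=4):
    """Return runs of printable ASCII at least `min_length` bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]
```

For example, feeding it open("sample.bin", "rb").read() returns a list you can eyeball or grep. A large binary that yields only a handful of short strings is a hint, though not proof, that something is packed or encoded.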

Determining Functionality (imported functions)

Figuring out what functionality malicious binaries try to import is critical to determining what their capabilities are. When we talk about malware importing functionality, we mean malware making use of operating system shared libraries (e.g., Windows DLLs, Linux SO files, and/or macOS dylibs) in order to perform different tasks: for example, writing to the registry, spawning a process, writing files, interacting with other processes, making network connections, etc. This goes hand-in-hand with dumping the strings for a binary, because many times the file’s strings will contain the name of a function or the filename of a library it wants to import. This information provides clues and gives us something to look up in order to determine what the binary wants to do with that function or shared library.
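As a rough illustration of mining strings for import clues, the sketch below scans a binary’s printable strings for DLL filenames and for a few API names worth looking up. The function names in SUSPICIOUS_APIS are real Windows APIs, but the selection and the capability labels are my own assumptions for the example. This is string matching only; it does not parse the PE import table the way a dedicated tool would.

```python
import re

# Hypothetical watchlist for illustration: real Windows API names, but the
# selection and the capability hints are assumptions made for this sketch.
SUSPICIOUS_APIS = {
    "CreateRemoteThread": "interacting with other processes",
    "WriteProcessMemory": "interacting with other processes",
    "RegSetValueExA": "writing to the registry",
    "CreateProcessA": "spawning a process",
    "InternetOpenA": "making network connections",
}


def find_import_clues(data):
    """Scan raw bytes for DLL filenames and watchlisted API names."""
    found = re.findall(rb"[\x20-\x7e]{4,}", data)
    strings = [item.decode("ascii") for item in found]
    libraries = sorted(
        {s.lower() for s in strings
         if re.fullmatch(r"[\w.-]+\.dll", s, re.IGNORECASE)}
    )
    # Substring match, so a name still counts if stray printable bytes
    # happen to sit next to it in the file.
    apis = {
        name: hint for name, hint in SUSPICIOUS_APIS.items()
        if any(name in s for s in strings)
    }
    return {"libraries": libraries, "apis": apis}
```

A hit here is only a lead to look up. Plenty of legitimate software imports these same functions, and packed malware may show none of them.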

Has the file been seen in the wild?

Finally, we come to the question “has this file been seen before?” The most common way malware researchers attempt to answer that is by asking VirusTotal and/or other threat intelligence and ingestion sources. In case you don’t know, VirusTotal is a website that allows anyone to submit a file, URL, or IP address and see whether that indicator has been seen “in the wild” before by a variety of endpoint detection engines, IP address/URL blocklists, and/or web reputation services. When it comes to submitting files, VirusTotal allows users to upload any type of file, so long as it is under the file size limit. However, there are risks associated with submitting files to VirusTotal.

VirusTotal, Operational Security, Confidentiality, and You

First off, VirusTotal logs who submitted a file, their IP address, whether or not it was a web user or an API user, and their country. 

Second, VirusTotal allows others to search for malicious files through their service VirusTotal Enterprise. Adversaries could configure searches and trigger alerts to notify them if they see any of their malware has been uploaded. 

Finally, VirusTotal Enterprise allows researchers to download the files they are searching for. If the sample contains any information that is particular to your organization, others could possibly determine that your organization was targeted and/or compromised by the malware sample uploaded to VirusTotal.

Fortunately, there is a way to mitigate some of these risks if all you want to know is whether a file has been seen in the wild before: rather than submitting the actual file to VirusTotal, search for its SHA256 file hash. Confused? Let me demonstrate:

On my Linux host, I’ll create the file “file.txt” containing only the text “ayy lmao”: a plain text file with two words in it. I will run the sha256sum command against the file and get the output:

b8c7888857d972b54b1fe789bbda3559ae781ba53650e6dcfb11e5f4b6528a74

Next, I will visit VirusTotal in my web browser and copy and paste the sha256 hash above into the search bar.

Zero results.

This means that nobody has been silly enough to upload an identical file with this hash to VirusTotal. Bear in mind that the name of the file does not affect the file’s hash. Let’s try another example. 

While it definitely has legitimate uses, the Sysinternals tool PsExec is commonly used by attackers to pivot from one Windows system to another. Tools of this nature, which have legitimate uses but are also leveraged by attackers on Windows, are sometimes referred to as LOLBAS: living off the land binaries and scripts. I downloaded the latest version of the Sysinternals PsTools package and ran Get-FileHash -Algorithm SHA256 to get the SHA256 hash of PsExec.exe:

57492D33B7C0755BB411B22D2DFDFDF088CBBFCD010E30DD8D425D5FE66ADFF4

This time, we get a PLETHORA of information about the file, including file hashes, identified file types, and file size:

A variety of timestamps identify when the file was created, the first time it was submitted, and the last time it was submitted along with the various filenames it was submitted as:

And scores of other data. This file is a permanent collection in the VirusTotal vaults, and if I log in with my VirusTotal Enterprise account, I can find it and download it:

For another example, let’s upload “file.txt”, containing the text “seems legit.” We can confirm the file hash before we upload the file:

Then we can upload the file to VirusTotal:

And now, it’s there forever:

The lesson of this story is that if you can’t afford for a piece of malware you are analyzing to become public knowledge, see if somebody else submitted it first by searching for one of its file hashes before uploading the file itself. Once it’s on VirusTotal, it’s on the internet. And once it’s on the internet, there’s no removing it.
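For the scripting-inclined, the hash-first workflow can be sketched with Python’s standard library against the VirusTotal v3 API, which looks up a file report by hash at /api/v3/files/ and authenticates with an x-apikey header. The function names here are mine, and you’d need to supply your own API key. Treat this as a minimal sketch under those assumptions rather than a complete client.

```python
import hashlib
import json
import urllib.error
import urllib.request

VT_FILE_ENDPOINT = "https://www.virustotal.com/api/v3/files/"


def sha256_of(path):
    """Hash the sample locally; the file itself never leaves the host."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_lookup_request(file_hash, api_key):
    """Build (but do not send) a GET request that looks up a hash."""
    return urllib.request.Request(
        VT_FILE_ENDPOINT + file_hash,
        headers={"x-apikey": api_key},
    )


def lookup(file_hash, api_key):
    """Return the parsed report, or None if the hash is unknown (HTTP 404)."""
    request = build_lookup_request(file_hash, api_key)
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return None
        raise
```

A None result from lookup(sha256_of(path), api_key) is the scripted equivalent of “zero results” in the search bar. Only the hash is sent, so the sample’s contents stay on your analysis system.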

Bringing triage all together: PEStudio

One of my favorite tools for analyzing malware samples is PEStudio by Marc Ochsenmeier. PEStudio is geared more towards Windows binary files, but it can be used to analyze just about any file and report anything unusual or anomalous it observes. It will hash the file, check its imports, record its metadata (file name, file size, debug information, PDB strings, whether or not the file is signed and by what authority, etc.), and search VirusTotal for the file hash to see if it has been observed in the wild:

Conclusion

Malware analysis is a subject matter as wide as it is deep. My coverage of malware triage is by no means a complete picture of the process–malware can take many forms. And because malware can take so many unique forms, there are a great number of tools and different approaches that may be necessary to successfully triage different samples. 

I hope this blog post provides a brief glimpse into this subject and piques your curiosity. Be curious, but always be careful when analyzing malware–and ensure your analysis systems and virtual machines are properly segmented to avoid infecting systems outside of your analysis network.

About Hurricane Labs

Hurricane Labs is a dynamic Managed Services Provider that unlocks the potential of Splunk and security for diverse enterprises across the United States. With a dedicated, Splunk-focused team and an emphasis on humanity and collaboration, we provide the skills, resources, and results to help make our customers’ lives easier.

For more information, visit www.hurricanelabs.com and follow us on Twitter @hurricanelabs.