On Windows, PowerShell has the Get-FileHash cmdlet, which supports a variety of hashing algorithms. My understanding of looping in PowerShell isn’t quite as strong as it is with the bash shell on Linux, so here I just ran Get-FileHash multiple times, specifying a different hashing algorithm each time, against “file.txt” containing the string “ayy.”
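For readers who would rather loop than repeat the command, here is a minimal Python sketch of the same idea: hash one file with several algorithms in a single pass. The algorithm list and the helper name hash_file are my own choices for illustration, and the sample bytes (b"ayy" with no trailing newline or byte-order mark) are an assumption, so the digests will not necessarily match what a shell redirect produces.

```python
import hashlib
import os
import tempfile

# Illustrative selection; any name accepted by hashlib.new() would work.
ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def hash_file(path, algorithms=ALGORITHMS, chunk_size=65536):
    """Return {algorithm: hex digest} for the file, reading it only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # Read in chunks so large samples do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Demo: write the assumed sample content to a throwaway file and hash it.
with tempfile.TemporaryDirectory() as workdir:
    sample = os.path.join(workdir, "file.txt")
    with open(sample, "wb") as handle:
        handle.write(b"ayy")  # assumption: exactly these three bytes
    digests = hash_file(sample)

for name, digest in digests.items():
    print(f"{name:<8}{digest}")
```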
These are just some of the more well-known and well-established hashing algorithms. There are other hashing algorithms more specific to malware analysis, such as ssdeep, peHash, and imphash, which are designed for special use cases.
Originally a GCHQ project, CyberChef is a self-contained web page that users can download and run locally. It hosts a variety of tools that can be used to encode and decode files.
For example, if you receive a sample that appears to be encoded, and you know that it’s, say, Base64 encoded, URL encoded, gzipped, or otherwise obfuscated, CyberChef has “recipes” users can apply to their data to decode it. Do you know that a sample is obfuscated, but not exactly how? CyberChef can also try to auto-detect the obfuscation and decode it for you.
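To make the “recipe” idea concrete, here is a rough Python sketch of decoding steps chained in order. This is not how CyberChef is implemented; the step names and the apply_recipe helper are hypothetical and only mirror the concept, using standard library decoders.

```python
import base64
import gzip
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Hypothetical step names mapped to standard library decoders.
STEPS = {
    "url_decode": unquote_to_bytes,
    "from_base64": base64.b64decode,
    "gunzip": gzip.decompress,
}


def apply_recipe(data, recipe):
    """Apply each named decoding step to the data, in order."""
    for step in recipe:
        data = STEPS[step](data)
    return data


# Build a layered sample: gzip, then Base64, then URL encoding.
layered = quote_from_bytes(base64.b64encode(gzip.compress(b"ayy lmao")))

# Decoding applies the inverse steps in the reverse order.
decoded = apply_recipe(layered, ["url_decode", "from_base64", "gunzip"])
print(decoded)
```

The order of the recipe matters: each step has to undo the outermost remaining layer, which is why knowing (or guessing) how a sample was encoded is half the work.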
If you’d like to see a small demonstration of how CyberChef works, I demoed it in a previous blog post, Threat Hunting with Splunk: Part 3. I used CyberChef to decode Base64-encoded PowerShell commands, decompress gzipped content, and remove null bytes from the output. I then fed the shellcode to a debugger/emulator to see what it did. Pretty useful, eh?
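If you are wondering where the null bytes come from: PowerShell's -EncodedCommand argument is Base64 over UTF-16LE text, so plain ASCII commands decode to bytes with a null after every character. The sketch below illustrates that with a made-up command; the helper names are mine, and this is not the sample from the earlier post.

```python
import base64


def decode_encoded_command(encoded):
    """Decode a PowerShell -EncodedCommand value (Base64 over UTF-16LE)."""
    return base64.b64decode(encoded).decode("utf-16-le")


def strip_null_bytes(data):
    """Crude alternative: drop every null byte from the raw decoded bytes."""
    return data.replace(b"\x00", b"")


# Made-up example command, encoded the way PowerShell expects it.
command = "Write-Output 'ayy'"
encoded = base64.b64encode(command.encode("utf-16-le")).decode("ascii")

raw = base64.b64decode(encoded)
print(raw[:8])                          # every other byte is a null
print(strip_null_bytes(raw))            # readable once the nulls are removed
print(decode_encoded_command(encoded))  # the cleaner route: decode as UTF-16LE
```

Stripping nulls only works cleanly when the text is plain ASCII; decoding as UTF-16LE is the more faithful route when you know that is what you are looking at.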
In addition to CyberChef, consider scanning your sample with YARA. YARA is a multi-platform pattern matching engine designed to identify and classify malware based on patterns configured in rule files. If it helps, think of it as Snort or Suricata, except for individual files. If a malware sample matches a pattern in a rule file, it triggers an alert. That means it’s effective against malicious script files and dropped artifacts as well as malicious executables (Windows PE, Linux ELF, macOS Mach-O).
Some might draw the parallel that YARA is an endpoint protection tool. After all, it matches patterns based on rules, and those are basically AV signatures, right? But that’s really not the case, because YARA doesn’t offer a real-time protection or quarantine component. All it does is scan the files you point it at and pattern match.
While I’ve mainly focused on obfuscation in plaintext script files, what about binary files? What can be done to detect anomalies or obfuscation in binaries? For starters, there is a very simple utility available for most operating systems called strings. This command scans a file and shows you all of the printable ASCII strings in it. If the strings command doesn’t turn up a whole lot, there’s a very good chance that the executable is packed, XOR encoded, or otherwise obfuscated. There are many tools out there that can be used to try to determine what sort of obfuscation is in use and decode it: for example, FireEye’s FLOSS tool and, again, CyberChef.
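If you are curious what strings is doing under the hood, the core idea fits in a few lines of Python. This is a simplified sketch, not the real utility: it only looks for runs of printable ASCII (the real tool has more options), and the four-character minimum is an assumption that matches the common default.

```python
import re


def extract_strings(data, min_length=4):
    """Return runs of printable ASCII characters at least min_length long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Made-up bytes standing in for a fragment of a binary.
blob = b"\x00\x01MZ\x90\x00kernel32.dll\x00\xff\xfeabc\x00CreateProcessA\x00"

for text in extract_strings(blob):
    print(text)
```

Runs shorter than the minimum (“MZ” and “abc” here) are dropped, which is what keeps the output from drowning in noise. A packed or XOR-encoded file tends to produce very few hits at all.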
Determining Functionality (imported functions)
Figuring out what functionality malicious binaries try to import is also critical to determining what their capabilities are. When we say malware imports functionality, we mean that it makes use of operating system shared libraries (e.g., Windows DLLs, Linux .so files, and/or macOS dylibs) to perform different tasks: for example, writing to the registry, spawning a process, writing files, interacting with other processes, making network connections, etc. This goes hand-in-hand with dumping the strings for a binary, because many times the strings will contain the name of a function or the filename of a library it wants to import. This information provides clues and gives us something to look up in order to determine what the binary wants to do with that function or shared library.
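As a rough illustration of turning strings into clues, the Python sketch below checks identifiers found in a binary against a small watchlist of Windows API names. The watchlist, its capability labels, and the guess_capabilities helper are my own illustrative assumptions. It is plain string matching, not import table parsing, so treat a hit as a lead to look up rather than proof.

```python
import re

# Illustrative watchlist: Windows API base names -> the capability they hint at.
WATCHLIST = {
    "RegSetValueEx": "writes to the registry",
    "CreateProcess": "spawns a process",
    "WriteFile": "writes files",
    "WriteProcessMemory": "interacts with other processes",
    "WSAStartup": "makes network connections",
    "InternetOpen": "makes network connections",
}


def guess_capabilities(data):
    """Map watchlisted API names found in the raw bytes to capability hints."""
    hits = {}
    for match in re.finditer(rb"[A-Za-z_][A-Za-z0-9_]{3,}", data):
        name = match.group().decode("ascii")
        # Many Windows APIs come in ANSI/Unicode variants ending in A or W.
        base = name[:-1] if name[-1] in "AW" else name
        for candidate in (name, base):
            if candidate in WATCHLIST:
                hits[name] = WATCHLIST[candidate]
                break
    return hits


# Made-up bytes standing in for part of a binary.
blob = b"\x00KERNEL32.dll\x00CreateProcessA\x00ADVAPI32.dll\x00RegSetValueExW\x00junk"

for api, hint in guess_capabilities(blob).items():
    print(f"{api}: {hint}")
```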
Has the file been seen in the wild?
Finally, we come to the question “has this file been seen before?” The most common way malware researchers attempt to answer that is by asking VirusTotal and/or other threat intelligence and ingestion sources. In case you don’t know, VirusTotal is a website that allows anyone to submit a file, URL, or IP address and see whether that indicator has been seen “in the wild” before by a variety of different endpoint detection engines, IP address/URL blocklists, and/or web reputation services. When it comes to submitting files, VirusTotal allows users to upload any type of file, so long as it is under the file size limit. However, there are risks associated with submitting files to VirusTotal.
VirusTotal, Operational Security, Confidentiality, and You
First off, VirusTotal logs who submitted a file, their IP address, whether or not it was a web user or an API user, and their country.
Second, VirusTotal allows others to search for malicious files through its VirusTotal Enterprise service. Adversaries could configure searches and alerts that notify them if any of their malware has been uploaded.
Finally, VirusTotal Enterprise allows researchers to download the files they are searching for. If the file sample or malware contains any information that is particular to your organization, this means others could possibly determine that your organization was targeted and/or compromised by the malware sample uploaded to VirusTotal.
Fortunately, there is a way to mitigate some of these risks if all you want to know is whether a file has been seen in the wild before: instead of submitting the actual file to VirusTotal, submit its SHA-256 hash. Confused? Let me demonstrate:
On my Linux host, I’ll create the file “file.txt” with the text “ayy lmao” in it. That is the only content in the file: it’s a plain text file with two words in it. I will run the sha256sum command against the file and get the output: