Threat Hunting with Splunk: Part 3, Getting Your Hands Dirty and Conclusion
In Part 1 and Part 2 of this series, we discussed Windows process creation logs and their primary sources. I provided some documentation on fields that contain excellent data to analyze, and showed how to get the logs into Splunk for further analysis. We also covered some introductory queries you can use to interrogate this massive body of data, methods for filtering the data without creating blind spots while you are hunting, and a couple of hypotheses and questions you can use to guide your hunting further.
In this final part of the series, I’m going to go through some hands-on examples with you. We’re going to use basic process creation log queries, and investigate some of the results together. Afterwards, I will provide you with links to various resources you can use to improve your threat hunting prowess. Let’s begin!
Hands-On Example: CPTC dataset
All of this information sounds really awesome, but I’m willing to bet you’d like a hands-on example that shows you just how interesting process creation logs can be. For this example, I’m going to utilize the 2019 National Collegiate Penetration Testing Competition dataset. If you’re interested in this data for following along, I’ve provided a link in the “Additional Reading and Resources” section below. Since we’ve been doing such a great job picking on PowerShell so far, that’s exactly what we’re going to continue doing here.
Be aware that this dataset is from a security competition, so there are going to be a lot of weird instances of PowerShell being used, and a lot of anomalous-looking activity observed at high frequency. On a typical network, seeing a lot of weird, unexplained, obfuscated PowerShell usually means you have a significant problem: either there are business processes completely unknown to you that you need to learn about in order to filter them out of the results, or you have a deeply entrenched adversary and a long fight ahead of you before you root them out of your network. Keep in mind that the results from these queries aren’t typical of what you’d find in most enterprise networks.
Now, what’s really awesome is that the CPTC dataset has both EID4688 logs, and Sysmon EID1 logs. So I can continue to offer you identical queries side-by-side. Let’s start with a basic query for both WinEventLog and Sysmon:
I set the timeframe for this query to “all time” to just show me everything PowerShell that has been indexed in the dataset. I waited a little while for some results to come back and sorted the results from most to least frequent. Let’s take a look at some of what came back:
These are some of the search results from our query.
The first two entries are for splunk-powershell.exe out of C:\Program Files\SplunkUniversalForwarder\bin. It’s probably safe to assume that these results are legitimate. If we wanted to filter out these results next time, we could change New_Process_Name/Image=“*powershell.exe” to “*\\powershell.exe”. With the path separator in front, only a file named exactly powershell.exe matches, so splunk-powershell.exe drops out. “Veeam.ps1” is likely an installer script for the Veeam software platform, while Install.ps1 and team0.ps1 are probably scripts utilized for this competition.
The next couple of entries are numbered PowerShell scripts that seem to be used to install various odds and ends for the competition. They’re probably legitimate. If we had access to the workstations and the network, we could probably pull a copy of these PowerShell scripts and confirm whether or not they are legitimate and/or expected behavior.
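The wildcard tweak mentioned above for filtering out splunk-powershell.exe can be sanity-checked outside of Splunk. Here is a minimal Python sketch that approximates the two wildcard patterns with the standard library’s fnmatch module. The second path is a hypothetical example rather than a row from the dataset, and fnmatch only stands in for Splunk’s own wildcard matching here.

```python
from fnmatch import fnmatchcase

# Example paths: the first comes from the results discussed above,
# the second is a hypothetical "normal" PowerShell location.
paths = [
    r"C:\Program Files\SplunkUniversalForwarder\bin\splunk-powershell.exe",
    r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
]

def matches(pattern: str, path: str) -> bool:
    """Case-insensitive wildcard match, used here as a stand-in for a '*' wildcard."""
    return fnmatchcase(path.lower(), pattern.lower())

loose = r"*powershell.exe"    # matches anything ending in powershell.exe
strict = r"*\powershell.exe"  # requires a path separator right before the name

for path in paths:
    print(f"loose={matches(loose, path)} strict={matches(strict, path)} {path}")
```

The loose pattern matches both paths, while the strict pattern only matches the file that is named exactly powershell.exe.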
I scrolled down a little bit and I noticed the first bit of overly unusual activity:
Just because you see encoded PowerShell doesn’t always mean it’s malicious, but it definitely warrants closer analysis.
These entries are attempting to execute PowerShell with the script execution policy manually set to bypass, and passing a long “EncodedCommand” string. Without getting too deep in the weeds, the EncodedCommand option allows you to pass base64 encoded text directly to PowerShell and lets it handle decoding and interpreting the content. It’s a basic method of script obfuscation. I’m going to show an effective method of scrutinizing this encoded data, using an awesome open-source tool called CyberChef.
CyberChef is a tool you can use for transforming data to and from various formats using “recipes” to define how you want to transform the data. Let’s take a look at the entry:
We want to copy everything after the “EncodedCommand” flag, and paste that into the input box on the CyberChef page.
Once you have done that, under “Data Format”, double-click “From Base64”, and finally, click “Bake!”:
CyberChef is an analyst’s Swiss Army chainsaw.
The results will look something like this:
Looking good, but we can still make this easier to analyze with a simple trick.
You’ll notice that there are periods between every character. These are null bytes: PowerShell expects the EncodedCommand text to be UTF-16LE, so every plain ASCII character is followed by a zero byte, which CyberChef displays as a period. You don’t have to remove them, but if you WANT to remove them for readability purposes, CyberChef can handle that too. Under the “Utils” section, select “Remove null bytes” by double-clicking it. The output field should look like this now, or something similar:
Taking a closer look, this script doesn’t look all that malicious, but it was encoded anyway. Why? Sometimes that’s just how software vendors do things. A clue that might have told us this was expected activity is the number of times these encoded command arguments have been observed:
191 executions. Either this is expected activity, or this network is very compromised.
Sometimes (a lot of the time, hopefully) all of your investigations from threat hunting result in benign activity. It’s not glamorous, but having nothing significant to report from the results of a hunting exercise is the best you can hope for.
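If you would rather script the first CyberChef recipe (From Base64, then Remove null bytes), here is a minimal Python sketch of the same idea. The function name and the sample command are made up for illustration. The sample is a harmless string encoded on the spot, not a value from the dataset.

```python
import base64

def decode_encoded_command(blob: str) -> str:
    """Decode a PowerShell EncodedCommand value: Base64 wrapping UTF-16LE text."""
    raw = base64.b64decode(blob)
    return raw.decode("utf-16-le")

# Build a harmless, made-up sample the same way PowerShell expects it.
sample = base64.b64encode("Write-Host 'hello'".encode("utf-16-le")).decode("ascii")

# Stopping after the Base64 step shows the null bytes CyberChef renders as periods.
print(base64.b64decode(sample))

# Decoding as UTF-16LE gives the readable command without a separate cleanup step.
print(decode_encoded_command(sample))
```

Decoding as UTF-16LE instead of stripping null bytes also keeps any non-ASCII characters in the script intact.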
Let’s try something a little bit different; instead of looking at results from the query from the highest count to lowest, let’s rearrange things from lowest to highest. When you’re looking at the tail-end of activity for a particular process, things start to get weird. You run into things that have only been executed once, or a handful of times and you have to start wondering why that’s happening:
Seems legit. As in, not at all.
These commands seem to go on for a while, all one slight variation or another, so we’re going to choose one PowerShell command line argument and pick on it, just like before:
Just like before, this looks VERY unusual. Let’s start by looking at the PowerShell flags being passed:
-noni – short for NonInteractive; runs PowerShell without presenting an interactive prompt to the user
-nop – means to run this command without applying a PowerShell profile
-w hidden – sets the window style to hidden
-c – runs the string that is passed after this argument as a PowerShell command.
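The four flags above can also be spotted programmatically once you have exported command lines from your search. Below is a rough Python sketch, not a complete parser. The FLAGS table, the minimum abbreviation lengths, and the example command line are all assumptions made for illustration.

```python
# Full parameter name -> shortest abbreviation this sketch accepts (an assumption).
FLAGS = {
    "noninteractive": 4,  # -noni
    "noprofile": 3,       # -nop
    "windowstyle": 1,     # -w
    "command": 1,         # -c
}

def find_flags(command_line: str) -> list[str]:
    """Return the full names of the flags above that appear in a command line."""
    tokens = command_line.split()
    found = []
    for i, token in enumerate(tokens):
        if not token.startswith("-"):
            continue
        name = token[1:].lower()
        for full, min_len in FLAGS.items():
            if len(name) < min_len or not full.startswith(name):
                continue
            if full == "windowstyle":
                # Only report the window style flag when it is set to hidden.
                value = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
                if value != "hidden":
                    continue
            found.append(full)
        if "command" in found:
            break  # everything after -c is the script itself, so stop scanning
    return found

# Made-up example command line using the flags discussed above.
example = "powershell.exe -noni -nop -w hidden -c Write-Host hello"
print(find_flags(example))
```

Any one of these flags shows up in legitimate automation too, so treat a hit as a reason to look closer, not as a verdict.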
These are a bunch of really suspicious PowerShell execution options, but what about the command PowerShell is running?
There’s a lot to unpack here, but here is what’s going on in a nutshell: This command is delivering a payload of some sort. It’s Base64 encoded, which we can tell from the:
But the payload is also GZIP compressed, which we can tell from the:
Let’s bust out CyberChef again and see if we can discover some more clues. Take a look at:
And copy EVERYTHING in between the single quote characters into the input field for CyberChef. Under the operations section, select ‘From Base64’ under Data Format, and then under Compression, select Gunzip. Watch the magic happen!
Just like with our first example, make sure that “From Base64” is the first item in the recipe, then add “Gunzip” to decompress our payload and then…
BAM! Watch that bacon sizzle!
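The same two-step recipe (From Base64, then Gunzip) is easy to reproduce in a script if you have a pile of these to get through. Here is a minimal Python sketch. The function name is made up, and the sample payload is a harmless string compressed on the spot, not the payload from the dataset.

```python
import base64
import gzip

def unpack_payload(blob: str) -> str:
    """Base64-decode a blob, then gunzip it, mirroring the CyberChef recipe."""
    compressed = base64.b64decode(blob)
    # errors="replace" keeps the sketch from failing on unexpected byte values.
    return gzip.decompress(compressed).decode("utf-8", errors="replace")

# Build a harmless, made-up sample: gzip first, then Base64.
sample = base64.b64encode(gzip.compress(b"Write-Host 'stage two'")).decode("ascii")

print(sample[:4])             # gzip data always Base64-encodes to a string starting with H4sI
print(unpack_payload(sample))
```

That “H4sI” prefix comes from the gzip magic bytes, which makes it a handy string to search for when you are hunting for compressed payloads in command lines.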
There’s a lot of… well, interesting things going on here. Whenever you see “DefineDynamicAssembly”, PowerShell is building a .NET assembly in memory at runtime. Payloads like this one commonly use that technique to reach native Windows API functions and run low-level CPU code.
You may notice that there is another, smaller Base64 block in this decoded PowerShell script. This is the shellcode that PowerShell is attempting to load into memory and run. Admittedly, I’m pretty terrible at reverse engineering, and I have next to no knowledge of assembly, but with a bit of luck, I found an excellent blog post that guides you through the analysis of an identical payload using CyberChef and an excellent tool called “scdbg”.
To summarize, that smaller block of Base64 can be converted to hex characters and fed to scdbg, which interprets the CPU instructions and tells you exactly what the code is doing.
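The Base64-to-hex conversion step can be done in CyberChef, or with a couple of lines of Python like the sketch below. The function name is made up, and the three sample bytes are a harmless placeholder, not the shellcode from the dataset.

```python
import base64

def base64_to_hex(blob: str) -> str:
    """Convert a Base64 block into a plain hex string."""
    return base64.b64decode(blob).hex()

# Placeholder bytes standing in for the real shellcode block.
sample = base64.b64encode(bytes([0x90, 0x90, 0xC3])).decode("ascii")

print(base64_to_hex(sample))  # 9090c3
```

From there, the hex string can be saved to a file and handed to your shellcode analysis tool of choice.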
Scdbg reveals our shellcode is a reverse shell to 10.0.254.202 on port 5555/tcp. This PowerShell script is a giant bucket of nope.
At this point we’ve verified that the block of PowerShell we’ve analyzed is nothing good. This is as good a starting point and hands-on example as I could hope to provide: we’ve walked through both a false positive and a true positive, and how to analyze each.
Additional Reading and Resources
Looking for some more inspiration? Here are some projects/trainings that have helped me out immensely in becoming a better threat hunter in general:
Adama – This project is pretty amazing. To make a long story short, it’s a collection of queries, most of them formatted for use with ELK (Elasticsearch, Logstash, Kibana) software stacks, but with a little bit of time and tinkering, they can easily be adapted to the Splunk query language.
JPCERT – Tool Analysis Result Sheet – JPCERT has gone through the trouble of running several tools that attackers and adversaries use as a part of their operations and documenting where evidence of these tools executing can be found.
Sigma – Sigma is a project like Adama, but it’s designed to be agnostic to whatever SIEM you happen to be using. Sigma has a converter application that can turn Sigma descriptions into a query that runs on a bunch of different SIEMs (including Splunk).
Practical Threat Hunting – This is a guided training by Chris Sanders. Current price to attend the training is 647.00 USD, but I feel like the price tag is worth it. There is a lot of stuff that Chris exposes you to as a part of the training. I’m not ashamed to say that this blog post is inspired by the training itself. Also, so you’re aware, I am an Applied Network Defense trainer myself, and I am NOT being paid to advertise Chris’ training, NOR am I being given the training for free. It’s just that good.
CPTC dataset – All of this process creation log stuff sounds really awesome, but what if I want a sample dataset to practice with? This is a dataset collected from the 2019 National Collegiate Penetration Testing Competition. This dataset will allow you to practice, experiment and demonstrate the value of these logs. Without Tom Kopchak’s hard work, the screen captures accompanying this post would not have been possible. Thanks, Tom!
I know that this blog post was mainly focused on process creation logs for Windows, but what about the Linux and OSX users out there? If you’re looking for something similar that could be used to log process execution data, Linux has a few options in Auditd and Snoopy. I only recently discovered that OSX has a native audit subsystem as well. A multiplatform alternative might be osquery.
We have covered a lot of ground in these three blog posts: getting you familiar with process creation logs, getting them into Splunk for analysis, and then learning how to query them to discover anomalies. We’ve gone through a hands-on example together, and I’ve left you with a handful of resources you can use to jumpstart your threat hunting activities.
Bear in mind that if you chose to follow along with me, the CPTC dataset is massive, and there is a ton of other data you can analyze as well. Also keep in mind that the CPTC dataset is not the only Splunk dataset out there: Splunk also provides data for both the Boss of the SOC 1 and Boss of the SOC 2 competitions.
Good luck, and happy hunting.
About Hurricane Labs
Hurricane Labs is a dynamic Managed Services Provider that unlocks the potential of Splunk and security for diverse enterprises across the United States. With a dedicated, Splunk-focused team and an emphasis on humanity and collaboration, we provide the skills, resources, and results to help make our customers’ lives easier.
For more information, visit www.hurricanelabs.com and follow us on Twitter @hurricanelabs.