Leveraging Windows Event Log Filtering and Design Techniques in Splunk
When working with Windows event logs in your Splunk environment, you’ll typically run into two questions: How do I get rid of specific events that aren’t necessary for my use case? How do I trim off the duplicated text at the bottom of events that’s chewing up my license?
The answer to both questions is to use advanced filtering at the input level and index-time transforms at the indexing level. In this tutorial, I’ll explain how you can do both of these things so you only bring in the data you need. Before we get started, you should consider a strategy for how you ingest your Windows event logs. You can default to allow all with explicit denies, default to deny all with explicit allows, or use a hybrid of explicit allows/denies.
It’s important to understand that by default all event codes will be indexed if you do not specify a whitelist. If you add a single whitelist statement, Splunk will only index events which match your whitelist for that particular input stanza and ignore the rest of the events. You should also note that Splunk processes whitelists first, then blacklists. This means you can combine whitelists and blacklists to achieve a certain result (e.g., allow everything in event code X by default, but deny specific strings within event code X).
The primary benefit of whitelists/blacklists for Windows Event Logs is that the filtering happens at the ingestion pipeline instead of at the typing pipeline, which is how filtering is traditionally handled in Splunk. This means you can filter out data before it’s ever sent over the wire and save yourself from wasting precious bandwidth and compute cycles on your indexers. The caveat is that these filters cannot alter event text; if you want to do that, you still need to do it on the indexers (which I will go over as well).
Lastly, I will cover how you can structure your inputs deployment in a layered approach. This is going to give you more control over what data you’re bringing in and allow you to more easily manage what hosts send what data as your environment grows.
Combining these strategies will get you the most bang for your buck by optimizing your Windows event log data ingestion.
Give Me All the Events, Except Certain Ones
This is the most common approach to working with Windows Event Logs, and it’s typically the easiest way to get your desired result. With this method you never declare a whitelist, so the default behavior is to grab all event codes under that Event Log Channel. To filter down, you then configure blacklists to drop specific event codes that you do not need. Once you have your standard event code blacklist, you can home in on specific events which aren’t useful and use advanced filtering techniques to drop those.
Only Give Me Specific Events
This is another common approach often used when you have a limited amount of license capacity to work with. This method is going to require that you explicitly whitelist exactly what you want under each of your eventlog stanzas, and you are not going to be using any blacklists. When you use this method, you want to be careful you’re not missing precious logs that you forgot to whitelist.
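As a minimal sketch of the whitelist-only approach, assuming the System channel and a handful of event codes picked purely for illustration (they are not a recommendation from this post), the stanza might look something like this:

```ini
# inputs.conf (illustrative sketch; the channel and event codes are assumptions)
[WinEventLog://System]
disabled = 0
# Only these event codes are collected; everything else in the channel is ignored.
# 1074 = shutdown/restart initiated, 6005/6006 = Event Log service started/stopped,
# 7045 = a new service was installed
whitelist = EventCode="^(1074|6005|6006|7045)$"
```

Because nothing outside the whitelist is collected, any event code you later discover you need has to be added here explicitly.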
The hybrid approach is less of an approach and more of a reactionary configuration. It commonly happens when you didn’t follow one of the other two approaches. The other possibility is that you have some complex requirements that make it necessary to juggle whitelists/blacklists to get what you want. As in the previous school of thought, you need to be careful not to miss important event codes.
For the layered approach to inputs, the idea is that you split up your configuration into multiple apps in order to apply more granular configuration to the correct set of hosts. This is something we consider to be a best practice, and it will be useful for more than just your Windows Event logs.
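A hybrid stanza could be sketched as follows. The whitelisted event codes are assumptions chosen for illustration; the blacklist reuses the HealthMailbox service-account case discussed later in this post:

```ini
# inputs.conf (illustrative sketch of a hybrid allow/deny stanza)
[WinEventLog://Security]
disabled = 0
# Whitelists are evaluated first: only logon and Kerberos ticket events get in.
whitelist = EventCode="^(4624|4625|4768|4769)$"
# Blacklists are evaluated next: drop the 4624 events from one noisy account.
# blacklist3 is used because the Windows add-on already defines
# blacklist1 and blacklist2 for this stanza.
blacklist3 = EventCode="^4624$" User="HealthMailbox"
```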
Now let’s talk about how we get started with filtering. For many environments, you can get away with using only basic filtering. By “basic filtering” I’m referring to straightforward whitelists/blacklists that only filter on event codes. BEWARE: You can NOT mix basic and advanced formats under the same wineventlog stanza; doing so will break your ingestion for that log source. For this reason, I would recommend writing your basic filters in the advanced filtering form so that you can easily expand them in the future if needed.
The first thing you need to do is find the inputs.conf file that specifies your Windows Event Log stanzas. A lot of people put this in the local folder of “Splunk_TA_Windows” and deploy the app to all of their Windows hosts. I would caution against this because as your environment grows you will likely need to start creating different apps that turn on specific input stanzas with specific event codes for sets of hosts. You should think about this as separating DCs, App Servers, Exchange Servers, etc.
Later in this blog post, I will cover how you can do this if you aren’t already. Regardless of your setup the approach here is still the same, though you may need to change your app context.
Under the stanza (for example, [WinEventLog://Security]), you can specify either whitelist or blacklist, followed by a number if you have multiple whitelists/blacklists. Each stanza can support up to 10 whitelists and up to 10 blacklists (i.e., whitelist plus whitelist1 through whitelist9, and likewise for blacklists). You should note that by default the Windows Add-on comes with blacklist1 and blacklist2 already in use. Once you define your list, you simply specify which event codes apply to that list, using commas and dashes to break it up into groups. A comma ends a “group”, and a dash specifies a range within a “group”.
Here’s an example of what your stanza might look like:
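A minimal sketch of such a stanza, written in the advanced form recommended above. The event codes are assumptions chosen only to show a range plus a single code; substitute the ones you actually want to drop:

```ini
# inputs.conf (illustrative sketch; event codes are assumptions)
[WinEventLog://Security]
disabled = 0
# No whitelist is defined, so every Security event code is collected by default.
# In the basic form this would read: blacklist3 = 4656-4658,5156
# The advanced form below drops the same codes (the range 4656-4658 and the
# single code 5156) and avoids mixing formats with the add-on's own
# blacklist1/blacklist2, which use the advanced form.
blacklist3 = EventCode="^(4656|4657|4658|5156)$"
```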
To break this down:
- No whitelist means we will ingest ALL Security event logs.
- Blacklist3 will be read after our implicit whitelist, which means we will NOT receive anything listed in the blacklist. In this case we won’t get the event codes listed in the range/single keys.
- Note: The reason we’re using blacklist3 is because the Windows add-on comes with blacklist1 and blacklist2 already defined under this stanza in default/inputs.conf.
What happens when the basic filtering doesn’t get the job done? Are you getting spammed with a particular string but you need the events in that event code? Or what if a single service account is generating large amounts of logs between two DCs?
This is where you can use advanced filtering techniques to get really granular and selective on the incoming data.
When creating a whitelist or blacklist, you are really operating on a set of key-value pairs. Notice I say “set” because you can actually use multiple key-value pairs together (e.g., an event code AND part of a message). These key-value pairs use regular expressions to match on the value of each key in the event. It’s important to know that you can only specify a key once per list; if you specify it multiple times, only the last duplicate is used.
You can find a list of possible keys in the Splunk Docs Create advanced filters with ‘whitelist’ and ‘blacklist’ section.
For this post, we’re going to focus on ComputerName, Message, and User, as those are the most commonly used keys in my experience. ComputerName and User are pretty self-explanatory: you can use these in conjunction with an event code in order to whitelist/blacklist certain events.
Let’s break down these examples below:
blacklist = EventCode="4663" ComputerName="(US-EXC-01|EU-EXC-01)\.COMPANY\.com"
- EventCode – Only apply this blacklist to Security Event Logs where the event code is 4663.
- ComputerName – Only apply this blacklist to Security Event logs where the Computer Name is “US-EXC-01.COMPANY.COM” or “EU-EXC-01.COMPANY.COM”.
blacklist3 = EventCode="4624" User="HealthMailbox"
- EventCode – Only apply this blacklist to Security Event Logs where the event code is 4624.
- User – Only apply this blacklist to Security Event Logs where the User is “HealthMailbox”.
In these two blacklist examples we were facing a common issue.
For the first one, a couple of servers were generating a very large number of “An attempt was made to access an object” logs, which were not helpful to our use case and were taking up a lot of disk/license for no reason. In the second blacklist, we had a service account that was generating a large volume of “An account was successfully logged on” events that were expected and not useful to our use case.
The last key we’ll talk about is the Message key, which is likely going to be the most important one. Often what you’ll find is that you have some events where you only care about specific values in the event which don’t have a key defined.
One example would be if you wanted to ingest logs that allow you to detect Kerberoasting. You would want to make sure you are whitelisting Event ID 4768 + 4769 with Ticket Encryption types 0x1, 0x3, 0x11, 0x12, 0x17, and 0x18. The first thing you’d notice is that “Ticket Encryption Type” is NOT a key value listed in the docs. This means you will have to use Regex to match on the Message key instead.
Here’s what that config ends up looking like:
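A minimal sketch of such a whitelist, assuming the encryption type appears in the event text on a line labeled “Ticket Encryption Type:”. The whitelist number and the exact regex are illustrative, so test them against your own events before relying on them:

```ini
# inputs.conf (illustrative sketch; verify the regex against real events)
[WinEventLog://Security]
disabled = 0
# (?m) makes "$" match at the end of each line, so 0x1 cannot match 0x13.
# \s* tolerates trailing whitespace (such as a carriage return) before the line end.
whitelist1 = EventCode="^(4768|4769)$" Message="(?m)Ticket Encryption Type:\s+0x(1|3|11|12|17|18)\s*$"
```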
- EventCode – Only apply this whitelist to Security Event Logs where the event code is 4768 or 4769.
- Message – Only apply this whitelist to Security Event Logs where the Message field contains the Ticket Encryption Types of 0x1, 0x3, 0x11, 0x12, 0x17, or 0x18.
When dealing with the Message field, it’s important to remember that these are multi-line events. What appears to work in your Regex tester may not translate into Splunk’s Regex engine where it will try to decide between using the default mode or multi-line mode. To be safe, you will want to manually specify the multi-line flag “(?m)” to force Splunk to use that mode. Multi-line mode causes “^” and “$” to match the begin/end of each line (not only begin/end of string).
In this example, if we were to leave off the multi-line flag, we would NOT end up with the expected behavior, and logs would not be ingested. Splunk would parse the entire event as a single string and therefore interpret the “$” in our regex as the very end of the event. Instead, what we needed was for Splunk to match on the end of the ticket encryption type line, so we did not accidentally match types that started with a 1 or a 3 (such as 0x13, 0x14, or 0x32). Our regex group makes sure it will only capture the values we care about for Kerberoasting.
If we take this advanced example a bit further, what would happen if we wanted all of that plus only whitelist where the service name was “krbtgt”?
This is where another flag combination comes in handy: “(?ms)”. The “s” flag allows your dots to match newline characters. This lets a single “.*” span the lines between the two fields you care about, instead of a regex that spells out every line in between (a huge difference in length and readability). These two flags are going to be critical to understand in order for your whitelists and blacklists to work.
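Because a key can only be specified once, both conditions have to live in a single Message regex. A hedged sketch, assuming the “Service Name:” line appears before the “Ticket Encryption Type:” line in the event text (the whitelist number is illustrative):

```ini
# inputs.conf (illustrative sketch; assumes "Service Name:" precedes
# "Ticket Encryption Type:" in the event message)
[WinEventLog://Security]
disabled = 0
# (?ms): "m" anchors "$" at each line end, "s" lets ".*" cross line breaks
# to bridge the lines between the two fields.
whitelist2 = EventCode="^(4768|4769)$" Message="(?ms)Service Name:\s+krbtgt\b.*Ticket Encryption Type:\s+0x(1|3|11|12|17|18)\s*$"
```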
In this section I want to cover a best practice that we have for our customers, which saves them from unnecessary license usage. If you’ve spent any time with Windows event logs, you’ll quickly see a pattern where many event codes contain a wall of text at the bottom of each event which provides no value.
If you look at one of these events, you can see that you are actually using up more disk space/license on the ending event description than on the actual event text. There is no benefit to keeping this text in the event. The way you get rid of this text is by using a props/transforms set that discards that text and keeps the rest of the event intact.
Remember, this configuration needs to go on the first Splunk Enterprise system where your Windows Event Logs are being forwarded (hopefully straight to your indexers).
We can do this easily with the configuration example below:
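A minimal sketch of such a props/transforms pair. The stanza name strip_winevent_description and the LOOKAHEAD value are assumptions made for illustration; the regex keys on the “This event is generated” phrase mentioned in this post:

```ini
# transforms.conf (illustrative sketch; stanza name and LOOKAHEAD are assumptions)
[strip_winevent_description]
# Must be large enough to reach the start of the trailing description.
LOOKAHEAD = 16128
# (?s) lets "." cross line breaks; the capture group holds everything
# before the first "This event is generated" sentence.
REGEX = (?s)^(.*?)\s*This event is generated
# Overwrite the raw event with only the captured text.
DEST_KEY = _raw
FORMAT = $1

# props.conf
# Newer Windows add-on versions: configuration keyed by source.
[source::WinEventLog:Security]
TRANSFORMS-strip_winevent_description = strip_winevent_description

# Older Windows add-on versions: configuration keyed by sourcetype.
[WinEventLog:Security]
TRANSFORMS-strip_winevent_description = strip_winevent_description
```

Events that do not contain the phrase do not match the regex and are left unchanged.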
Let’s dive into what exactly is happening with this transforms call.
First we’re setting a large lookahead value that allows us to grab all the actual event text we care about. Next we’re using regex to capture the entirety of the event text and store it in a capture group, leaving off the wall of text that we don’t care about which starts with “This event is generated”.
Ultimately this entire operation is going to discard everything except what is inside the capture group when it writes the event to a bucket.
Now, if we take a peek at the props stanza, you’ll notice I have both a source and sourcetype stanza for the same thing. I felt this was worth noting because in the older version of the Windows Add-on all of the configuration was done by sourcetype. In the newer version of the Windows Add-on all of the configuration is done by source instead of sourcetype. If you’re not certain which one to use, just go with the source call because that should work for the new and old version alike.
It’s also important to remember that you are responsible for understanding how this will permanently alter your own production data in Splunk, so please be careful!
How Do I Apply This?
Now that you have all the information you need to make these changes, you might be wondering “Where on Earth does this configuration go?”
I’m going to assume you’re following Splunk best practices, meaning you have a Deployment Server set up with your Splunk Universal Forwarders configured as clients and a standalone indexer (or indexer cluster) where those universal forwarders send data. If your environment is set up differently, you may have to adjust this process and put the configuration somewhere else.
Whitelists, Blacklists, and Input Layering
TL;DR: Get your inputs.conf (optionally containing whitelists/blacklists) to your UFs using a Deployment Server.
If you have administrative experience with Splunk, you’re probably used to putting configuration similar to this on an indexer or heavy forwarder since it’s altering data you index. Windows Event Log whitelists and blacklists are a special exception because these operate at the input level, directly on the UF (they have a special pipeline/processor set). This means we need to put this configuration on the Deployment Server within $SPLUNK_HOME/etc/deployment-apps.
If you think back to earlier, you’ll remember that I mentioned most people will put this configuration directly in Splunk_TA_windows/local/inputs.conf. However, since you’re reading this, I hope to steer you in a direction that will allow you to make this configuration more modular as your environment scales.
Leave “Splunk_TA_windows” alone, don’t modify it at all. Instead, create a set of apps following a naming scheme. You want to think about how you can apply this with a layered strategy to create “base” layer and then add any custom layers on top which may be applied to a specific server or set of servers.
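Here’s an example of what this list might look like when you’re done: uf_winevent_base_inputs, uf_winevent_ad_inputs, and so on, with one additional app per server role.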
The uf_winevent_base_inputs would be deployed to all of your Windows systems, and you would deploy other apps as needed depending on the role of each server.
For example, your Domain Controllers would end up with both the uf_winevent_base_inputs and uf_winevent_ad_inputs apps in this example. Remember that if you have overlapping whitelist/blacklist numbers in two apps, lexicographical order is going to determine which whitelist/blacklist wins (meaning you might need to adjust an app name so that it wins).
Once you have your naming scheme and apps created you’re going to define the stanzas you want in inputs.conf with the whitelists/blacklists configured under each relevant stanza. Your end result might look something like this:
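As a hedged sketch, the two apps named above might carry inputs.conf files like the following. The file paths, event codes, and the extra Directory Service channel are assumptions for illustration; note the non-overlapping blacklist numbers, so neither app overrides the other:

```ini
# uf_winevent_base_inputs/local/inputs.conf (deployed to all Windows hosts)
[WinEventLog://Security]
disabled = 0
# blacklist1/blacklist2 are left to the Windows add-on; the base layer takes 3.
blacklist3 = EventCode="^(5156|5158)$"

# uf_winevent_ad_inputs/local/inputs.conf (deployed only to Domain Controllers)
[WinEventLog://Security]
# A different number than the base layer, so both blacklists apply on a DC.
blacklist4 = EventCode="^4624$" User="HealthMailbox"

[WinEventLog://Directory Service]
disabled = 0
```

On a Domain Controller, Splunk merges the settings from both apps under the same stanza, so that host ends up with blacklist3 and blacklist4 in effect.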
Now that you have your strategy in place, you need to create serverclasses which follow the same pattern of one serverclass per “layer”.
For simplicity’s sake, I like to follow the same naming convention and call the serverclass something similar to the relevant app name. Lastly, when you create your server classes, make sure to watch the scope of your client list so that you’re not missing any hosts or applying configuration to the wrong hosts.
Eventually, what you end up with may look like this on your deployment server:
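Expressed as a serverclass.conf sketch, one serverclass per layer might look like this. The host pattern dc-* and the machine type filter are assumptions; adjust them to your own naming scheme and platforms:

```ini
# serverclass.conf on the deployment server (illustrative sketch)

# Base layer: every Windows forwarder.
[serverClass:uf_winevent_base_inputs]
whitelist.0 = *
machineTypesFilter = windows-x64

[serverClass:uf_winevent_base_inputs:app:uf_winevent_base_inputs]
restartSplunkd = true

# AD layer: only hosts whose names match the (hypothetical) DC pattern.
[serverClass:uf_winevent_ad_inputs]
whitelist.0 = dc-*

[serverClass:uf_winevent_ad_inputs:app:uf_winevent_ad_inputs]
restartSplunkd = true
```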
Props and Transforms
TL;DR: Get your props.conf & transforms.conf to the Splunk Enterprise system that your UFs forward winevent logs to.
Now what about applying that configuration that got rid of those long walls of text on the Windows Events? In a Splunk deployment matching current best practices for receiving data from Universal Forwarders, this configuration is ultimately going to need to go on your indexer(s). If you have a standalone indexer, it should be configured as a client on your deployment server. In this setup, you would put the configuration in an app on your deployment server in $SPLUNK_HOME/etc/deployment-apps/ and add the app to the serverclass for your indexer. If you’re using an indexer cluster, then you want this configuration to go in $SPLUNK_HOME/etc/master-apps/, and then apply your cluster bundle.
Regardless of how you’re set up, the props.conf and transforms.conf described earlier should live together in a single app (we’ll use the app ‘baseline_windows_props’).
Once this configuration is applied to your indexers, events from any Splunk Universal Forwarders sending data to these indexers will have the event descriptions removed.
Hopefully you found this tutorial to be helpful in understanding how you can use basic and advanced filtering techniques to get the most out of your data, while also cutting down on unnecessary compute and license usage.
After navigating through this content, you should feel confident in adapting this to fit your use case — provided you have some Regex experience. You should have also gained a basic understanding of how to layer your inputs and serverclasses to create a modular management system. If you have any whitelists/blacklists you found helpful and want to share, make sure to drop a comment below!
- Windows Event Log Doc: https://docs.splunk.com/Documentation/Splunk/latest/Data/MonitorWindowseventlogdata
- Windows Filtering Keys: https://docs.splunk.com/Documentation/Splunk/latest/Data/MonitorWindowseventlogdata#Create_advanced_filters_with_.27whitelist.27_and_.27blacklist.27
- Event Code Info: https://www.ultimatewindowssecurity.com/securitylog/encyclopedia/
About Hurricane Labs
Hurricane Labs is a dynamic Managed Services Provider that unlocks the potential of Splunk and security for diverse enterprises across the United States. With a dedicated, Splunk-focused team and an emphasis on humanity and collaboration, we provide the skills, resources, and results to help make our customers’ lives easier.