Using automation to ease the pain of your ES deployment.
Adding automated checks to your Splunk Enterprise Security (ES) deployment will help you avoid some of the configuration issues, and the subsequent data gaps, you may be experiencing. In my travels, even trusted app updates have left configurations inconsistent with the requirements of ES, resulting in security monitoring gaps.
If you’re an admin dealing with the management of ES, this write-up will show you how to integrate automation into your configuration and eliminate some of the manual tasks that have been bogging you down.
Enter: The REST Command
Splunk is great as an alerting and reporting tool for the logs it indexes, and it can have much the same impact at the configuration level. All it takes is the REST command.
REST lets you pull local config file information through search, in a format similar to log data. That kind of access lets you monitor your overall configuration the same way you monitor your operational and security data.
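As a quick illustration, here is a minimal rest search that lists the saved searches on the local search head; you can swap the endpoint for any other configuration endpoint you want to inspect:

```
| rest /servicesNS/-/-/saved/searches splunk_server=local
| table title disabled search
```

Run it straight from the search bar like any other search (the rest command requires the appropriate capability for your role).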
Making sure our source types are going to the right spot.
As an ES admin, you’ll need to be sure the following all flows together cleanly without losing your mind to a perfectionist rage.
Is the data still coming in? Is it being formatted properly for CIM? Is it getting tagged appropriately for the given search/data model? Is it being ingested into the data model summaries? Breathe. If you’ve been there, or maybe you’re there right now, you can agree that’s too much manual work.
First things first, let’s verify that our source types are making it into the proper data model.
This beauty was written by my coworker and friend Toby Deemer. Not only will it show you what source types are going into each data model, it will also dump the configured correlation searches that would be associated as well. This is great for a regular review of your ES deployment as a whole.
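Toby's original search isn't reproduced here, but a simplified sketch of the idea might look like the following, using the Authentication data model as an example. The model name and the wildcard match on correlation search text are illustration-only assumptions:

```
| tstats count from datamodel=Authentication by sourcetype
| append
    [| rest /servicesNS/-/-/saved/searches splunk_server=local
     | search action.correlationsearch.enabled=1 search=*Authentication*
     | rename title AS correlation_search
     | table correlation_search]
```

The tstats half shows which source types are actually feeding the model; the appended subsearch lists the enabled correlation searches that reference it.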
Using Splunk to help keep an eye on notable events.
Automating the management of fields and proper CIM compliance is a bit too dynamic for these methods, but both directly affect correlation search integrity. When fields change, a correlation search can quietly stop producing the intended results as notables.
My current best solution for this is to monitor for when notable events are no longer being generated.
First, I generate a count of all created notables in a given time period. With a subsearch, I populate the remaining enabled search names. Any existing search names with no available notable count in the core search will have a 0 filled in. These are potentially broken and need your attention.
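A hedged sketch of that search, assuming notables live in the notable index and that search_name lines up with the saved search title (in some ES versions it matches the correlation search label instead, so verify in your environment):

```
index=notable earliest=-7d
| stats count by search_name
| append
    [| rest /servicesNS/-/-/saved/searches splunk_server=local
     | search action.correlationsearch.enabled=1 disabled=0
     | rename title AS search_name
     | eval count=0
     | table search_name count]
| stats max(count) AS count by search_name
| where count=0
```

The final stats max(count) keeps the real count where one exists and leaves the filled-in 0 for enabled searches that produced no notables, which is exactly the set that needs your attention.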
Configure the appropriate alert actions and you can now rely on Splunk to do what it’s best at: letting you know when there’s something important to look at.
There are more configs you can look at automating with REST for the integrity of ES, such as /servicesNS/-/-/configs/conf-eventtypes/. Event types are at the core of the data models used by ES. Monitoring the changes there may help prevent lost information.
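For example, you could schedule a search that snapshots the event type configs to a lookup and review changes over time. The lookup name here is a made-up placeholder:

```
| rest /servicesNS/-/-/configs/conf-eventtypes splunk_server=local
| table title search
| outputlookup eventtype_baseline.csv
```

Comparing a later run against the baseline with inputlookup will surface event types that have changed or disappeared.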
Understanding and navigating the notable suppression function.
One of the more important features of ES is notable suppression. Suppressions can be the difference between a manageable ES with actionable intel and an ES overloaded with false positives. A suppression is exclusion logic: an event type search that matches what you have determined to be a non-threat. Suppressions help SOC analysts focus ES Incident Review on the notables that genuinely need review by hiding the events configured as suppressed.
My favorite examples are notables created by a Qualys scanner. They’re nice to have to show that your products are working, but they’re a waste of space in the Incident Review dashboard.
One of the problems with notable suppressions is that they're a saved config that rarely gets reviewed. Over a long enough timeline, that leads to config build-up. To automate their management, you first need to understand that notable suppressions are saved in Splunk as event types.
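Because suppressions are event types whose names conventionally start with notable_suppression-, you can list them with the same rest approach. The naming convention is standard ES behavior, but verify it in your environment:

```
| rest /servicesNS/-/-/saved/eventtypes splunk_server=local
| search title="notable_suppression-*"
| table title search eai:acl.app
```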
Oh, look at the time.
When creating a notable suppression, you need to consider time. Should this be a permanent suppression, or should I only suppress alerts for a given time window to allow the help desk time to mitigate the issue?
The time variable is used as a field and value within the event type search itself. Because a suppression's config is static, you can use REST to make decisions based on the time value it defines.
In the example below, you list each notable suppression that will no longer match notable events, based on a relative time window.
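Here's a sketch of that search, assuming time-bounded suppressions embed an epoch upper bound such as _time<1700000000 in their event type search. The rex pattern is an assumption; adjust it to however your suppressions express their time window:

```
| rest /servicesNS/-/-/saved/eventtypes splunk_server=local
| search title="notable_suppression-*"
| rex field=search "_time\s*<\s*(?<suppress_until>\d+)"
| where suppress_until < now()
| table title search suppress_until
```

Anything this returns is a suppression whose window has already passed, making it a candidate for cleanup.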
And for a little extra credit…
You can use this to automate deleting old suppressions with an external call to Splunk using the REST API. The main thing to consider here is that this may affect your MTTR reporting. If you delete a suppression from a month ago, and report on the effectiveness of ES over that time period, the previously suppressed notable event(s) will not show accurate data. So, it’s a balance of config cleanup and accurate data.
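As a sketch, an expired suppression can be removed with a DELETE against the saved event type endpoint. The host, credentials, app namespace, and suppression name below are all placeholders to replace with your own:

```
curl -k -u admin:changeme -X DELETE \
  "https://localhost:8089/servicesNS/nobody/SA-ThreatIntelligence/saved/eventtypes/notable_suppression-old_rule"
```

Check the eai:acl.app value on the event type first to find the correct app namespace for the URL.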
Helping make your ES management better for happier Splunking.
Hopefully this tutorial will help eliminate some of the manual tasks that have been driving you nuts and make your ES management experience a little bit better. Automation FTW.