Using Splunk Streamstats to Calculate Alert Volume
Dynamic thresholding using standard deviation is a common method we used to detect anomalies in Splunk correlation searches. However, one of the pitfalls with this method is the difficulty in tuning these searches. This is where the wonderful streamstats command comes to the rescue.
This Splunk tutorial will cover why tuning standard deviation searches is different from using a static threshold, how to use streamstats, and how we can use streamstats to get immediate feedback on alert volume.
Tuning Using Streamstats
1. Understanding the problem
With a static threshold search that runs over 60 minutes, calculating alert volume over 30 days is as simple as running the count by 60 minutes over 30 days. This is different with a dynamic threshold.
Typically, a standard deviation search will calculate a threshold based on the last 7 to 30 days to compare against the last hour of data. Running the same search to see approximately how many notables would be generated in 30 days will calculate the threshold differently than when it runs as a correlation search.
When running a correlation search, the threshold is based on historical data. Using the same search to calculate the alert volume for the whole 30 days the threshold will be based on historical, current, and future data for any given hour but the last.
This is where we can use streamstats to calculate the threshold based on the last 30 days for any given hour.
Still confusing? Let’s take a look at a few examples.
2. What does streamstats even do?
To understand how we can do this, we need to understand how streamstats works. In my experience, streamstats is the most confusing of the stats commands. I find it’s easier to show than explain. Let’s start with a basic example using data from the makeresults command and work our way up.
Example 1: streamstats without options
The streamstats command will run statistics as events come in. In this case, counting how many times each color appears and generating an incremental count for our testing.
Example 2: streamstats with a window
With a window, streamstats will calculate statistics based on the number of events specified. In this case, streamstats looks at the current event and the previous. This causes the count by color to be 1 for each event because the previous event is always a different color. A common expectation with streamstats is that the window by default would be separate for each color. To do this, see our next example.
Example 3: streamstats with a windows and global=false
When global=false a separate window is kept for each color. This is the behavior we need for testing alert volume.
3. How can we use streamstats to help us?
Let’s take a simplified standard deviation search for finding an anomalous amount of failed logins.
Running this search over 7 days will count the number of failed authentications by src for each hour. Then it will calculate an upper bound based on the average and standard deviation of the counts for each hour by src. Finally, it will only show events where the failure count for the last hour was above the upper bound.
Removing the time constraint would show anomalies for the entire time frame, but the results will be different then when the search runs every hour. To get a more accurate representation, we need to use streamstats to look at the previous 7 days for each individual hour.
Changing eventstats to streamstats and specifying a window of 168 (168 hours in a week since our time span is 1 hour) with global=false will calculate the threshold for each event based on the last 7 days of counts. Constraining the time to the past 7 days in the where clause and running the search over 14 days will ensure there’s a full history for each event. To exclude the current event from the threshold simply set current=false if desired.
Bonus Example: Creating an alert volume test for ESCU SMB Traffic Spike
Below shows the original search–taken from Splunk’s Enterprise Security Content Update app:
- Change stats to streamstats window=168 (for 7 days of history) and global=false
- In this search, the calculations are done on (maxtime, “-70m@m”) so set current=false
- Remove `max(eval(if(_time >= relative_time(maxtime, “-70m@m”), count, null))) as count`. We want to keep the original count from each event
- Add the time constraint `_time>relative_time(now(), “-7d”)` and run over 14 days
Putting all that together, here is the search:
Using this, we can change the threshold, filter noise, and immediately see what our changes will do.
Using this method, you can immediately see how many alerts would be generated from a standard deviation search. This allows for instant feedback to tune out sources, users, or failure reasons. Hopefully, this post helped you have a better understanding of how you can apply streamstats when creating and tuning alerts as well as other applications.
About Hurricane Labs
Hurricane Labs is a dynamic Managed Services Provider that unlocks the potential of Splunk and security for diverse enterprises across the United States. With a dedicated, Splunk-focused team and an emphasis on humanity and collaboration, we provide the skills, resources, and results to help make our customers’ lives easier.
For more information, visit www.hurricanelabs.com and follow us on Twitter @hurricanelabs.