Using Stats in Splunk Part 2: Seasonality
Seasonality, the idea that data varies predictably over specific time periods, is one of the most important concepts in statistical analysis of time series data in Splunk. For example, you’d expect to see more data logged during business hours and less during off-hours. If not taken into account, these variations can throw a wrench into typical anomaly detection techniques, such as those outlined in Part 1.
This article explains seasonality and techniques for taking it into account in your searches, then works through a practical example of handling this type of behavior in your anomaly detection searches.
Real-world example
To help explain seasonality, we’ll work through a real-world example: detecting unexpected dips in indexed data. Many sources of machine data generate more logs during normal business hours (when they’re being actively used), so this is a situation where taking seasonality into account is appropriate.
Before we start, here is the full example that we’ll break down:
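As a minimal sketch of what a baseline-building search of this kind might look like (an illustration, not the original search), assume hourly event counts per index over the last 30 days. The lookup file name `index_volume_baseline.csv`, the field names `period`, `avg_count`, and `stdev_count`, and the weekday 08:00 to 18:00 definition of business hours are all hypothetical choices made for this example:

```spl
| tstats count where index=* earliest=-30d@h latest=@h by index, _time span=1h
| xyseries _time index count
| makecontinuous _time span=1h
| fillnull value=0
| untable _time index count
| eval hour=tonumber(strftime(_time, "%H")), wday=strftime(_time, "%a")
| eval period=if(wday!="Sat" AND wday!="Sun" AND hour>=8 AND hour<18, "business_hours", "off_hours")
| stats avg(count) as avg_count stdev(count) as stdev_count by index period
| outputlookup index_volume_baseline.csv
```

The first five lines count events per index per hour and fill empty hours with zero, the two `eval` lines label each hour with a seasonal category, and the last two lines compute per-category statistics and write them to the lookup.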
Once the stats are generated in the lookup, and you have a search that repopulates it every so often, you can implement the following:
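A rough sketch of a detection search that reads such a lookup, offered as an illustration rather than the original search. It assumes a hypothetical lookup file `index_volume_baseline.csv` keyed on `index` and a `period` category, with `avg_count` and `stdev_count` columns; the threshold of two standard deviations below the average is likewise an assumption you would tune:

```spl
| tstats count where index=* earliest=-24h@h latest=@h by index, _time span=1h
| xyseries _time index count
| makecontinuous _time span=1h
| fillnull value=0
| untable _time index count
| where _time >= relative_time(now(), "-1h@h")
| eval hour=tonumber(strftime(_time, "%H")), wday=strftime(_time, "%a")
| eval period=if(wday!="Sat" AND wday!="Sun" AND hour>=8 AND hour<18, "business_hours", "off_hours")
| lookup index_volume_baseline.csv index period OUTPUT avg_count stdev_count
| where count < avg_count - (2 * stdev_count)
```

Searching 24 hours and then keeping only the most recent hour lets the gap-filling step produce a zero row for an index that went quiet. An index with no events at all in the 24-hour window would still produce no row, so this sketch would not flag it.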
Two pieces of the search keep the data accurate. The first accounts for the fact that data may not come in at all during certain time periods; without it, those periods would be missing from the results rather than counted as zero. The following fills in data for spans where no logs are generated.
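One way this gap-filling step can be written, as a sketch that assumes hourly counts per index produced by `tstats` (the one-hour span and the fill value of 0 are assumptions for illustration):

```spl
| tstats count where index=* by index, _time span=1h
| xyseries _time index count
| makecontinuous _time span=1h
| fillnull value=0
| untable _time index count
```

Here `xyseries` pivots the results into one row per hour with one column per index, `makecontinuous` adds rows for hours that are missing entirely, `fillnull` writes 0 into every empty cell, and `untable` turns the table back into one row per hour and index.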
This packs the data into a tabular format, makes it continuous over time, fills null values with a default value, and then unpacks the data again. Note that the xyseries command accepts only one x-axis field and one field to split the series by. If you need to group by more than one field, you’ll need to do something like the following:
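A sketch of that workaround, assuming you want counts split by both `index` and `sourcetype`; the temporary field names `series` and `parts` are hypothetical:

```spl
| tstats count where index=* by index, sourcetype, _time span=1h
| eval series=index . "!!!!!" . sourcetype
| xyseries _time series count
| makecontinuous _time span=1h
| fillnull value=0
| untable _time series count
| eval parts=split(series, "!!!!!")
| eval index=mvindex(parts, 0), sourcetype=mvindex(parts, 1)
| fields - series parts
```

The two fields are joined into a single `series` value before the pivot, then split back into their original fields once the gaps have been filled.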
Using !!!!! is arbitrary; you only need a separator string that won’t normally appear in your data.
The second seasonality piece is the following:
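A sketch of how this categorization might look, assuming the results already contain `_time`, `index`, and an hourly `count`. The category names and the weekday 08:00 to 18:00 boundary are illustrative assumptions; adjust them to match when your sources are actually busy:

```spl
| eval hour=tonumber(strftime(_time, "%H")), wday=strftime(_time, "%a")
| eval period=if(wday!="Sat" AND wday!="Sun" AND hour>=8 AND hour<18, "business_hours", "off_hours")
| stats avg(count) as avg_count stdev(count) as stdev_count by index period
```

Splitting the statistics by `period` means a quiet weekend hour is compared against other quiet hours instead of against the weekday peak.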
This should be more self-explanatory. We extract the day of the week and the hour from the timestamp, then use them to classify when each event occurs. From this, we can calculate statistics based on which category the event fits into.
Conclusion
With these techniques, you can now incorporate seasonality into your searches. It is a powerful technique that can help you cut down on the noise in your alerts. Be sure to keep an eye out for Part 3 of this series, where I’ll be taking a look at some less commonly used commands and how they may (or may not) be useful in your investigations.
