As a Splunk administrator, I often find my first instinct when analyzing data is to index it. Splunk offers a lot of tools and convenient interfaces, making it a one-stop shop for data analysis. However, this approach can sometimes lead to the type of data collection that doesn’t serve my best interests in the long run.
In this blog post, I will discuss my thought process when onboarding data, including some helpful points for Splunk admins as they navigate both their short- and long-term Splunk indexing strategies.
Developing a strategy for indexing data in Splunk
Every business’s needs are different, but below are a few of the basic concepts to keep in mind while developing your strategy for what you really want to do with your data.
1.) Save yourself some time–and money!–by starting off with a plan
This might seem like a really basic idea, but you’d be surprised how often I see requests to onboard data without a clearly defined plan for how to use it.
You need to have a solid understanding of your use cases so you can determine how long to keep your data. Will it be used a month from now? Will you need to build and establish a pattern across 6 months?
You don’t want data to sit in an index and never be touched. It may be “always nice to have” or “there just in case,” but is it worth the storage costs? Also, a poorly defined search that doesn’t specify an index forces Splunk to comb through every index your role searches by default, including the unused clutter you’ve just collected.
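To put a rough number on that storage question, here is a minimal Python sketch. It assumes the commonly cited rule of thumb that indexed data takes up about half the size of the raw data on disk (compressed raw data plus index files). Your real ratio depends on the data, so treat disk_ratio as an assumption to measure, not a fact. The 20 GB/day source is hypothetical.

```python
def estimate_index_storage_gb(daily_ingest_gb, retention_days, disk_ratio=0.5):
    """Rough on-disk footprint of one index, before any replication.

    disk_ratio is an assumed rule of thumb (about 50% of raw size);
    measure your own data before relying on it.
    """
    if daily_ingest_gb < 0 or retention_days < 0:
        raise ValueError("ingest and retention must be non-negative")
    return daily_ingest_gb * retention_days * disk_ratio


if __name__ == "__main__":
    # Hypothetical source sending 20 GB of raw data per day.
    for days in (30, 180):
        size = estimate_index_storage_gb(20, days)
        print(f"{days} days of retention: about {size:.0f} GB on disk")
```

Running the same source through a one-month and a six-month retention makes the cost of "just in case" data easy to compare before anything is onboarded.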
2.) Know the specifics and what you need out of your data sources
A lot of pretrained source types have standards for what gets extracted. Some formats, like JSON, have extractions predefined by the nature of the format. Maintaining CIM (Common Information Model) standards for your data is critical, particularly for security analysis and data model use cases.
Outside of those instances, however, what extractions are going to get used?
Particularly dense logs mean bigger buckets on the indexers, where the data is stored. This might not matter for a single source, but across all the indexes, it can really add up. For this reason, I recommend extracting only the fields that will get used and shaping clean logs that carry only the data you need. There are options for cutting down events before index time, such as using props.conf and transforms.conf settings in conjunction with tools like SED.
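To get a feel for how much a trim like that could save, you can prototype the substitution outside Splunk first. The Python sketch below is only an illustration: the sample event and the debug_payload field are made up, and the regular expression is one I would then adapt into a sed-style s/regex/replacement/ expression (for example, in a SEDCMD setting in props.conf) and test against real data.

```python
import re

# Hypothetical noisy key="value" pair that nobody searches on.
NOISY_FIELD = re.compile(r'\s*debug_payload="[^"]*"')


def trim_event(raw_event):
    """Return the event with the unused debug_payload field removed."""
    return NOISY_FIELD.sub("", raw_event)


if __name__ == "__main__":
    event = (
        '2024-01-15 10:00:00 INFO user=alice action=login '
        'debug_payload="stack=a.b.c;heap=512;flags=0x1f" status=ok'
    )
    trimmed = trim_event(event)
    print(trimmed)
    print(f"bytes before: {len(event)}, after: {len(trimmed)}")
```

Multiply the per-event savings by your daily event count and you have a quick case for (or against) doing the cleanup at index time.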
3.) Doing a little troubleshooting? Just upload it.
In some cases, I just want to upload a parcel of data for troubleshooting purposes so I can leverage all the tools Splunk has to offer. Administrators often do this with things like Splunk diags. It can also be useful to upload a parcel of data to identify patterns or key points.
Once the goal has been accomplished, however, there’s no longer a need to keep the data around. In these cases, it can be useful to have a short-lived index set up, where you can upload logs and work with them as needed. Then you can move on knowing the data will roll off after 7-30 days and no longer take up any of your storage or other Splunk resources.
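As a sketch of what that short-lived index might look like, the Python snippet below renders an indexes.conf stanza with the retention window converted to seconds for frozenTimePeriodInSecs. The index name scratch_troubleshooting is hypothetical, and the paths simply follow the usual $SPLUNK_DB layout, so adjust both to your environment. Keep in mind that a bucket only rolls to frozen once its newest event is older than the limit, so some data may linger a little past the window.

```python
SECONDS_PER_DAY = 86_400


def scratch_index_stanza(name, retention_days):
    """Render an indexes.conf stanza for a short-lived index.

    The index name and paths are illustrative assumptions; review the
    output before deploying it anywhere.
    """
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    frozen_secs = retention_days * SECONDS_PER_DAY
    return "\n".join([
        f"[{name}]",
        f"homePath = $SPLUNK_DB/{name}/db",
        f"coldPath = $SPLUNK_DB/{name}/colddb",
        f"thawedPath = $SPLUNK_DB/{name}/thaweddb",
        f"frozenTimePeriodInSecs = {frozen_secs}",
    ])


if __name__ == "__main__":
    # Seven days is the short end of the 7-30 day window.
    print(scratch_index_stanza("scratch_troubleshooting", 7))
```

With no archiving configured, frozen buckets are deleted by default, which is exactly what you want for throwaway troubleshooting data.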
Hopefully this blog post helped you gain a better understanding of Splunk admins and what it’s like to be a data junkie. Happy Splunking!