How to Deploy Splunk SmartStore for Improved Data Storage
This blog post is co-authored by Tom Kopchak and Brian Glenn.
One of the major features released in Splunk 7.2 is SmartStore, a mechanism for using AWS S3 (or another S3 API-compliant object store) as volumes for storing your indexed data in Splunk.
We’re not going to spend a lot of time on what SmartStore is and the benefits of using it, since Splunk already has a fairly comprehensive series on this. Check out the blogs Splunk SmartStore: Cut the Cord by Decoupling Compute and Storage and Splunk SmartStore: Disrupting Existing Large Scale Data Management Paradigms.
The biggest takeaway here, however, is that this represents a fundamental shift in how data can be stored in Splunk.
Traditionally, Splunk relied on dedicated storage for each indexer, with various classes of storage to help manage costs (much of which is covered in one of our other tutorials, Splunking Responsibly Part 2: Sizing Your Storage). With SmartStore, the concepts of hot, warm, and cold storage are replaced with a cache manager and an object store. This effectively allows historical data to be stored for longer at a lower cost than traditional on-premises storage options.
Let’s assume you have an existing Splunk environment and want to take SmartStore for a spin. Here’s what you need to do to get it deployed.
Standard Warning About Changing Things
Whenever you’re manipulating index settings, there can be a risk of data loss. If you’re fortunate enough to not be using your test environment for production, doing this in a lab is absolutely the best way to experiment. All of this testing was first done on a standalone Splunk instance before trying it out in our lab environment, which closely mirrors one of our managed Splunk clients.
The common element you will need to implement SmartStore is an S3-compatible object store. For our testing, we used Amazon’s S3, so we didn’t have to worry about S3 compatibility. If you’re using a third-party mechanism, the Splunk docs have some information on what is required for SmartStore to work. You can read more about managing indexers and clusters of indexers in the docs.
Assuming you’re using S3, configure a new bucket and enable API access as follows:
1. Navigate to S3 in the AWS Management Console.
2. Create a bucket.
3. Give the bucket a (globally unique) name. For testing, none of the other defaults need to be changed. Record this bucket name and the AWS region where it is created; you’ll need this information later.
4. Once the bucket is created, you’ll see it in the AWS console.
5. Create an API key by navigating to IAM in the AWS console.
6. Navigate to Users -> Security credentials -> Create access key.
7. Record both the Access key ID and the Secret access key shown; you’ll need these later.
8. While not strictly necessary, we also created a folder inside the bucket for each test we did (standalone for our initial testing, CCN for our test customer).
Once these steps are completed, you should be done with the AWS side of the setup.
Deploying SmartStore on a Standalone Indexer
For our first test, we performed a clean installation of Splunk 7.3 on an Ubuntu virtual machine, and proceeded to configure a new index with SmartStore. For testing, this simply involved changes to $SPLUNK_HOME/etc/system/local/indexes.conf on our instance.
Note: Splunk’s documentation on Deploying SmartStore gives a great example of the configuration to use.
Our final test indexes.conf ended up looking like this:
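For reference, here is a minimal sketch of what a SmartStore volume definition in indexes.conf can look like. It is not necessarily the exact file from this test: the volume name remote_store is an arbitrary choice, and the bucket name, region, and credentials are placeholders to replace with the values recorded during the AWS steps.

```ini
# indexes.conf (sketch; every value in angle brackets is a placeholder)
[volume:remote_store]
# Marks this volume as a remote (SmartStore) object store
storageType = remote
# s3://<bucket name>/<optional folder inside the bucket>
path = s3://<your-bucket-name>/standalone
# API key created in IAM
remote.s3.access_key = <your-access-key-id>
remote.s3.secret_key = <your-secret-access-key>
# Regional S3 endpoint for the region where the bucket lives
remote.s3.endpoint = https://s3.<your-region>.amazonaws.com
```

Keeping the credentials in a file under etc/system/local is convenient for a lab, but treat that file as sensitive.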
We also created a test index, cs_index, in the same indexes.conf file, and later added the _internal index as an additional test index:
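As a hedged illustration of the index side, the stanzas below sketch how an index can be pointed at a remote volume. They assume a volume named remote_store has been defined in the same file (that name is an assumption, not something from our original listing), and the local paths are simply the conventional defaults.

```ini
# indexes.conf (sketch; assumes a [volume:remote_store] stanza exists)
[cs_index]
# Local paths are still required; homePath also holds the local cache
homePath   = $SPLUNK_DB/cs_index/db
coldPath   = $SPLUNK_DB/cs_index/colddb
thawedPath = $SPLUNK_DB/cs_index/thaweddb
# $_index_name expands to the index name, giving each index its own folder
remotePath = volume:remote_store/$_index_name

# _internal already has its local paths defined in Splunk's defaults,
# so only the remote path needs to be added here
[_internal]
remotePath = volume:remote_store/$_index_name
```

With this layout, each SmartStore-enabled index ends up in its own folder beneath the path configured on the volume.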
Upon a restart of Splunk, we immediately noticed folders created for each index in the AWS S3 console:
Navigating through the various folders, you’ll notice the same type of Splunk index structure that you’re familiar with:
For those of you who prefer a screencast or demo, here’s a quick runthrough of the process using another index:
Once you have SmartStore writing to buckets in S3, you’ll want to make sure that data can be retrieved successfully from a SmartStore index entirely in S3. This is easiest to test when you first begin to have data in SmartStore, since it’s simple to evict everything from your cache when there are only one or two buckets.
You can view all of the buckets in the cache manager via the REST API by navigating to services/admin/cacheman/:
To evict a bucket, you can post to the REST API using curl. For the sake of our testing, this command was used:
This will produce a lot of output when it runs:
The easiest way to confirm that the bucket was evicted is to go into the directory for that bucket and confirm that it is empty. The folder will still exist, but it won’t have any contents:
Running a search over this data will result in the bucket being retrieved locally and stored in the cache on disk, which you can see by checking the same folder:
Here is a demo of how this eviction and download process works in practice:
Extending This to a Cluster with Existing Data
After our lab testing was successful, we followed a very similar process to implement this in an indexer cluster. The configuration files were very similar, with some slight differences:
- The SmartStore configuration was put in an infra_smartstore_base app, which was deployed to the indexers via the cluster master.
- The remotePath definition for the existing index was placed in the same app where indexes are defined.
After a rolling restart of the indexer cluster, our data began showing up in S3 – while still being searchable. Over the course of the next several hours, we noticed less and less disk space being consumed while more data was uploaded to AWS.
Tracking SmartStore Activity
SmartStore-related internal logs are stored in the Splunk _internal index under the CacheManager component. In our lab environment, this means we can track SmartStore activity with the following search:
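A minimal version of such a search, built only from the index and component named above, would look something like this (add a host or time filter to suit your own environment):

```spl
index=_internal component=CacheManager
```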
Download and upload activities are also easy to locate, via the action field:
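As a sketch, a search along these lines charts both activity types over time. The action values download and upload are assumed from the activity names above, and the one-hour span is an arbitrary choice, so check both against the events in your own _internal index.

```spl
index=_internal component=CacheManager (action=download OR action=upload)
| timechart span=1h count by action
```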
As you can see from a 36-hour graph, the amount of activity varies. There is a burst in uploading when the index is initially configured to use SmartStore, followed by bursts of download activity when searches require data that is only available in S3. Minimal activity is observed when data is only being ingested and not actively searched.
Observations and Conclusions
Overall, we were impressed with how simple it was to configure and get SmartStore working in our testing. The Splunk documentation for the implementation was well written. Kudos to the documentation team for excellent work.
We did notice that retrieving data from SmartStore adds some latency to searches, and in our testing this was noticeable for current data. However, since the primary use case for SmartStore is reducing the cost of long-term data storage, this is less of an issue for infrequently accessed data.
We’ll continue to produce more information on this feature as we work with it more. Let us know about your experiences using SmartStore as well.