One of my least favorite features in Splunk is KV Store – mainly because whenever I have to deal with it as a Splunk administrator, it’s broken in some horrible new way that I need to figure out. The goal of this post is to capture one of these troubleshooting adventures that we recently encountered, in the hope that it helps someone who runs into the same problem in the future.
In Splunk Enterprise 8.1, Splunk introduced a new storage engine for KV Store (WiredTiger). When upgrading to Splunk Enterprise 9.0 or later, you are required to migrate to the new storage engine. You can also migrate to this storage engine before upgrading to Splunk Enterprise 9.0 if you want.
We’ve done this migration for a bunch of clients, but every once in a while, we’ve seen some issues that require additional troubleshooting, especially if there is an error or failure in the migration or upgrade process.
While I’m not sure of the exact circumstances that led to this error, the root cause appears to have been a Splunk version conflict: a system was upgraded to Splunk 9.0, and then an older Splunk 8.x version was started for some reason. The end result (and where I entered this story) was a system running Splunk 9.0 with a KV Store that wouldn’t start.
Symptoms of the issue
Based on splunkd.log on the broken system, KV Store on this host appeared to be trying to start server version 4.2 with the mmapv1 (legacy KV Store) storage engine. Even with storageEngine = mmapv1 in server.conf, the system was trying to migrate to WiredTiger and failing.
Furthermore, the KV Store files in $SPLUNK_HOME/var/lib/splunk/kvstore/mongo all ended with a .ns extension, which indicates that the storage engine was mmapv1 and not WiredTiger. After a conversion to WiredTiger, you’ll instead see a bunch of files with .wt extensions.
For some reason, the system was convinced that it was running a more current version of KV Store, but the data files on disk disagreed. While this was happening, KV Store didn’t start or function, and nothing at all was written to mongod.log.
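If you want to run the same file-extension check on a few hosts, here is a minimal Python sketch. It only looks at extensions, as described above, so treat the result as a hint rather than proof. The function name `guess_storage_engine` is made up for this example.

```python
from pathlib import Path


def guess_storage_engine(mongo_dir):
    """Guess the KV Store storage engine from file extensions in mongo_dir.

    Heuristic only: .ns files suggest mmapv1, .wt files suggest WiredTiger.
    """
    path = Path(mongo_dir)
    if not path.is_dir():
        return "unknown"

    # Collect the extensions of regular files directly inside the directory.
    suffixes = {p.suffix for p in path.iterdir() if p.is_file()}
    has_ns = ".ns" in suffixes
    has_wt = ".wt" in suffixes

    if has_ns and has_wt:
        return "mixed"  # both kinds present, worth a closer look by hand
    if has_wt:
        return "wiredTiger"
    if has_ns:
        return "mmapv1"
    return "unknown"
```

You would call it with the mongo directory mentioned above, for example `guess_storage_engine("/opt/splunk/var/lib/splunk/kvstore/mongo")`, where `/opt/splunk` is only an assumed value for $SPLUNK_HOME.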
Fortunately, the splunkd.log file had some more output as to what was happening:
The splunk show kvstore-status command showed the following output:
Now we needed to figure out why the KV Store status was showing as failed and, more importantly, how to fix it.
Researching the solution
Reviewing logs on multiple Splunk environments led us to a clue in the migrate.log file. KV Store upgrades appeared to record entries like these:
The purpose of these files is not documented, and they contain no content:
Our best guess is that the presence of one of these files tells Splunk which version of the KV Store engine to use. We decided to try removing the versionFile40 and versionFile42 files and creating a versionFile36 in their place, to correspond to a version that used the old mmapv1 storage engine.
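For my future self, here is a rough Python sketch of that file swap. It makes a few assumptions. The directory holding the marker files is passed in as `marker_dir` rather than hard-coded. The stale files are moved to a backup directory instead of being deleted, which is a more cautious variation on what we did. Splunk is assumed to be stopped while this runs. The function name and parameters are invented for this example, and none of this is documented Splunk behavior.

```python
import shutil
from pathlib import Path


def reset_version_markers(marker_dir, backup_dir,
                          stale=("versionFile40", "versionFile42"),
                          wanted="versionFile36"):
    """Move stale version marker files aside and create an empty marker.

    marker_dir and backup_dir are supplied by the caller (hypothetical
    parameters). Returns the names of the files that were moved.
    """
    marker_dir = Path(marker_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    for name in stale:
        src = marker_dir / name
        if src.exists():
            # Keep a copy so the change can be undone if the guess is wrong.
            shutil.move(str(src), str(backup_dir / name))
            moved.append(name)

    # The marker files we saw were empty, so an empty file is enough.
    (marker_dir / wanted).touch()
    return moved
```

Moving the files rather than deleting them means that if the guess about their purpose turns out to be wrong, you can put everything back the way it was.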
At this point, we crossed our fingers and restarted Splunk. To our relief, Splunk restarted and KV Store successfully came up this time too!
With KV Store running again, we needed to run a storage engine migration to move to WiredTiger while still on serverVersion 3.6.17:
After this conversion, our kvstore-status showed that we were running on WiredTiger on server version 3.6:
Next, we performed another KV Store migration to get the server version up to 4.2.17:
After this upgrade, the server version was showing 4.2:
Now KV Store is running correctly and on the current version. We fixed the problem!
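The two migrations can be scripted with a small wrapper around the Splunk CLI. The sketch below uses the `splunk migrate kvstore-storage-engine --target-engine wiredTiger` and `splunk migrate migrate-kvstore` commands as I remember them from Splunk’s documentation, so check the docs for your version before running anything, including prerequisites such as whether Splunk must be stopped first. The `/opt/splunk` default, `MIGRATION_STEPS` and `run_steps` are assumptions made for this example.

```python
import os
import subprocess

# Assumed install location; override with the SPLUNK_HOME environment variable.
SPLUNK_HOME = os.environ.get("SPLUNK_HOME", "/opt/splunk")
SPLUNK_BIN = os.path.join(SPLUNK_HOME, "bin", "splunk")

# Step 1: move the storage engine from mmapv1 to WiredTiger.
# Step 2: upgrade the KV Store server version.
MIGRATION_STEPS = [
    [SPLUNK_BIN, "migrate", "kvstore-storage-engine",
     "--target-engine", "wiredTiger"],
    [SPLUNK_BIN, "migrate", "migrate-kvstore"],
]


def run_steps(steps, dry_run=True):
    """Print each command and, unless dry_run is set, execute it.

    Stops at the first failing command (check=True raises an exception).
    Returns the number of steps processed.
    """
    count = 0
    for argv in steps:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
        count += 1
    return count
```

By default `run_steps(MIGRATION_STEPS)` only prints the commands. Pass `dry_run=False` once you have confirmed they match your environment, and check `splunk show kvstore-status` between steps as we did.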
Do I expect that you’ll ever be in a situation where you will find this information useful? I hope not. Did I write this so that I can have some notes in case I ever run into a similar problem in the future? Absolutely.
This is a great example of running into a problem where you have to make some educated guesses on a possible solution with limited information to go on. I’m glad we were able to figure this one out and hope these notes might help you if you ever see this problem in your Splunk environment. If not, hello to my future self who is reading this months or years from now and again fighting with a broken KV Store somewhere.