Splunk’s DB Connect app allows Splunk administrators to index data from databases in Splunk. This can be extremely useful for both the DBA and Splunk user. The benefits include saving space on the database server, using Splunk as a long-term storage solution (to avoid having to grow databases to tremendous sizes), and even using that database data in your Splunk searches. With these abilities comes the chore of purging database records – but only after Splunk has indexed them.
What if your database was smart enough to know what data Splunk has already indexed, and what it hasn’t? If Splunk could tell the database what data has been indexed, we could potentially remove any possibility of data being purged from the database before you index it. Splunk pulls database records through use of the “DB input” function. This exercise requires Splunk DB Connect v3.1.0 or later.
Database rising inputs look something like:
The magic of this search only pulls data that is newer than the last time Splunk’s DB input pulled data from this source. This is called a checkpoint value, and is stored in $SPLUNK_HOME/var/lib/splunk/modinputs/server/splunk_app_db_connect/. Every time Splunk pulls data from a database rising input, it will write the checkpoint value here, along with writing it to the _internal index.
So what happens if your database feed breaks for whatever reason? Is your DBA kind enough to keep storage around for a day, week, or even month? What happens when your Splunk admin goes on vacation and can’t fix the input quickly enough to avoid losing data? The answer is simple: include a failsafe for your database to know not to automatically delete data before Splunk is able to index it.
So how does it work?
Recently I’ve been working on this very use case: looking for a technique for Splunk to be able to communicate back to the database where it stopped reading the data – or now that you know some Splunk lingo – how to share Splunk’s checkpoint value with the database purging mechanism. To accomplish this we will need the following:
– A method to find the DB input’s checkpoint value automatically
– A way for Splunk to communicate this value to the database
Obtaining the checkpoint value
Fortunately for us, Splunk is a logging and analytics tool at heart. This means we can find the rising column value for a DB input fairly easily. For our test case, the DB input in question will be called “my_test_input”, which happens to be an input reading from an Oracle database. The Splunk server running DB Connect is called “heavyforwarder1” and our search head is called “searchhead1”. So let’s start by finding that checkpoint value (hint: if you’re in a distributed environment, run these searches from your search head):
Now that we have the log we need, let’s table it out to make it a bit easier to read (and work with later):
Our search now let’s us know the time, db input name, and latest checkpoint value: