Have you ever thought you knew how something worked, only to have your mind blown when you learned more? We recently ran into a scenario that forced us to re-examine our understanding of how the Splunk deployment server behaved with respect to local changes on a deployment client. This was a learning experience for me, so I wanted to share this knowledge in the hope that it can help someone else too.
Deployment Server Functionality
The Splunk deployment server is considered a configuration enforcement mechanism: that is, it exists to keep the configuration consistent between the deployment_apps directory on the deployment server and the target app directory on the deployment clients of a given server class.
It was our original belief that this was a bidirectional relationship, whereby the client would compare the hash of its app to what was on the deployment server and, if there was a difference, automatically update the app to match the deployment server. In fact, there were some apps – such as the OPSEC LEA app – that (at least in older versions) would write data locally and were considered incompatible with a deployment server due to this mechanism.
This understanding, however, was not quite correct.
What actually happens is that the checksum for an app is calculated once (at app installation) and maintained on the deployment client, in the file serverclass.xml ($SPLUNK_HOME/var/run/serverclass.xml). We’ll cover this file in more detail below, if you’re interested.
Each app in serverclass.xml looks something like this:
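A sketch of what such an entry might look like (the checksum value is made up, and the exact set of attributes varies by Splunk version, so treat this as illustrative rather than exact):

```xml
<app name="baseline_linux_inputs"
     checksum="139162392365649209"
     restartSplunkd="false"
     restartSplunkWeb="false"
     stateOnClient="enabled"/>
```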
Note that the checksum is stored in this file. This value is not recalculated after app installation, and it will not change until a new version of the app in question is placed on the deployment server (and the deployment server is reloaded).
Any local changes made to the app after it is deployed from the deployment server will not cause this stored checksum to change or the app to be re-downloaded from the deployment server. The only exception is removal: if the app is deleted from the deployment client, it will be re-downloaded on the next check-in.
Why Does This Matter?
The fact that the stored checksum only changes when the app is modified on the deployment server can lead to configuration issues down the road. Consider this scenario:
- You deploy an app from the deployment server to a forwarder.
- Another administrator fails to recognize this app is managed by the deployment server, and makes a local configuration change to an inputs.conf file.
- This configuration will work for a very long time without any issues, until months down the road…
- The app in question is updated (which changes several configuration files) on the deployment server, then the admin reloads the serverclass.
- The next time the deployment client checks in, everything that was working stops working. This is because the entire app on the deployment client is replaced and the local configuration is lost.
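To make the scenario concrete, the local change might be something like this hypothetical monitor stanza, added directly on the forwarder in the app's local directory (the app name, path, and input are made up for illustration):

```
# $SPLUNK_HOME/etc/apps/baseline_linux_inputs/local/inputs.conf (on the forwarder)
[monitor:///var/log/custom_app.log]
index = main
sourcetype = custom_app
```

When the updated app is eventually pushed, the entire app directory on the client (including local/) is replaced, and this stanza disappears with it.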
This example illustrates the importance of tracking changes appropriately, and ensuring that any apps managed by the deployment server are not modified locally.
What If I Don’t Want It to Work This Way?
There’s a config file for that! Splunk allows specific content to be excluded from deployment server management. This is covered in Splunk Docs here: https://docs.splunk.com/Documentation/Splunk/7.2.1/Updating/Excludecontent
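As a sketch of how that looks, the excludeFromUpdate setting from the linked docs can preserve a client's local directory across redeployments (verify the syntax against the serverclass.conf spec for your Splunk version):

```
# serverclass.conf on the deployment server
[serverClass:all_linux_servers]
excludeFromUpdate = $app_root$/local
```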
How It Works Under the Hood
For those of you who like technical details and explanations, let’s dig into the config files a little bit deeper:
A Look at serverclass.xml
As mentioned, the file that controls all of this behavior is $SPLUNK_HOME/var/run/serverclass.xml. This file is included in a Splunk diag, and can also be accessed locally on any system running Splunk or a Splunk Universal Forwarder.
Digging into this file can provide a ton of information about what our deployment client is (supposed to be) doing. Let’s explore a somewhat simplified example:
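Here is a simplified sketch of what such a file might contain. The element and attribute names are illustrative and can vary by Splunk version, and the checksum values are made up; the server class and app names match the structure described below:

```xml
<serverClasses>
  <serverClass name="all_HeavyForwarders">
    <app name="if_syslog_inputs" checksum="139162392365649209" stateOnClient="enabled"/>
    <app name="infra_outputs" checksum="482910273465012987" stateOnClient="enabled"/>
  </serverClass>
  <serverClass name="all_SplunkInfrastructure">
    <app name="infra_license" checksum="771203948571029384" stateOnClient="enabled"/>
    <app name="infra_authentication" checksum="662534019283746501" stateOnClient="enabled"/>
  </serverClass>
  <serverClass name="all_linux_servers">
    <app name="baseline_linux_inputs" checksum="550192837465019283" stateOnClient="enabled"/>
  </serverClass>
  <serverClass name="all_splunk"/>
</serverClasses>
```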
Each serverClass stanza of this XML file represents a server class defined in serverclass.conf on the deployment server, so we can see which server classes this example system is a member of. Within each stanza, we also see the associated apps. In this example, that works out to the following structure:
- ServerClass: all_HeavyForwarders
  - Includes app: if_syslog_inputs
  - Includes app: infra_outputs
- ServerClass: all_SplunkInfrastructure
  - Includes app: infra_license
  - Includes app: infra_authentication
- ServerClass: all_linux_servers
  - Includes app: baseline_linux_inputs
- ServerClass: all_splunk
As you can see, this is a really straightforward way to see what apps on a given deployment client are managed by the deployment server and what configuration parameters are applied.
A Word on Checksums
Based on our testing, the checksum calculated on the deployment server is highly dependent on the timestamps within a given app. This means that something as innocuous as touching a file (that is, updating its timestamp) will cause the checksum to change. Be aware of this when working with deployment apps: depending on how your server classes and apps are configured, it could lead to unintended Splunk restarts (though only when you reload the server class or restart the deployment server)!
For those of you who like proof, here you go (thanks to Brian Glenn for doing the testing). Note that simply changing the timestamp on app.conf is enough to cause a new bundle to be generated:
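You can see the underlying principle for yourself with an analogy you can run anywhere. This is not Splunk's actual bundling code, just a demonstration that an archive's checksum depends on file timestamps, not only on file contents:

```shell
# NOT Splunk's actual mechanism -- an illustration that archiving a directory
# produces a different checksum when only a file's mtime changes, because the
# archive format records timestamps in its headers.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/myapp/default"
printf '[install]\nstate = enabled\n' > "$tmp/myapp/default/app.conf"

# Checksum of a tarball of the app directory
sum1=$(tar -C "$tmp" -cf - myapp | md5sum | awk '{print $1}')

# Touch app.conf: contents unchanged, timestamp changed
touch -d '2030-01-01 00:00:00' "$tmp/myapp/default/app.conf"
sum2=$(tar -C "$tmp" -cf - myapp | md5sum | awk '{print $1}')

echo "before: $sum1"
echo "after:  $sum2"
# The two checksums differ even though no file content changed.
```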
If this functionality is not desired, it can be manipulated with the crossServerChecksum option in serverclass.conf. Setting this to true will result in the md5sum not changing if only the timestamp on a file in a deployment app is modified, as demonstrated below:
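The setting lives in serverclass.conf on the deployment server; as far as I can tell it belongs in the global stanza, but check the serverclass.conf spec for your version:

```
# serverclass.conf on the deployment server
[global]
crossServerChecksum = true
```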
While the bundle ID is regenerated (the .bundle file has a new name), the checksum remains the same. This would result in the app not being redeployed due to a difference in checksum (because there isn’t one).
Hopefully this write-up will save you some pain and suffering when making changes to your deployment apps (or at least help you understand the cause of your pain and suffering if you find out about this too late). If you found this tutorial helpful and there’s another aspect of Splunk configuration you would like to learn more about, let me know!