If you’re like me, you love Splunk. It’s an amazing tool for monitoring and troubleshooting your systems. But there’s one thing that can drive a Splunk sysadmin crazy: high CPU usage. With this in mind, I’m going to show you how to identify and reduce high CPU usage in Splunk Stream.
Let’s get started!
Introducing Splunk Stream
First, a little about the Splunk App for Stream.
The advantage of this Splunk app is that it allows for the collection of wire data from many different source types that might be otherwise difficult to capture in Splunk.
As a matter of fact, at Hurricane Labs we use Splunk Stream to collect DNS event data as part of our comprehensive security alerting services. However, despite being admirers of the app’s capabilities, deploying it has been known to take a toll on the CPU utilization of the server that manages distributed Stream forwarders.
If you’ve had this happen to you, just know, you’re not alone!
Troubleshooting symptoms of high CPU usage in a distributed Splunk Stream deployment
In a distributed deployment, you have a Splunk Enterprise instance that functions as a management node for the Universal Forwarders that are collecting data. The Splunk Enterprise host runs the splunk_app_stream app, and the Universal Forwarders (UFs) run the Splunk_TA_stream app.
Get an informative overview of this type of deployment with Splunk’s helpful diagram.
Now, once you’ve configured a distributed Splunk Stream deployment, you may see high CPU utilization on the Splunk Enterprise instance where splunk_app_stream is configured. This is often due to the overhead of a large number of Stream UFs sending check-in traffic to an endpoint on that system, https:///en-us/custom/splunk_app_stream/ping. By default, this ping event happens every 5 seconds.
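To get a feel for how that 5-second interval scales, here’s a quick back-of-the-envelope calculation. The forwarder count below is a hypothetical example, not a measured figure:

```shell
# With each UF pinging every 5 seconds, requests per second on the
# management node grow linearly with the size of the forwarder fleet.
FORWARDERS=600      # hypothetical fleet size -- substitute your own count
PING_INTERVAL=5     # default Stream ping interval, in seconds
echo $(( FORWARDERS / PING_INTERVAL ))   # prints 120 (requests per second)
```

Even a modest fleet can translate into a steady stream of requests, each of which spawns handler work on the management node.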
You can tail splunkd_ui_access.log to observe the incoming requests. If these log entries are flying by, there’s a good chance this is the cause of the high CPU utilization you are seeing.
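As a rough sketch of that check, you can grep the log for the ping endpoint and count the hits. The sample log lines below are hypothetical placeholders written to a temp file for illustration; real splunkd_ui_access.log entries contain more fields, and in practice you would point grep at the real file under $SPLUNK_HOME/var/log/splunk/:

```shell
# Illustrative only: write a few placeholder request lines to a temp
# file, then count how many are Stream forwarder pings.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
GET /en-us/custom/splunk_app_stream/ping HTTP/1.1 200
GET /en-us/custom/splunk_app_stream/ping HTTP/1.1 200
GET /en-us/app/search/search HTTP/1.1 200
EOF

# grep -c counts matching lines; against the live log, a rapidly growing
# count points at forwarder check-ins as the CPU culprit.
grep -c 'splunk_app_stream/ping' "$LOG"   # prints 2
rm -f "$LOG"
```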
Below you will see sample logs. Note that all these events occur in around 3 seconds.
You can further confirm the issue by logging into the Splunk Enterprise instance, opening top in your terminal, and pressing the c key to view the command associated with each process. In this example, you’ll see a large number of Python processes running that are related to splunk_app_stream.
The number will vary depending on the number of clients checking in. However, when I see this issue occurring, there’s always a bunch of these that show up pretty consistently while watching the process list refresh.
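If you’d rather take a one-shot snapshot than watch top refresh, something like the following works. The 'splunk_app_stream' match pattern is an assumption; the exact command lines of these Python handler processes can vary by Splunk version, so adjust the pattern to whatever you see in top:

```shell
# Snapshot the full command line of every process and count how many
# mention splunk_app_stream. The [s] bracket trick keeps grep from
# matching its own process entry; wc -l prints 0 cleanly if none match.
ps -eo args | grep '[s]plunk_app_stream' | wc -l
```

Re-running this a few times over a minute gives a feel for how consistently these handlers are present, which lines up with what you’d see watching the process list refresh in top.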