Sniffing out a way to improve ES performance
In large Splunk environments with many users, the chance that your search infrastructure is working double time by running duplicate searches goes up. Those duplicates can degrade performance, especially on Enterprise Security (ES) search heads, which already carry the overhead of running the extra tools ES includes.
To counteract this problem, we developed a script to help sniff out searches which may be duplicated across search heads.
A word about jellyfish and Jaro-Winkler
To compare two searches, we need a metric that is easy to understand. Enter: Jaro-Winkler.
Jaro-Winkler is an algorithm that takes two strings and returns a value between 0 and 1 representing how similar they are, with 1 being an exact match. It is a modification of the Jaro algorithm that gives extra weight to characters the two strings share at the beginning. This suits Splunk searches well: a search is executed in order, so two searches that begin the same way are likely doing the same initial work.
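To make the scoring concrete, here is a minimal pure-Python sketch of the textbook Jaro-Winkler formulation, not the code used by the script. Two characters count as matching if they are equal and no further apart than half the longer string's length minus one; the Jaro score averages the two match ratios and the transposition-adjusted match count; Winkler's adjustment then adds a bonus for a shared prefix of up to four characters (scaling factor 0.1), applied only when the Jaro score exceeds 0.7. The function and parameter names are our own; in practice you would call a library instead of maintaining this yourself.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity: 1.0 for identical strings, 0.0 when no characters match."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters "match" only if equal and within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    s1_flags = [False] * len1
    s2_flags = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not s2_flags[j] and s2[j] == ch:
                s1_flags[i] = s2_flags[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: matched characters that appear in a different order, halved.
    s1_matched = [c for c, used in zip(s1, s1_flags) if used]
    s2_matched = [c for c, used in zip(s2, s2_flags) if used]
    transpositions = sum(a != b for a, b in zip(s1_matched, s2_matched)) / 2

    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler_similarity(s1: str, s2: str, scale: float = 0.1,
                            max_prefix: int = 4, boost_threshold: float = 0.7) -> float:
    """Jaro score plus a bonus for a common prefix of up to `max_prefix` characters."""
    jaro = jaro_similarity(s1, s2)
    if jaro <= boost_threshold:
        # Dissimilar strings get no prefix bonus.
        return jaro
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * scale * (1 - jaro)
```

For example, `jaro_winkler_similarity("MARTHA", "MARHTA")` comes out at about 0.961: all six characters match, one pair is transposed (Jaro score about 0.944), and the shared prefix "MAR" earns the Winkler bonus.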
Instead of implementing this algorithm ourselves, we can use the Python library jellyfish, which implements several string comparison algorithms, including Jaro-Winkler.
Note: Please make sure you are downloading the correct library. Recently, a malicious library named “jeIlyfish” (a capital “i” replaces the first “l”) was found stealing its users’ SSH and GPG keys. This shouldn’t be a problem if you use pip install (as a typo is unlikely), but take special care if you regularly download and set up new Python libraries manually.
To run this script, your account must have permission to execute remote REST searches. To use it, run the script from the command line and fill in the requested information. By default, output is written to output.txt in the current directory.
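The post doesn't show how the script collects searches from each search head. As an assumption, one way is to query Splunk's `saved/searches` REST endpoint on each head's management port (8089 by default), asking for JSON with `output_mode=json` and for every entry with `count=0`. The helper names below are hypothetical, and the sketch uses HTTP basic authentication with only the Python standard library.

```python
from __future__ import annotations

import base64
import json
import ssl
import urllib.parse
import urllib.request


def parse_saved_searches(payload: str) -> dict[str, str]:
    """Map each saved search's name to its SPL string from a saved/searches JSON response."""
    data = json.loads(payload)
    # Names can repeat across apps; in this sketch later entries overwrite earlier ones.
    return {
        entry["name"]: entry.get("content", {}).get("search", "")
        for entry in data.get("entry", [])
    }


def fetch_saved_searches(host: str, username: str, password: str,
                         port: int = 8089, verify_tls: bool = True) -> dict[str, str]:
    """Fetch every saved search visible to `username` on one search head (hypothetical helper)."""
    query = urllib.parse.urlencode({"output_mode": "json", "count": 0})
    url = f"https://{host}:{port}/servicesNS/-/-/saved/searches?{query}"
    request = urllib.request.Request(url)
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")

    context = ssl.create_default_context()
    if not verify_tls:
        # The management port often uses a self-signed certificate.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE

    with urllib.request.urlopen(request, context=context) as response:
        return parse_saved_searches(response.read().decode("utf-8"))
```

Calling `fetch_saved_searches` once per search head gives two name-to-SPL dictionaries that can then be compared pairwise.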
The format of the output is as follows:
In testing, I found that a value above 0.9 should prompt serious consideration of whether you need to run the search on both search heads.
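A comparison step could score every saved search on one head against every search on the other and keep the pairs above the 0.9 threshold. This is a sketch under stated assumptions rather than the original script: `find_duplicate_searches` is a hypothetical name, whitespace is collapsed before scoring so formatting differences don't hide duplicates, and the similarity function defaults to jellyfish's `jaro_winkler_similarity` but can be replaced by any function returning a score between 0 and 1.

```python
from __future__ import annotations

from itertools import product
from typing import Callable, Optional

Similarity = Callable[[str, str], float]


def find_duplicate_searches(
    head_a: dict[str, str],
    head_b: dict[str, str],
    threshold: float = 0.9,
    similarity: Optional[Similarity] = None,
) -> list[tuple[str, str, float]]:
    """Return (name_a, name_b, score) for cross-head pairs scoring above `threshold`."""
    if similarity is None:
        # Imported lazily so another metric can be used without jellyfish installed.
        import jellyfish
        similarity = jellyfish.jaro_winkler_similarity

    flagged = []
    for (name_a, spl_a), (name_b, spl_b) in product(head_a.items(), head_b.items()):
        # Collapse runs of whitespace so layout-only differences don't lower the score.
        score = similarity(" ".join(spl_a.split()), " ".join(spl_b.split()))
        if score > threshold:
            flagged.append((name_a, name_b, score))

    # Most similar pairs first, so the strongest candidates are reviewed first.
    flagged.sort(key=lambda pair: pair[2], reverse=True)
    return flagged
```

Because every pair is compared, the work grows with the product of the two heads' search counts, so scoring a large environment can take a while.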
Hopefully, this script will help you find the parts of your Splunk search infrastructure that are working unnecessary overtime, so you can eliminate that redundant work and improve your performance.