Building on the stock ES “Excessive DNS Queries” search to look for suspicious volumes of DNS traffic
Starting from the assumption that a host suddenly spewing a ton of DNS queries can be an indicator of either compromise or misconfiguration, we need a mechanism to tell us when this happens. Here we will look at a method to find suspicious volumes of DNS activity while trying to account for normal activity.
Splunk ES comes with an “Excessive DNS Queries” search out of the box, and it’s a good starting point. However, the stock search only looks for hosts making more than 100 queries in an hour. This presents a couple of problems. For most large organizations with busy users, 100 DNS queries in an hour is an easy threshold to break. Throw in some server systems doing backups of remote hosts or MFPs (multifunction printers) trying to send scans to user machines, and we suddenly have thousands of machines breaking that 100/hour limit for DNS activity. This makes the search not only excessively noisy, but also very time-consuming to tune into something an analyst can act on or even want to look at.
What we really need is a way to look at how a machine typically behaves during its normal activity, and then to alert us when that machine suddenly deviates from its average. The solution here is actually pretty simple once it’s written out in Splunk SPL (Search Processing Language). We need to do three things:
- Determine a working time window to use in calculating the average
- Establish a baseline for a machine’s DNS activity
- Compare the established average against individual slices of the whole window
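Before getting into SPL, the three steps above can be sketched in a few lines of Python. This is only an illustration of the logic, not the search itself: the function name `find_spikes`, the sample hosts and counts, and the `factor=3.0` multiplier are all assumptions made up for the example.

```python
from statistics import mean

def find_spikes(counts_by_host, factor=3.0):
    """Return (host, hour_index, count, baseline) for every hourly slice
    that exceeds `factor` times that host's own average over the window.

    `factor` is an assumed tuning knob, not a value from the stock search.
    """
    spikes = []
    for host, counts in counts_by_host.items():
        baseline = mean(counts)                  # step 2: per-host baseline
        for hour, count in enumerate(counts):    # step 3: compare each slice
            if baseline > 0 and count > factor * baseline:
                spikes.append((host, hour, count, baseline))
    return spikes

# Step 1: an 8-hour window with one query count per hour (made-up numbers).
window = {
    "10.0.0.5": [90, 110, 95, 105, 100, 98, 102, 100],  # busy but steady
    "10.0.0.9": [10, 12, 9, 11, 10, 400, 10, 10],       # quiet, then a burst
}

for host, hour, count, baseline in find_spikes(window):
    print(f"{host}: hour {hour} had {count} queries (average {baseline:.0f})")
```

Note that a fixed 100/hour threshold would flag the steady host several times and say nothing useful about the burst, while the per-host comparison flags only the burst. The spike is included in its own average here, which dampens the signal somewhat; that is a simplification of this sketch.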
Let’s Look at a Real-Life Example
Take an average workday of 8 hours. Conveniently, a day also splits into three 8-hour chunks, so this is a safe window to use as a first draft. This window can be adjusted to better suit the needs of any specific environment: make it wider for a less sensitive alert, or narrower for a more sensitive one. What we need to do is look at that 8-hour span and get a count of DNS events per host, per hour.
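To make "a count of DNS events per host, per hour" concrete, here is a rough Python sketch of the bucketing step. It assumes each event is a `(timestamp, src)` pair; the function name `count_per_host_hour` and the sample events are hypothetical, and in Splunk this work is done by the search rather than by code like this.

```python
from collections import Counter
from datetime import datetime, timedelta

def count_per_host_hour(events, window_end, hours=8):
    """Count events per (src, hour_start) inside the window.

    The window covers [window_end - hours, window_end); events outside it
    are ignored. `hours=8` mirrors the first-draft window described above.
    """
    window_start = window_end - timedelta(hours=hours)
    counts = Counter()
    for ts, src in events:
        if window_start <= ts < window_end:
            # Snap the timestamp down to the start of its hour.
            hour_start = ts.replace(minute=0, second=0, microsecond=0)
            counts[(src, hour_start)] += 1
    return counts

# Made-up events for illustration only.
window_end = datetime(2024, 1, 1, 17, 0)
events = [
    (datetime(2024, 1, 1, 8, 59), "10.0.0.5"),   # just before the window opens
    (datetime(2024, 1, 1, 9, 5), "10.0.0.5"),
    (datetime(2024, 1, 1, 9, 40), "10.0.0.5"),
    (datetime(2024, 1, 1, 10, 1), "10.0.0.5"),
    (datetime(2024, 1, 1, 16, 59), "10.0.0.9"),
]

for (src, hour_start), n in sorted(count_per_host_hour(events, window_end).items()):
    print(src, hour_start.strftime("%H:%M"), n)
```

Changing `hours` widens or narrows the window, which is the sensitivity adjustment described above.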
Search Part 1: Pulling The Basic Building Blocks
In this first part of our search, we are pulling our basic building blocks, including:
- Hosts making the DNS queries
- Original sourcetype of the DNS events (useful for later drilldown searching)
- Starting timestamp of each hour-window
- DNS server(s) handling the queries
- Total count for that query source (src) within that hour
This will appear as shown below: