What the HEC Is This Thing?
The Splunk HTTP Event Collector (HEC) is a great mechanism for receiving streaming data from sources where another collection mechanism, such as monitoring a log file, may not be practical. These sources include many types of cloud services and applications, as well as custom applications that can log via an HTTP POST request. In some cases, such as Amazon Web Services (AWS), you may have the choice of using HEC or an API pull on a heavy forwarder to collect the data. Generally, if HEC is an available option, it is the best one to use.
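To make "logging via a POST request" concrete, here is a minimal sketch of what a custom application might send to HEC, using only the Python standard library. The host name, token, and event contents are placeholders rather than values from a real deployment, and the `_json` sourcetype is just one reasonable choice. It assumes HEC is reachable on its default port, 8088.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, sourcetype="_json"):
    """Build (but do not send) a POST request for the HEC event endpoint.

    base_url and token are placeholders for your own HEC address and token.
    """
    payload = {"event": event, "sourcetype": sourcetype}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # HEC expects the token in this form: "Splunk <token>"
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_event(base_url, token, event):
    """Send one event to HEC and return the parsed JSON response."""
    request = build_hec_request(base_url, token, event)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())


# Hypothetical usage; replace the host and token with your own:
# send_event("https://splunk.example.com:8088", "your-hec-token",
#            {"action": "login", "user": "alice"})
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before pointing the code at a live instance.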
I’ve covered some of the benefits of using HEC near the end of my 2019 Splunk.conf talk, Administrators Anonymous: Splunk Best Practices and Useful Tricks I Learned the Hard Way, available here for your viewing pleasure.
HEC improves your data ingestion experience because a streaming mechanism has several advantages over a polling method. A streaming mechanism:
- Provides close to real-time ingestion: Streaming data can be received by Splunk as soon as it is created, which makes it available for searching much faster. A polling method checks for new data on a scheduled interval, so there will always be some delay in getting data, depending on how often the polling happens.
- Avoids API limits: To reduce the delay associated with an API pull, it may be tempting to increase the frequency of these checks. However, many APIs enforce rate limits when they are accessed too frequently. Streaming data doesn’t rely on this mechanism, so API rate limits don’t apply in the same way.
- Improves load balancing and redundancy: There’s not a great mechanism to load balance or allow for failover when using a heavy forwarder for data collection. HEC, on the other hand, can use an external load balancing mechanism to distribute incoming data to all of your indexers.
- Improves reliability: Many vendors force API tokens to expire on a recurring basis, some as frequently as every 90 days. When a token expires, your data ingestion will break. Streaming data removes the need to constantly keep an API token up to date.
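The tension between polling delay and API limits can be put into rough numbers. This is a back-of-the-envelope sketch, not a measurement: it assumes events are created at evenly spread times and ignores API and network latency. The function name and the example intervals are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def polling_tradeoff(interval_seconds):
    """Estimate ingestion delay and API call volume for one polling interval."""
    return {
        # An event created just after a poll waits a full interval.
        "worst_case_delay_s": interval_seconds,
        # With evenly spread events, the average wait is half the interval.
        "average_delay_s": interval_seconds / 2,
        # Each poll is one API request that counts toward any rate limit.
        "requests_per_day": SECONDS_PER_DAY // interval_seconds,
    }


for interval in (300, 60, 10):
    print(interval, polling_tradeoff(interval))
```

Under these assumptions, cutting the interval from 5 minutes to 10 seconds lowers the average delay from 150 seconds to 5 seconds, but raises the daily request count from 288 to 8,640.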
By this point, I’m sure you’re thrilled to get started with using HEC and want to see how it’s done. I’ve put together some instructions and also a few video demos of the process on a lab instance to show you how it works.
See the video below for a walkthrough of setting up HEC:
To set up HEC on a single Splunk instance:
Navigate to Settings -> Data Inputs: