Your Step-By-Step Guide for Splunking Data in Amazon S3

By Tom Kopchak | Published on February 5th, 2020

I recently worked with a client who had some log files in Amazon Web Services (AWS) S3 that they wanted to ingest into Splunk. This seemed like a great opportunity to build an example in our lab and document the process for those of you who might be interested in doing the same thing.

AWS Configuration

For this example, I am going to start by creating a new S3 bucket and uploading some data. You can skip these steps if you have an S3 bucket already and move directly to the section on configuring an IAM user and permissions.

I’ve also created a video walking through the process, if you prefer that approach. Otherwise, follow along in the steps below.

Creating some sample data

To make a log file, use a one-line bash script as follows:

I would expect any logs you might ingest to be more useful than these.
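
If you don't have a log file handy, a stand-in can be generated with a few lines of Python. This is only a sketch and is not the bash one-liner referenced above; the function names, the `sample.log` file name, and the line format are all made up for illustration.

```python
import datetime
import random


def make_sample_lines(count, start=None, seed=0):
    """Return `count` fake log lines with timestamps one second apart."""
    rng = random.Random(seed)  # seeded so repeated runs produce the same lines
    start = start or datetime.datetime(2020, 2, 5, 12, 0, 0)
    levels = ["INFO", "WARN", "ERROR"]
    lines = []
    for i in range(count):
        stamp = (start + datetime.timedelta(seconds=i)).isoformat()
        level = rng.choice(levels)
        lines.append(f'{stamp} level={level} event_id={i} msg="sample event"')
    return lines


def write_sample_log(path, count=100):
    """Write the fake lines to `path`, one event per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(make_sample_lines(count)) + "\n")
```

Calling `write_sample_log("sample.log")` writes 100 timestamped events to a file you can upload in the next section.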

Creating an S3 bucket

In the AWS console, search for S3 in the services menu.

Then, click Create bucket.

Provide a Bucket name and select a Region. In this example, I’m using hl-s3-demo as the bucket name and US East (Ohio), us-east-2, as the region.

I’ve left all of the options on the next screen at their defaults.

Now it’s time to set permissions. It’s best practice to block public access to S3 buckets, so we’ll keep that as default (we’ll be creating an IAM user that has permissions to read this data later).

Finally, review your settings; if everything looks good, click Create bucket.
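
If you would rather script this step than click through the console, the same result can be sketched with boto3, the AWS SDK for Python. This is a minimal sketch under a few assumptions: boto3 is installed, your own administrative credentials are already configured, and the region is not us-east-1 (which does not accept a LocationConstraint). The function name and the `main` wrapper are my own.

```python
def create_private_bucket(s3_client, bucket_name, region):
    """Create a bucket in `region` and block all public access to it."""
    # Assumption: `region` is not us-east-1; other regions must be passed
    # as a LocationConstraint when the bucket is created.
    s3_client.create_bucket(
        Bucket=bucket_name,
        CreateBucketConfiguration={"LocationConstraint": region},
    )
    # Mirror the console default of blocking every form of public access.
    s3_client.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )


def main():
    import boto3  # third-party package: pip install boto3

    client = boto3.client("s3", region_name="us-east-2")
    create_private_bucket(client, "hl-s3-demo", "us-east-2")
```

Running `main()` makes real changes in your AWS account, so it is left uncalled here. Bucket names are globally unique, so you will need a name other than hl-s3-demo.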

Adding some data

Our new bucket doesn’t have any data yet since it was just created. It would be a pretty terrible demo without anything to pull into Splunk, so I’m going to upload the sample file we created earlier to this bucket.

Start by searching for your bucket name and clicking on it.

You’ll see that your bucket is empty. Click the Upload button to add some data.

Select the file we created.

You can keep the other settings at their defaults. I am using Standard as the storage class for this example.

At this point, the data we want to index is in S3. Next, we’ll create a user with permissions to access this data.
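
The upload can also be scripted. The sketch below assumes a boto3 S3 client and a local file named `sample.log` (a hypothetical name); the helper name is mine. boto3's `upload_file` uses the Standard storage class unless told otherwise, which matches the console default used above.

```python
import os


def upload_log(s3_client, local_path, bucket_name, key=None):
    """Upload `local_path` to the bucket; the object key defaults to the file name."""
    key = key or os.path.basename(local_path)
    # upload_file(Filename, Bucket, Key) is boto3's managed transfer helper.
    s3_client.upload_file(local_path, bucket_name, key)
    return key


def main():
    import boto3  # third-party package: pip install boto3

    client = boto3.client("s3", region_name="us-east-2")
    upload_log(client, "sample.log", "hl-s3-demo")
```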

Creating an IAM User

We now need a user with permission to read the log data we have stored in S3, so we’ll create an IAM user.

Start by searching for IAM in the AWS console.

Then, click Add user to start the process for creating a user.

Provide a name for your user. In this example, I will use s3-demo. Make sure you track this username, as it will be configured in Splunk later. Additionally, you’ll want to grant this user Programmatic access, which allows the user to access the S3 API.

Now it’s time to set permissions on the user. I want this user to have limited access to only read log files in S3. To do this, I’m going to apply the AmazonS3ReadOnlyAccess AWS managed policy to this user.
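
Note that AmazonS3ReadOnlyAccess grants read access to every bucket in the account. If you want something narrower, one alternative (not what this walkthrough uses) is a customer-managed policy scoped to a single bucket. The sketch below only builds the policy document; the function name and statement IDs are my own, and the Splunk add-on may need actions beyond the two shown, so check its documentation before relying on this.

```python
import json


def read_only_bucket_policy(bucket_name):
    """Build an IAM policy document allowing list and read on one bucket only."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


def policy_json(bucket_name):
    """Serialize the policy for pasting into the IAM policy editor."""
    return json.dumps(read_only_bucket_policy(bucket_name), indent=2)
```

The output of `policy_json("hl-s3-demo")` can be pasted into the JSON tab when creating a policy in the IAM console.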

I did not add any tags to the user in the next screen. After setting any tags, you will have the opportunity to review your user settings to confirm they are correct.

Finally, you’ll be provided with the Access key ID and Secret access key for this new user. Record both now: they need to be entered into Splunk, and you won’t be able to retrieve the secret access key later.
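
For reference, the user creation steps above map onto three IAM API calls. The following is a minimal boto3 sketch, assuming boto3 is installed and you are running it with credentials that are allowed to manage IAM; the function name and `main` wrapper are hypothetical.

```python
READ_ONLY_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"


def create_read_only_user(iam_client, user_name):
    """Create an IAM user, attach the managed read-only S3 policy, and issue a key."""
    iam_client.create_user(UserName=user_name)
    iam_client.attach_user_policy(UserName=user_name, PolicyArn=READ_ONLY_POLICY_ARN)
    # The secret is only returned by this call; it cannot be fetched again later.
    response = iam_client.create_access_key(UserName=user_name)
    access_key = response["AccessKey"]
    return access_key["AccessKeyId"], access_key["SecretAccessKey"]


def main():
    import boto3  # third-party package: pip install boto3

    key_id, secret = create_read_only_user(boto3.client("iam"), "s3-demo")
    print("Access key ID:", key_id)
    print("Secret access key:", secret)  # store this securely; avoid logging it
```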

At this point, we have configured everything we need in AWS. Now, we’ll set up Splunk to read this data.

Splunk Configuration

To collect data from an S3 bucket, we’ll first need to install the Splunk Add-on for Amazon Web Services. This should generally be installed on a heavy forwarder, or on an Inputs Data Manager (IDM) in Splunk Cloud.

I’ve created a part 2 video walking through the Splunk configuration, if you prefer that format. Otherwise, follow along with the written instructions below.

For this example, I uploaded the app through Splunk Web on my heavy forwarder and restarted Splunk once the app was installed.

Once the app is installed, you’ll want to launch the app and configure the IAM user.

The accounts in the AWS Add-on are configured under Configuration -> Account in the app. Click Add to add a new account.

In the Add Account screen, provide the name of the account, Key ID, and Secret Key for the IAM user you created above.

Once this account is added, you’ll see it in the list of accounts.

Now, the account is ready to use to pull data.

Configure an S3 Input

There are two ways to configure an S3 input: through the AWS Add-on or through the Data inputs menu in settings. For this example, we will navigate to Settings -> Data -> Data inputs.

In the list of inputs, locate AWS S3 and click on it.

Assuming this is your first S3 input, you’ll see that there are no other configurations of this type. Click New to create a new input.

Configure the settings for your input. For this example, I’ve filled in the Name, AWS Account (matching the account specified in the AWS Add-on), and Bucket Name (matching the name of the bucket in S3), as well as the Index where I want the data to be stored. Everything else was left at the default.
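
Behind the scenes, an input like this is stored as a stanza in an inputs.conf file. The sketch below renders roughly what that stanza might look like. Treat it as an assumption rather than a reference: the `aws_s3://` stanza prefix, the key names, and the `aws:s3` sourcetype are my recollection of the add-on's generic S3 input and should be checked against the inputs.conf.spec shipped with your version of the add-on. The input, account, and index names in the usage note are hypothetical.

```python
def s3_input_stanza(name, aws_account, bucket_name, index, sourcetype="aws:s3"):
    """Render an inputs.conf-style stanza for a generic S3 input (assumed key names)."""
    lines = [
        f"[aws_s3://{name}]",            # input name as entered in the UI
        f"aws_account = {aws_account}",  # account name configured in the add-on
        f"bucket_name = {bucket_name}",  # must match the S3 bucket name
        f"index = {index}",              # destination index in Splunk
        f"sourcetype = {sourcetype}",
    ]
    return "\n".join(lines) + "\n"
```

For example, `print(s3_input_stanza("hl-s3-demo-input", "s3-demo", "hl-s3-demo", "aws_demo"))` prints a five-line stanza you can compare with what the UI wrote on your forwarder.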

Once this input is created, wait a few moments, and run a search to test it! As you can see, our data is indexed directly from the S3 bucket.

Wrap up

Hopefully this example will lead you in the right direction when it comes to indexing data stored in AWS S3. If there are other topics you’d like to see covered, feel free to reach out to me @tomkopchak and I will get an example published!
