Combining AWS and Splunk for Sentiment Analysis

By Hurricane Labs | Published On: February 12th, 2018

There are a few things in life I really enjoy, and AWS is taking them away from me. I enjoy writing code to do sentiment analysis, but I HATE training the models. So, enter Amazon Comprehend, one of AWS’ many machine learning voodoo services. You toss some text at it, it groks the text, and it spits out an overall sentiment (positive, negative, neutral, or mixed) along with a confidence score for each of those ratings.

Now, we wanted to do some basic sentiment analysis on the last message in a ticket. Why? Great question. We wanted to see if a customer was overly negative about a ticket and, frankly, Tom Kopchak told me to do it, so I did. I also wanted to learn more about AWS Lambda and Splunk’s HTTP Event Collector (HEC). This project combined all of these things, so I was completely on board. This isn’t meant to be a tutorial on any one of them; each vendor has its own (better) tutorials available, and you’re welcome to check those out.

Here we go.

Our tickets live in Zendesk, which has a pretty handy API, so it was easy to pull what we needed and put it into AWS’ S3 for processing later. I could have skipped S3 and streamed the data straight through, but I’ll likely do other things with this data that require S3, so that’s the route I chose.
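
The Zendesk side isn’t shown in the post, so here is a minimal sketch of “grab the last message and drop it in S3”, assuming Zendesk’s List Ticket Comments endpoint with API-token authentication. The function names, subdomain, bucket, and credentials are hypothetical placeholders, not the original code. The object is keyed by ticket number so the Lambda function can later use the key as the ticket ID.

```python
import base64
import json
import urllib.request


def fetch_comments(subdomain, ticket_id, email, api_token):
    """Fetch one page of comments for a Zendesk ticket (oldest first by default)."""
    url = f"https://{subdomain}.zendesk.com/api/v2/tickets/{ticket_id}/comments.json"
    # API-token auth is HTTP Basic auth with "{email}/token" as the username.
    creds = base64.b64encode(f"{email}/token:{api_token}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {creds}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def last_comment_body(payload):
    """Return the body of the final comment in the payload, or None if there are none."""
    comments = payload.get("comments") or []
    return comments[-1]["body"] if comments else None


def upload_last_comment(bucket, ticket_id, text):
    """Store the text in S3, keyed by ticket number (hypothetical helper)."""
    import boto3  # imported here so the pure helpers above don't require the AWS SDK

    boto3.client("s3").put_object(Bucket=bucket, Key=str(ticket_id), Body=text.encode("utf-8"))
```

For tickets with more comments than fit in one page, you would need to follow Zendesk’s pagination before taking the last comment.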

S3 lets you define a lifecycle policy, so this data is transient: the policy deletes it shortly after it’s created. Once an object lands in S3, an “object created” event invokes our AWS Lambda function; this is called, cleverly, a “trigger”. The event tells the Lambda function which S3 bucket and “key” (really, the filename) to use for this invocation. The function then calls AWS’ Comprehend, pulls back the results of the analysis (which takes seconds), and flings those results over to our Splunk HTTP Event Collector, where I built a really fancy dashboard to show the overall sentiment.
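
The post doesn’t show its lifecycle policy. As a hedged sketch, an expire-after-one-day rule could be attached with Boto 3 as below; the one-day window, the empty prefix, and the function names are assumptions. S3 expiration is expressed in whole days, so one day is the shortest this kind of rule allows.

```python
def expiration_lifecycle(days=1, prefix=""):
    """Build an S3 lifecycle configuration that deletes objects `days` after creation."""
    if days < 1:
        raise ValueError("S3 expiration must be at least 1 day")
    return {
        "Rules": [
            {
                "ID": f"expire-after-{days}-days",
                "Filter": {"Prefix": prefix},  # empty prefix = every object in the bucket
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_lifecycle(bucket, days=1):
    """Attach the rule to a bucket (this replaces any existing lifecycle configuration)."""
    import boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=expiration_lifecycle(days)
    )
```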

The code is available here, but let’s take a closer look at what’s going on.

After the imports, we create the Boto 3 clients we need. Boto 3 is one of the things the Lambda Python runtime just provides, so you don’t have to install or package anything; you import it the same way you would in any other Python script.
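
The original snippet isn’t reproduced here. A common pattern, sketched below on the assumption that only the S3 and Comprehend clients are needed, is to create each client once so warm invocations of the same Lambda container reuse it. The `aws_client` helper is hypothetical; plain module-level `boto3.client(...)` assignments accomplish the same thing.

```python
import functools


@functools.lru_cache(maxsize=None)
def aws_client(service_name):
    """Create each Boto 3 client once per Lambda container and reuse it on warm invocations."""
    import boto3  # bundled with the Lambda Python runtime, no packaging needed

    return boto3.client(service_name)


# Hypothetical usage inside the handler:
# s3 = aws_client("s3")
# comprehend = aws_client("comprehend")
```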

The next section parses out the data from our S3 trigger and populates our bucket and key variables.
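
A sketch of that parsing step, assuming the standard S3 event notification layout (`Records[0].s3.bucket.name` and `Records[0].s3.object.key`). S3 URL-encodes keys in these events, so the key is decoded before use. The helper name is hypothetical, and it reads only the first record; loop over `event["Records"]` if one event can carry several objects.

```python
import urllib.parse


def bucket_and_key(event):
    """Return (bucket, key) from the first record of an S3 "object created" event."""
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    # S3 URL-encodes object keys in event notifications (spaces arrive as '+').
    key = urllib.parse.unquote_plus(s3_info["object"]["key"], encoding="utf-8")
    return bucket, key
```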

Next we read our AWS Lambda environment variables, which can be encrypted so sensitive configuration stays out of the code. In our case that’s just our Splunk server information and our HEC token.
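
A minimal sketch of reading that configuration. The variable names `SPLUNK_HEC_URL` and `SPLUNK_HEC_TOKEN` are hypothetical. Lambda decrypts variables encrypted at rest before your code sees them; if you turned on the console’s encryption helpers with your own KMS key, you would also need to decrypt the value with KMS, which this sketch doesn’t cover.

```python
import os

REQUIRED_VARS = ("SPLUNK_HEC_URL", "SPLUNK_HEC_TOKEN")  # hypothetical variable names


def load_splunk_config(environ=None):
    """Read the Splunk HEC URL and token from Lambda environment variables."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        # Fail loudly instead of posting to a half-configured endpoint.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return environ["SPLUNK_HEC_URL"], environ["SPLUNK_HEC_TOKEN"]
```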

This line fetches the S3 object and stores the result in a response object so we can work with it.
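
A hedged sketch of the fetch using Boto 3’s `get_object`. Accepting an optional client is an assumption made here so the function can be exercised with a stand-in client; in Lambda you would pass (or let it create) a real S3 client. The returned dictionary holds object metadata plus a streaming `Body`.

```python
def fetch_object(bucket, key, s3_client=None):
    """Fetch an S3 object; the response dict includes metadata plus a streaming 'Body'."""
    if s3_client is None:
        import boto3  # only needed when no client is passed in

        s3_client = boto3.client("s3")
    return s3_client.get_object(Bucket=bucket, Key=key)
```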

Once we have that object, we only need to grab the body of it, since we want to analyze the whole thing and not really parse it in any way.
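
Grabbing the body amounts to reading the stream and decoding it. This sketch assumes the ticket text was stored as UTF-8; the helper name is hypothetical.

```python
def read_text(response, encoding="utf-8"):
    """Read the entire object body from a get_object response and decode it to text."""
    body = response["Body"]  # a streaming body; reading it consumes the stream
    return body.read().decode(encoding)
```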

This line sends the body of the ticket we grabbed above to Comprehend and loads the response into the sentiments dictionary. It’s a dictionary because Python says so; don’t argue.
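
A sketch of the Comprehend call via Boto 3’s `detect_sentiment`, assuming English text. DetectSentiment accepts only about 5 KB of UTF-8 text per request, so this version trims long tickets at a character boundary first; the 5,000-byte cap, the truncation helper, and the optional client parameter are additions for illustration, not part of the original function.

```python
MAX_TEXT_BYTES = 5000  # DetectSentiment accepts roughly 5 KB of UTF-8 text per call


def truncate_utf8(text, max_bytes=MAX_TEXT_BYTES):
    """Trim text to at most max_bytes of UTF-8 without splitting a multi-byte character."""
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


def detect_sentiment(text, comprehend_client=None, language_code="en"):
    """Ask Comprehend for Sentiment (POSITIVE/NEGATIVE/NEUTRAL/MIXED) and SentimentScore."""
    if comprehend_client is None:
        import boto3

        comprehend_client = boto3.client("comprehend")
    return comprehend_client.detect_sentiment(
        Text=truncate_utf8(text), LanguageCode=language_code
    )
```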

Because my S3 “key” is just the ticket number, it was trivial, with no parsing at all, to add a key (the ticket number) to the sentiments dictionary so that I had something unique to identify each analysis with.
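
A small sketch of that step. The field name `ticket` is hypothetical. Boto 3 responses also include a `ResponseMetadata` entry (request IDs, HTTP headers); dropping it is an optional choice made here so only the analysis reaches Splunk.

```python
def tag_with_ticket(sentiments, key):
    """Return a copy of the Comprehend result with the ticket number added.

    ResponseMetadata is left out so only the analysis itself is sent on.
    """
    event = {name: value for name, value in sentiments.items() if name != "ResponseMetadata"}
    event["ticket"] = key  # hypothetical field name; the S3 key is the ticket number
    return event
```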

At this point all we need to do is hurl the complete analysis results over to our Splunk HTTP Event Collector, which lets us do some analysis and build fancy dashboards. Please note: I set “host” to “lambda-” followed by the name of the function. This isn’t a requirement by any stretch, but it makes it easy to tell in Splunk where the data came from. I could also have done this with “source”; use whichever you’re more comfortable with. I don’t judge.
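
A standard-library sketch of the HEC post, assuming the JSON event endpoint (`/services/collector/event`) and the `Authorization: Splunk <token>` header. The helper names are hypothetical; inside a Lambda handler, the function name is available as `context.function_name`.

```python
import json
import urllib.request


def build_hec_request(hec_url, token, event, function_name):
    """Build a POST to Splunk's HEC JSON event endpoint with host set to lambda-<function>."""
    payload = {"event": event, "host": f"lambda-{function_name}"}
    return urllib.request.Request(
        hec_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Splunk {token}", "Content-Type": "application/json"},
        method="POST",
    )


def send_to_hec(request, timeout=10):
    """Send the request; HEC answers with JSON such as {"text": "Success", "code": 0}."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Building the request separately from sending it keeps the payload easy to inspect before anything goes over the network.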

The raw data in Splunk looks something like this, with ticket numbers changed to protect the innocent:

After I built an ugly dashboard, Ian (our Dashboard Overlord) made it better and less “Bill-like”, so it looks pretty good now:

Keep in mind that this isn’t data we have to keep “fresh” by hand; that is all automated, and this is a fairly generic function, so it could be used with any type of data, not just tickets. The hardest part for me was figuring out all the Lambda nomenclature; the code itself is straightforward. This allowed me to combine three of my favorite things: AWS anything, natural language processing (even though I didn’t have to write any of it myself), and, last but never least, Splunk. It also taught me some new things about my favorite tools that I’ll be able to apply to broader projects. Now that’s what I call a worthwhile couple of hours of work.

Enjoy!

About Hurricane Labs

Hurricane Labs is a dynamic Managed Services Provider that unlocks the potential of Splunk and security for diverse enterprises across the United States. With a dedicated, Splunk-focused team and an emphasis on humanity and collaboration, we provide the skills, resources, and results to help make our customers’ lives easier.

For more information, visit www.hurricanelabs.com and follow us on Twitter @hurricanelabs.