How to Improve Your Data Model Acceleration in Splunk

By Tom Kopchak | Published on May 9, 2022

Data Model Acceleration (DMA) is critical to timely, reliable alerting in Splunk Enterprise Security (ES). This tutorial walks you through auditing your DMA configuration so that your acceleration searches run as efficiently as possible.

Why DMA?

Splunk uses Data Model Acceleration (DMA) to allow searches to run faster than they would against the raw data. This is important for products such as Splunk Enterprise Security (ES), which rely on constantly running searches across significant volumes of data in order to identify anomalies or security-actionable events. Correlation searches within ES will typically run against accelerated data models in order to return results quickly.
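To make this concrete, here is a sketch of the kind of `tstats` query a correlation search might run against the accelerated Authentication data model. It is illustrative, not an actual ES correlation search, and the threshold of 10 failures is an arbitrary, hypothetical value. The `summariesonly=true` option tells `tstats` to read only the acceleration summaries. That is why the search is fast, and also why it only sees events that have already been summarized.

```spl
| tstats summariesonly=true count from datamodel=Authentication where Authentication.action="failure" by Authentication.src Authentication.user
| where count > 10
```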

Why is performance important?

In a Splunk ES environment, searches run constantly. Data model acceleration searches run continuously to build and update the data model summaries, and correlation searches run on a regular schedule (as often as every 5 minutes) to minimize the time between when an event occurs and when an alert fires.

If DMA summarization falls behind, it can result in missed security alerts, because correlation searches will be querying data model summaries that don't yet contain data for the time range they cover.
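One way to check whether summarization is keeping up is to review the scheduler logs for the acceleration searches themselves. The sketch below assumes that acceleration searches follow Splunk's `_ACCELERATE_DM_...` naming convention and that your role can search the `_internal` index; the 24-hour window is arbitrary. Data models with skipped runs, or with run times approaching their acceleration schedule (every 5 minutes by default), are the most likely to fall behind.

```spl
index=_internal sourcetype=scheduler savedsearch_name="_ACCELERATE_DM_*" earliest=-24h
| stats count AS runs, count(eval(status="skipped")) AS skipped, avg(run_time) AS avg_run_time_s, max(run_time) AS max_run_time_s by savedsearch_name
| sort -skipped, -avg_run_time_s
```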

Improving your DMA search efficiency

Here are four ways you can streamline your environment to improve your DMA search efficiency.

1. Identifying data model status

To check the status of your accelerated data models, navigate to Settings -> Data models on your ES search head.

You’ll be greeted with a list of data models. The ones with the lightning bolt icon highlighted in yellow are the ones that are accelerated.

Click the > next to a data model to expand it and view its acceleration status. For example, a fully built Authentication data model will show as 100% complete and up to date.

If you see an accelerated data model with a size of 0 and an updated date of 1/1/1970, it typically means that no data in your Splunk environment matches the constraints defined for that data model. These are good candidates for disabling acceleration, since Splunk would otherwise keep spending resources searching for data that doesn't exist. If you do expect that data to exist, investigate and fix the ingestion issue instead.
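If you prefer a search to the UI, the same status information is available from the `Splunk_Audit` data model's `Datamodel_Acceleration` dataset, using the same ES macros as the audit search later in this post. This sketch simply lists accelerated data models whose summaries are empty; `complete_pct` is an illustrative field name, not something ES defines.

```spl
| `datamodel("Splunk_Audit", "Datamodel_Acceleration")`
| `drop_dm_object_name("Datamodel_Acceleration")`
| where size==0
| eval complete_pct=round(complete*100,1)
| fieldformat access_time=strftime(access_time, "%m/%d/%Y %H:%M:%S")
| table datamodel complete_pct size access_time
```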

2. Configuring constraints

The constraints for data model acceleration are typically set at the index level. You can manage them in the Splunk ES app under Configure -> CIM Setup.

For each data model in the left panel, the indexes configured as restrictions are listed. If you see "no restriction," the acceleration search is not limited to specific indexes, and it could likely be made more efficient by adding the relevant indexes as constraints.
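Behind the scenes, CIM Setup stores each data model's index constraint in a search macro named `cim_<DataModel>_indexes`, the same macros the audit search later in this post reads. As a quick check, the sketch below flags macros that still have an empty definition. It assumes an unrestricted macro is defined as `()` or left blank, which is how the CIM add-on ships them by default.

```spl
| rest /servicesNS/-/-/admin/macros splunk_server=local
| search title=cim_*_indexes
| eval restricted=if(isnull(definition) OR trim(definition)="" OR trim(definition)="()", "no", "yes")
| table title definition restricted
| sort restricted, title
```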

3. Identifying relevant indexes

The best time to identify where to apply DMA index constraints is when the data is first onboarded. However, if you're reading this, you're probably past that point: your Splunk environment already has data, and now you're trying to make these searches more efficient. Here are a few steps to identify the indexes for a data model and configure them in the CIM Setup screen:

Check the documentation for each data model to find the tags its constraints match on. For example, the Network Traffic data model uses the network and communicate tags.
Run searches to identify data that has these tags set.
Identify the indexes containing data with these tags.
Add the identified indexes to the data model's index whitelist in CIM Setup.
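For the Network Traffic example, the searching and identifying steps might look like the sketch below. The tags come from the data model documentation; the 24-hour window is an arbitrary choice to keep the search inexpensive, and `index=*` only covers the non-internal indexes your role can search. The resulting `index` values are the candidates to add in CIM Setup.

```spl
index=* tag=network tag=communicate earliest=-24h
| stats count by index, sourcetype
| sort -count
```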

4. Making this process easier

Reviewing the settings for each data model in the Splunk UI can be time-consuming. To make this easier, we've developed a data model acceleration audit search that you can run in your environment to identify opportunities for improvement. Thanks to Ted Waddell and Cameron Schmidt from our team for their work developing this search.


| rest /servicesNS/-/-/admin/macros splunk_server=local 
| search title=cim_*_indexes 
| table title definition 
| rex field=title "cim_(?<datamodel>\w+)_indexes" 
| rename title AS macro 
| join type=outer datamodel 
    [| rest /servicesNS/nobody/-/datamodel/model splunk_server=local 
    | table title acceleration 
    | rex field=acceleration "\"enabled\":(?<acceleration_enabled>[^,\"]+)" 
    | rex field=acceleration "\"earliest_time\":\"(?<acceleration_earliest>[^\"]+)" 
    | fillnull acceleration_earliest value="N/A" 
    | rename title AS datamodel 
    | fields - acceleration] 
| join type=outer datamodel 
    [| `datamodel("Splunk_Audit", "Datamodel_Acceleration")` 
    | `drop_dm_object_name("Datamodel_Acceleration")` 
    | eval "size(MB)"=round(size/1048576,1), "retention(days)"=if(retention==0,"unlimited",round(retention/86400,1)), "complete(%)"=round(complete*100,1), "runDuration(s)"=round(runDuration,1) 
    | sort 100 + datamodel 
    | table datamodel,complete(%),size(MB),access_time 
    | eval datamodel=if(datamodel="Endpoint.Filesystem","Endpoint",datamodel)] 
| join type=outer datamodel 
    [| rest splunk_server=local /servicesNS/-/-/configs/conf-savedsearches 
    | search action.correlationsearch.label=* 
    | rename action.correlationsearch.label AS rule_name 
    | fields + title,rule_name,dispatch.earliest_time,dispatch.latest_time 
    | join type=outer title 
        [| rest splunk_server=local /servicesNS/-/-/configs/conf-savedsearches 
        | fields + title,search,disabled] 
    | rex max_match=0 field=search "datamodel\W{1,2}(?<datamodel>\w+)" 
    | rex max_match=0 field=search "tstats.*?from datamodel=(?<datamodel>\w+)" 
    | eval datamodel2=case(match(search, "src_dest_tstats"), mvappend("Network_Traffic", "Intrusion_Detection", "Web"), match(search, "(access_tracker|inactive_account_usage)"), "Authentication", match(search, "malware_operations_tracker"), "Malware", match(search, "(primary_functions|listeningports|localprocesses|services)_tracker"), "Application_State", match(search, "useraccounts_tracker"), "Compute_Inventory") 
    | eval datamodel=mvappend(datamodel, datamodel2) 
    | search datamodel=* 
    | mvexpand datamodel 
    | eval uses_tstats=if(match(search, ".*tstats.*"), "yes", "no") 
    | eval enabled=if(disabled==0, "Yes", "No") 
    | search enabled=yes 
    | stats count(rule_name) as correlation_searches_enabled by datamodel] 
| fillnull value=0 correlation_searches_enabled 
| fieldformat access_time=strftime(access_time, "%m/%d/%Y %H:%M:%S") 
| table datamodel acceleration_enabled, acceleration_earliest, macro, definition, complete(%), size(MB), correlation_searches_enabled, access_time 
| sort -acceleration_enabled definition -complete(%) size(MB)

This search will return a table showing:

the data models in your environment,
if they have acceleration enabled,
the index constraints defined,
the build status, and
the number of correlation searches enabled.

Using this output, you can quickly identify data models that need specific indexes defined, or that are candidates for disabling acceleration.

Best practices

To wrap things up, here are a few general guidelines for getting optimal DMA performance:

Every accelerated data model should have specific indexes defined.
Only enable acceleration for data models that apply to your environment. If you don't have data sources for a specific data model, disable acceleration.
Consider disabling acceleration for data models that are not powering correlation searches, especially if you’re not planning to use this data for security use cases in the future.
Regularly review the data in your Splunk environment and update the index constraints as new data sources are added. Include updating these constraints as part of your data onboarding processes.

Hopefully these steps help your ES search head perform at its best while using resources as efficiently as possible.

About Hurricane Labs

Hurricane Labs is a dynamic Managed Services Provider that unlocks the potential of Splunk and security for diverse enterprises across the United States. With a dedicated, Splunk-focused team and an emphasis on humanity and collaboration, we provide the skills, resources, and results to help make our customers’ lives easier.

For more information, visit www.hurricanelabs.com and follow us on Twitter @hurricanelabs.