To get the most value from this Splunk data, you may need to install Splunk apps in your Splunk instance. The Windows TA and the Linux TA are both helpful for working with this data. Other data sources require their own apps for field extractions to work.
How the Data is Structured
In the National CPTC events, each team is provided with an identical, dedicated environment. During the 2019 season, these were separate instances in Google Cloud.
Teams compete in two events: regionals (October 12-13, 2019) and nationals (November 22-24, 2019). There were six regional events: North Eastern, South Eastern, New England, Central, Western, and International. Each regional event had up to 10 teams. The winning team from each region (6 total) was invited to compete in the National competition, along with the next 4 highest-ranked teams at large.
Events were exported for the time periods of the regional and national competitions. The following epoch timestamps were used for the data export:
Because of limitations in how this data can be exported when frozen by Splunk, you may see events that occur outside of these time windows in the data. These should be ignored as they are not necessarily complete.
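One way to drop the out-of-window events after export is a simple epoch range check. The sketch below is illustrative only: the helper names are made up, and the window bounds are hypothetical values derived from the regional dates given above, assuming UTC midnight boundaries. They are not the export timestamps actually used, so substitute those before relying on it.

```python
from datetime import datetime, timezone


def utc_epoch(year: int, month: int, day: int) -> int:
    """Return the epoch second for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# HYPOTHETICAL window: regional dates from this document, UTC midnight assumed.
# Replace with the real export timestamps for the dataset.
REGIONALS_WINDOW = (utc_epoch(2019, 10, 12), utc_epoch(2019, 10, 14))


def in_window(event_epoch: float, window: tuple) -> bool:
    """True if the event time falls inside the half-open window [start, end)."""
    start, end = window
    return start <= event_epoch < end


def keep_in_window(event_epochs, window):
    """Filter a sequence of event epoch times down to those inside the window."""
    return [t for t in event_epochs if in_window(t, window)]
```

In a Splunk search, the same restriction can be expressed with the earliest and latest time modifiers, which accept epoch values.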
All hostnames have a prefix and team number. The prefix represents the region, and the team number represents an individual team in that region. A few examples of hostnames:
- nationals-t2-vdi-kali01: A Linux VDI instance for Team 2 in the Nationals event
- newengland-t6-vdi-kali01: A Linux VDI instance for Team 6 in the New England regional
- nationals-t1-bank-core-01: The DinoBank core application host for Nationals Team 1
The host field in Splunk is set when the data is collected, based on the region and team number. There is no local indication on the hosts themselves about their region. This means that a host showing up as host=nationals-t1-bank-core-01 in Splunk is actually running with the host name on the local operating system set to bank-core-01.
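When post-processing exported events, it can be handy to split the Splunk host value back into its parts. The following is a minimal sketch, not part of the dataset tooling: the function and class names are made up, and it assumes every competitor host follows the prefix-tN-localname pattern shown above, with a prefix that contains no dash.

```python
import re
from typing import NamedTuple, Optional


class HostParts(NamedTuple):
    prefix: str      # region or event, e.g. "nationals"
    team: int        # team number within that region or event
    local_name: str  # host name as set on the local operating system


# Assumption: the prefix has no dash, and the team is written as "t" plus digits.
_HOST_RE = re.compile(r"^(?P<prefix>[^-]+)-t(?P<team>\d+)-(?P<local>.+)$")


def parse_host(host: str) -> Optional[HostParts]:
    """Split a Splunk host value into prefix, team number, and local host name.

    Returns None for hosts that do not follow the naming convention.
    """
    match = _HOST_RE.match(host)
    if match is None:
        return None
    return HostParts(match.group("prefix"), int(match.group("team")), match.group("local"))
```

For example, parse_host("nationals-t1-bank-core-01") yields the prefix nationals, team 1, and the local name bank-core-01. Hosts that return None still need to be checked against the list of competitor prefixes before being discarded.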
This naming structure allows you to search the data in various ways, depending on whether you are looking for everything from a certain region, team, or type of host. Here are some sample searches you may want to try:
- host=nationals-t* – data from all teams that participated in Nationals
- host=nationals-t1-* – data from all of the hosts for Nationals Team 1 (note the trailing dash: t1* would match both Team 1 and Team 10)
- host=*kali0* – data from all the Kali VDI instances for all teams (note that Splunk searches with a leading * are inefficient, so avoid them where possible)
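The effect of the trailing dash can be checked outside Splunk. The sketch below approximates Splunk's * wildcard with Python's glob-style matching, which behaves the same way for patterns that only use *. The host list is illustrative, and the Team 10 host name in it is a made-up example that follows the naming scheme.

```python
from fnmatch import fnmatchcase

# Illustrative host values; "nationals-t10-vdi-kali01" is a hypothetical example.
hosts = [
    "nationals-t1-vdi-kali01",
    "nationals-t10-vdi-kali01",
    "newengland-t6-vdi-kali01",
]


def matching(pattern: str, names: list) -> list:
    """Return the names that match a glob-style pattern such as 'nationals-t1-*'."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Without the trailing dash, Team 10 is matched along with Team 1.
loose = matching("nationals-t1*", hosts)

# With the trailing dash, only Team 1 hosts match.
strict = matching("nationals-t1-*", hosts)

print(loose)
print(strict)
```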
You may also note other hosts that do not follow these naming conventions. Many of these represent build and test environments which were not used by students. The following prefixes are the only ones that will contain student competitor data:
Regional host prefixes:
National event prefixes:
Team numbers range from 0-10. For regionals, not all regions have 10 teams.
What We Collected
We attempted to collect data from as many locations as possible across all of the student systems (both VDI/sources and targets) throughout the environment. The types of data available will vary based on the operating systems and roles of each host.
The following logs are generally available for both Windows and Linux hosts:
- IDS events (ids index): suricata:http, suricata:stats, suricata:alert, suricata:tls, suricata:ssh
- Splunk App for Stream events (stream index): stream:dns, stream:udp, stream:tcp, stream:http
Nearly all data types available for collection by the Splunk Add-on for Microsoft Windows (https://splunkbase.splunk.com/app/742/) were collected, with collection intervals shortened to provide a higher sampling rate. All of the Windows-specific index names begin with win (search with index=win*), and Windows Event Log data sources log to indexes beginning with winevent (index=winevent*).
The Windows data available generally includes the following:
- Windows Perfmon data
- Windows Event Log (Security, System, Application, etc.)
- Windows Sysmon
- Powershell Transcript Logs
Nearly all data types available for collection by the Splunk Add-on for Unix and Linux were collected, with collection intervals shortened to provide a higher sampling rate. All of the Linux-specific index names begin with linux (search with index=linux*).
The Linux data available generally includes the following:
- Bash history
- Common diagnostic tools: df, ps, top, lsof, netstat, vmstat, who
- Open ports
- Network and interface information
- Package information
- Contents of the /etc and /tmp directories
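The index names and prefixes in this section combine naturally with the host naming scheme. As a rough sketch, a small helper can assemble the base of a Splunk search string from those pieces. The helper name and its parameters are hypothetical; only the index patterns, host patterns, and sourcetype names come from this document.

```python
from typing import Optional


def build_search(index: str, host: Optional[str] = None, sourcetype: Optional[str] = None) -> str:
    """Assemble a basic Splunk search string from index, host, and sourcetype filters.

    Wildcards such as 'win*' or 'nationals-t1-*' are passed through unchanged.
    """
    terms = [f"index={index}"]
    if host:
        terms.append(f"host={host}")
    if sourcetype:
        # Quote the sourcetype so that names containing ':' are passed as one value.
        terms.append(f'sourcetype="{sourcetype}"')
    return " ".join(terms)


# All Linux data for Nationals Team 1
print(build_search("linux*", host="nationals-t1-*"))

# Windows Event Log data for every Nationals team
print(build_search("winevent*", host="nationals-t*"))

# Suricata alerts from the ids index, across all hosts
print(build_search("ids", sourcetype="suricata:alert"))
```

The resulting strings can be pasted into the Splunk search bar and extended with further commands as needed.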
Sourcetypes by Index
The following sourcetypes are available in each of the indexes available for download in this dataset:
ids||suricata:alert suricata:http suricata:smtp suricata:ssh suricata:stats suricata:tls
stream||stream:dns stream:http stream:tcp stream:udp
||MSAD:NT6:Health MSAD:NT6:Replication MSAD:NT6:SiteInfo Powershell:ScriptExecutionErrorRecord Powershell:ScriptExecutionSummary
||PerfmonMk:CPU PerfmonMk:DFS_Replicated_Folders PerfmonMk:DNS PerfmonMk:LogicalDisk PerfmonMk:Memory PerfmonMk:NTDS PerfmonMk:Network PerfmonMk:Network_Interface PerfmonMk:PhysicalDisk PerfmonMk:Process PerfmonMk:Processor PerfmonMk:ProcessorInformation PerfmonMk:System
Questions and Feedback
Please reach out to the CPTC research distribution list (research@nationalcptc.org) for further information about this dataset.
This dataset is being made freely available to support various educational and research initiatives. While you are free to use this data for your own purposes, we ask that this dataset be attributed to the National Collegiate Penetration Testing Competition (National CPTC) in any publications or references.