Understanding Data

Author: Roger C.B. Johnsen

Introduction

A threat hunter makes a living from understanding data. All kind of data. But what does it mean? Understanding data means grasping the context, meaning, structure, and insights within a dataset. It involves knowing where the data comes from and what it represents, such as financial transactions, network logs, or customer feedback. Recognizing different data types, like numbers, text, or timestamps, and understanding how to work with them is crucial. It’s also about identifying patterns, trends, or anomalies in the data and assessing its quality to ensure it’s complete, accurate, and consistent. Additionally, understanding data involves making sense of the relationships between different variables and how they connect to real-world phenomena, like detecting a potential security breach from a spike in network traffic. Finally, it’s about extracting useful insights, making inferences, and applying the right analytical methods to interpret the data effectively. In our field, this would mean understanding how to interpret security logs, threat intelligence feeds, and other data sources to spot threats, trends, and anomalies in network behavior.

Strategies for understanding data

Searching

Before be start on the analytical tecniques, we need to discuss the human aspect of understanding data. I will start this section by talking about searching. Searching is bread and butter for every threat hunter out there. It’s simple - we search in order to find. However, jumping into action without a plan or direction is a waste of time. I have seen many eager people jumping into action without a proper plan. They usually end up spending much time searching with no results, often having to revert back and start over again using a strategy. And .. yes, often they encounter a log that they’ve have not fully grasped yet. Or said in a tongue in cheek way:

You don’t simply query into Mordor — Roger Johnsen

In order to prepare for any unknown situation (going into Mordor), I think a threat hunter should take advantage of the OODA loop. The OODA loop was developed by military strategist John Boyd. It stands for Observe, Orient, Decide, and Act. It can be effectively applied to the process of understanding data, helping us systematically navigate the log source(s).

Observe

When faced with something “unknown” that we need to understand, we should really start observing “it”. In this phase, the focus is on gathering information from the unknown situation. Here are some tips:

Step	Description
Identifying the Log Source	Determine the type of logs (e.g., firewall, application, system) and where they are coming from. Understanding the source helps establish the context for analysis.
Data Collection	Extract the logs from the source, ensuring you have sufficient data for analysis. This might involve pulling in large volumes of log data for a specific timeframe.
Initial Examination	Look for any immediate indicators, such as errors, warnings, or unusual entries, that might require further investigation.

Orient

Once you have gathered the data, the next step is to analyze and interpret / making sense of the information:

Step	Description
Contextual Analysis	Understand the environment from which the logs are derived. Familiarize yourself with normal behavior patterns and expected log entries.
Baseline Comparison	If available, compare the observed logs against known baselines to identify anomalies or deviations from normal behavior.
Categorization	Classify the log entries based on their types and severity to determine which areas may need deeper investigation.

Decide

After analyzing the data, you will need to make decisions on the next steps:

Step	Description
Identifying Priorities	Determine which anomalies or events warrant immediate attention based on their potential impact.
Formulating Hypotheses	Based on your observations and analysis, develop hypotheses about potential security incidents or issues that may be present.
Strategizing Next Steps	Decide whether to conduct deeper investigations on specific entries, gather more logs, or look into other sources of data for correlation.

Act

It is time to finally act. This means take action!

Step	Description
Conduct In-Depth Analysis	Delve deeper into specific log entries or related logs to confirm or refute your hypotheses.
Documentation	Document findings, actions taken, and any correlations made during the investigation for future reference and reporting.
Response Actions	Depending on the outcome of your investigation, implement necessary response actions. This may involve alerting relevant teams, mitigating threats, or implementing changes to improve security posture.
Feedback Loop	After taking action, reflect on the process and results. Consider what was learned from the investigation to improve future OODA cycles, particularly in dealing with unknown log sources.

By applying the OODA loop to searching in an unknown log source or situation, we can maintain a structured approach that enhances their efficiency and effectiveness in identifying potential security incidents. Oh - you might have to restart the OODA loop to fully grasp something. That’s OK! The OOOA loop is designed for just that!

Clustering Data

We’ll leave the human aspect for now. But before we do that, keep in mind to always carry the OODA loop with you. It sure is handy if you apply it right. Carrying on with some statistical approach to understanding data:

Data clustering in the context of SOC (Security Operations Center) and threat hunting refers to the process of manually grouping similar types of security events, logs, or incidents based on shared characteristics or patterns. The goal is to identify related activities, isolate abnormal behavior, and detect potential threats more efficiently.

Key Concepts in SOC Clustering

Concept	Description
Event Grouping	Analysts manually sort and categorize similar security events (e.g., login attempts, network traffic spikes) to identify trends or anomalies.
Log Analysis	Clustering helps to organize log entries from firewalls, IDS/IPS, or endpoint security tools, grouping them based on similar IP addresses, timestamps, or activity types.
Pattern Recognition	By clustering events that share common characteristics, analysts can detect potential attacks or lateral movement within the network.
Reducing Noise	Grouping redundant or benign events can reduce alert fatigue, allowing analysts to focus on clusters that stand out and require deeper investigation

Examples of Manual Clustering

Example	Description
IP or Domain Grouping	Grouping traffic or alerts related to specific IP addresses, domains, or subnets to investigate potentially malicious communication.
Time-Based Clustering	Clustering incidents or events occurring within the same time window to look for coordinated attack patterns (e.g., a brute-force attempt followed by privilege escalation).
User Behavior Clustering	Grouping activities by users, especially when investigating insider threats or compromised accounts, to determine if there’s any unusual behavior.

In SOC and threat hunting, clustering helps analysts efficiently manage large volumes of data and prioritize their investigations, making it easier to spot indicators of compromise or suspicious activity.

Grouping Data

Grouping data in the context of threat hunting refers to the process of taking multiple unique artifacts and identifying when multiple instances of them appear together based on specific criteria. This technique is essential for efficiently analyzing security data and identifying potential threats or anomalies.

Key Aspects of Grouping Data in Threat Hunting

Aspect	Description
Definition and Purpose:	Grouping involves categorizing unique artifacts (e.g., IP addresses, user accounts, event types) to see when multiple items of interest appear together. This can help analysts detect patterns indicative of malicious behavior.
Explicit Input	Unlike clustering, where the algorithm determines groupings based on similarities without predefined categories, grouping uses an explicit set of items that are already known to be of interest. Analysts define the items they want to track and analyze.
Identifying Tools and TTPs	If a particular group of artifacts appears out of place or unusual, it may represent a tool or TTP (Tactics, Techniques, and Procedures) that an attacker is using. This can be critical for identifying ongoing attacks or breaches.
Criteria for Grouping	An important aspect of grouping is determining the specific criteria for identifying related instances. This might include: Time periods (grouping events that occurred within a certain timeframe). Event types (Grouping related events, such as failed login attempts followed by a successful login). Source or destination (Grouping based on common source or destination IP addresses.)
Hunting for Related Instances	Grouping works best when analysts are hunting for multiple, related instances of unique artifacts. For example, if multiple failed login attempts from the same user account occur within a short time frame, this may warrant further investigation.
Anomaly Detection	By observing groups of artifacts that deviate from the norm, analysts can more easily identify potential threats. For instance, if several user accounts are accessed from unusual locations at the same time, it may indicate a coordinated attack.
Enhanced Reporting and Visualization	Grouping data allows for better reporting and visualization. Security analysts can create dashboards that showcase grouped metrics, making it easier to communicate findings and respond effectively.

An analyst might group data from a SIEM based on:

Time Period: Identifying all login attempts within a specific hour that come from unusual geographic locations.
User Accounts: Tracking all accounts that experience multiple failed login attempts in a short timeframe.
Event Types: Grouping alerts from different systems (e.g., firewall and intrusion detection system) that indicate a potential breach attempt.
Other: By other means than depicted here that fits the scenario.

Grouping data is a vital technique in threat hunting that enhances the ability to analyze security events, detect anomalies, and prioritize responses. By identifying when multiple unique artifacts appear together based on specific criteria, security analysts can uncover patterns that may indicate malicious activity. This method is particularly useful for tracking related instances and understanding potential threats within an organization’s environment.

Stack Counting

Stack counting in the context of threat hunting refers to the practice of systematically accumulating and organizing counts of various security-related events or indicators over time. This method allows analysts to identify patterns, anomalies, and potential threats within a network or system by visually stacking counts of different types of events or alerts. Here’s a breakdown of its significance and usage in threat hunting:

Key Aspects of Stack Counting in Threat Hunting

Aspect	Description
Data Aggregation	Stack counting involves aggregating data from various sources (e.g., logs, alerts, user activities) to create a comprehensive view of events occurring within an environment.
Visualization	By using visual representations (like bar charts or histograms), stack counting helps security analysts quickly identify trends, spikes, or unusual patterns in data. This visualization makes it easier to spot anomalies that could indicate malicious activity.
Comparison of Event Types	Analysts can compare the counts of different types of events (e.g., successful logins, failed login attempts, firewall alerts) to understand normal versus abnormal behavior. For instance, a sudden increase in failed logins could indicate a brute force attack.
Temporal Analysis	Stack counting can be used to track event counts over time, enabling analysts to see how certain types of events change, which may correlate with specific activities or incidents.
Incident Response	Identifying anomalies through stack counting allows for quicker incident response. For example, if there is an unexpected increase in traffic from a specific IP address, analysts can investigate further for potential malicious intent.
Prioritization	By identifying the most frequently occurring threats or events, security teams can prioritize their investigations and resources towards the most critical issues.
Baseline Establishment	Establishing a baseline of normal activity using stack counting can help in detecting deviations that may indicate security incidents.

An analyst might use stack counting to visualize the number of alerts generated by various detection rules in a security information and event management (SIEM) system over a week. By stacking the counts, the analyst can easily identify which alerts are most common, which might require deeper investigation, and which may represent a coordinated attack.

An another example I have found extremely handy is just to count occurrence of “things”. For instance, count browser user-agents to find most used user-agents and from there find outliers, like so:

xychart-beta
    title "User-Agents"
    x-axis [Chrome, Firefox, Edge, Opera, "Py Requests", "Py Urllib", Scrapy, "curl/wget"]
    y-axis "Occurrence" 0 --> 200
    bar [150, 90, 80, 60, 30, 20, 15, 10, 5]

Some may find counting using tables easier, like so:

User-Agent Type	Count
Chrome	150
Firefox	90
Safari	80
Edge	60
Opera	30
Python Requests	20
Python urllib	15
Scrapy	10
curl/wget	5
Total Legitimate	410
Total Malicious	50

In summary, stack counting is a valuable technique in threat hunting that enhances visibility into potential security incidents, aiding in the identification, prioritization, and response to threats. By the way, grouping is extremely good fit for dashboards in your SIEM/SOAR!

Baselining

Baselining can be viewed as the noble art of finding and documenting what is considered legitimate behavior or usage within a network or system. This intricate process begins with the thorough observation and analysis of various activities across the digital landscape, enabling us to identify the patterns and behaviors that define normalcy for the organization. By collecting and analyzing data from logs, user activities, and network traffic, we can establish a clear understanding of typical operational behavior.

Once this baseline of legitimate behavior is documented, it serves as a crucial reference point for detecting anomalies. Any deviations from this established norm can raise red flags, indicating potential security threats such as unauthorized access, malicious activities, or system misconfigurations. However, baselining is not merely about creating a static snapshot; it requires continuous adaptation and refinement as the organizational environment evolves. Factors such as changes in user roles, the introduction of new technologies, or variations in operational demands can all influence what is deemed “normal.”

Moreover, understanding the context surrounding user behavior is essential for effective baselining. For example, an unusual spike in network activity during a corporate event might be legitimate, while a similar spike during off-hours could warrant further investigation. By maintaining an up-to-date and contextualized baseline, threat hunters can enhance their ability to identify genuine threats while minimizing false positives, ultimately fostering a proactive security posture that protects the organization’s critical assets.

HOWEVER, baselining is a kind of unicorn — it is rare, extremely rare, to find someone who has fully baselined and documented a system or network. Instead, what I have found are various attempts at dashboards trying to explain what’s going on. I have chosen to call such dashboards baselining. They don’t tell the entire truth but offer a glimpse of the truth, and that’s better than nothing.

General tips and tricks

Over the years I have amassed some thoughts regarding interpreting logs and data. Here’s a few thoughts from me on understading data:

Aspect	Description
Volume	Always look for volume. The size of data can give an indication of a situation. Example: large data transfers could mean exfiltration.
Count	Counting is basics. Simply knowing the count of things can point in a direction. Example: Many failed logins may suggest brute force.
Min	Hackers loves being stealthy. Take a look at what occurs seldom or the tiniest size of something. Example: Tiny network packets might signal suspicious activity.
Max	Maximum values can indicate interesting behavior. Example: Many file access entries in log could mean unauthorized access.
Sum	The sum of it all. Example: high login totals after hours may hint at insider threats.
Percentage	Percentage can be used to compare between systems, or depict a baseling - or outliers. Example: A spike in failed logins can signal an attack.
Combine them all	All of these tips can be successfully combined to form a narrative - keep that in mind when you twist and turn data to understand it

Resources

Revision

Revised Date	Comment
06.10.2024	Improved formatting and wording