Using AI to Combat AI - Purpose-Built AI Models in NDR

Summary

Threat actors have weaponized artificial intelligence (AI), making malware and other advanced persistent threats (APTs) capable of circumventing or hiding from lagging next-generation antivirus (NGAV), anti-malware, sandboxing, and other threat detection technologies. AI-enabled malicious software can quickly detect the environment it is operating in and take evasive measures to escape detection and removal. The only way to combat weaponized AI is with purpose-built AI models that look for small deviations from normal behavior across large volumes of activity over long periods of time. Network detection & response (NDR) tools are implementing AI models designed for specific threat hunting use cases to find and remove weaponized AI. This whitepaper provides a basic overview of the AI technologies used within purpose-built AI models in Sangfor’s Cyber Command NDR capability and the primary threat hunting use cases the AI models detect.

Introduction

Cyberattacks have become much more sophisticated in the past several years. Malware has been rearchitected to be modular, allowing payloads to be swapped based on an attacker’s requirements. Many of these payloads weaponize AI, using it to quickly identify the surrounding environment and enable smarter, more targeted attacks. Malware and APTs can easily determine if they are executed in a sandbox and immediately become dormant to evade detection. The AI can identify the domain and location of the host to deploy specific payloads or activate only if the host is inside the correct company or city. For example, the REvil ransomware is designed to deactivate if the host is part of a Russian language domain.

Network detection & response (NDR) tools are being deployed by organizations to provide greater visibility of known and unknown threats across a network and then respond swiftly to them. NDR uses artificial intelligence to uncover hidden threats facilitated by weaponized AI. However, deploying AI for AI’s sake to generally detect threats is not a viable solution as generalized AI is prone to large amounts of false positives and false negatives; you cannot catch all fish using the same bait. The most efficient way to combat weaponized AI is by implementing purpose-built AI models designed to detect specific behaviors. Sangfor’s Cyber Command NDR capability is designed to detect specific threat hunting use cases using AI models designed for each use case. The most common use cases Cyber Command NDR is used for are DGA domain detection, encrypted communications modeling & attack detection, malware & ransomware protection, and user & entity behavior analytics (UEBA).

Basics of AI Technology

To understand the purpose-built AI models used in Cyber Command’s NDR capability, a little background about AI modeling is provided for the reader’s convenience.

The Three Waves of AI

Artificial intelligence, in some form, has been around for decades. But to understand how AI works, there needs to be a definition of the ability to process information that can be called “intelligence” before building an artificial one. The U.S. Defense Advanced Research Projects Agency (DARPA) has long described the evolution of AI technology in three waves. In these waves, intelligence was defined using four metrics:

Perceiving: collecting and identifying rich, complex, and subtle information
Learning: changing understanding and the rules or models within a given environment
Abstracting: to create new meaning
Reasoning: to plan and to decide

Using these metrics, we can determine artificial intelligence’s ability to process information.

DARPA’s three waves show how AI has evolved in “intelligence”, but technologies based on all three waves are still in common use today.

Figure 1 DARPA Three Waves of AI

Today, cybersecurity tools commonly use AI based on the second wave. These products establish a repository of raw data called a “data lake” and then construct AI analysis based on this data lake. AI models used in NDR can predict the behavior of network-related threats after learning from a large amount of network traffic data. But the models themselves cannot determine “bad” behavior without human interaction.

Machine Learning Models

Cybersecurity relies heavily on machine learning (ML): computer algorithms that improve automatically through experience and the use of data. ML is a part of artificial intelligence in which learning algorithms build a model from sample data, or "training data", to make predictions or decisions without rulesets or being explicitly programmed to do so. Applications of ML include facial recognition, email and spam filtering, medicine, speech recognition, and computer vision, where rule-based models would not be very effective due to the irregularity of the data.

The three most widely used machine learning models are unsupervised, supervised, and semi-supervised. Most Cyber Command NDR AI models use supervised or unsupervised learning. Some models will use a combination of both learning models in stages where one stage feeds into the next.

Figure 2 Machine learning models

Cyber Command NDR Purpose-Built AI

Gartner identifies six AI use case groups in their report, 'Emerging Technologies: Emergence Cycle for AI in Security for Malware Detection'¹. These use case groups are:

endpoint
performance monitoring
modeling
encryption
ransomware
code analysis

Sangfor Cyber Command has purpose-built AI models for modeling, encryption, ransomware, and code analysis use cases. Cyber Command also incorporates purpose-built AI models for other use cases Sangfor has identified based on Incident Response activities and customer feedback. These use cases include:

alarm reduction
threat detection
threat hunting of undetected breaches or infections

Endpoint data is correlated in Cyber Command with network traffic data (NDR) and firewall event data (SIEM/log) using the above models to increase confidence in threat, malware, and APT detection, activate automated response policies, and automate threat hunting.

The most common use cases Cyber Command NDR is used for are:

DGA domain detection
encrypted communications modeling & attack detection
encrypted brute force attack detection
malware & ransomware protection
user & entity behavior analytics (UEBA)

DGA Behavior Model

Sangfor’s Engine Zero malware detection engine, in conjunction with Cyber Command, incorporates threat intelligence to extract many types of indicators found in previously analyzed advanced threats. This data inputs into several AI-enabled detection models which identify malware related network anomalies and send alerts.

One such anomaly is domain generation algorithm (DGA) behavior. DGAs are algorithms used by various malware families to regularly generate large numbers of domain names composed of random sequences of characters to send traffic to command-and-control servers. DGA domains are used primarily to:

produce random domain names that the malware can use and quickly switch between to evade detection
exfiltrate data by adding it as the third-level label of the DGA domain, like a hostname
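
The second use above can be illustrated from the defender's side with a minimal sketch that flags third-level labels that look like encoded data. The function names, the minimum length, and the hex/base32 alphabet check are assumptions made for illustration and are not taken from Cyber Command. The sketch also assumes the registered domain is exactly two labels long, which does not hold for suffixes such as co.uk.

```python
import re

# Assumption: exfiltrated data is encoded as hex or base32, the two common
# encodings that fit inside the characters allowed in a DNS label.
ENCODED_LABEL = re.compile(r"(?:[a-z2-7]+|[0-9a-f]+)")


def third_level_label(domain):
    """Return the third label from the right, or None if there is none.

    Naive split: assumes the registered domain is two labels long.
    """
    labels = domain.rstrip(".").lower().split(".")
    return labels[-3] if len(labels) >= 3 else None


def looks_like_encoded_payload(domain, min_length=24):
    """True when the third-level label is long and uses only an encoding alphabet."""
    label = third_level_label(domain)
    if label is None or len(label) < min_length:
        return False
    return ENCODED_LABEL.fullmatch(label) is not None


print(looks_like_encoded_payload("4a6f686e446f653a70617373776f7264.kq3vz.example"))
print(looks_like_encoded_payload("www.example.com"))
```

A length and alphabet check alone will also match some legitimate content-delivery hostnames, so a heuristic like this would only be one input among several.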

Weaponized AI in malware and APTs has become widespread, with DGA botnets a typical example. Normally, botnet malware initiates DNS requests to command-and-control servers using domain names that are static and can be identified by security products using rule-based engines (such as IPS) integrated with threat intelligence. However, attackers have integrated AI algorithms in DGA botnets to generate botnet domain names with random characters in real time. The detection rate for this type of botnet, using a rule-based model and threat intelligence, is almost 0%.

Figure 3 DGA behavior detection model

By analyzing DGA behavior characteristics using the purpose-built DGA Behavior Model, Cyber Command can quickly recognize abnormal DGA behavior generated by specific hosts and locate the IP addresses of the DGA command and control servers. DGA behavior characteristics include:

thousands of DGA requests resolving to a single or small number of IP addresses
second- and third-level domain names that are obviously not human readable (random-looking character strings)
domain names that cannot be resolved in nslookup or other DNS resolution tools
domain names that are used only one time
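
The characteristics listed above can be turned into simple per-host counters. The following is a minimal sketch, not the Cyber Command model: the record layout (host, domain, resolved IP or None for a failed lookup), the function names, and the entropy threshold of 3.0 bits per character are all assumptions. Only the leftmost label of each domain is scored for randomness.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(label):
    """Shannon entropy of a string, in bits per character."""
    if not label:
        return 0.0
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in Counter(label).values())


def dga_indicators(dns_log, entropy_threshold=3.0):
    """Summarize DGA-style indicators per host.

    dns_log: iterable of (host, domain, resolved_ip); resolved_ip is None
    when the name could not be resolved.
    """
    domains = defaultdict(Counter)   # host -> domain -> number of queries
    ips = defaultdict(Counter)       # host -> resolved IP -> number of answers
    failed = Counter()               # host -> unresolved lookups
    random_looking = Counter()       # host -> queries with a high-entropy label
    for host, domain, resolved_ip in dns_log:
        domains[host][domain] += 1
        if resolved_ip is None:
            failed[host] += 1
        else:
            ips[host][resolved_ip] += 1
        if shannon_entropy(domain.split(".")[0]) >= entropy_threshold:
            random_looking[host] += 1

    summary = {}
    for host, counts in domains.items():
        top_ip = ips[host].most_common(1)
        summary[host] = {
            "total_queries": sum(counts.values()),
            "single_use_domains": sum(1 for n in counts.values() if n == 1),
            "failed_lookups": failed[host],
            "random_looking": random_looking[host],
            "most_common_ip": top_ip[0] if top_ip else None,  # (ip, count)
        }
    return summary
```

A host whose queries are dominated by single-use, high-entropy names that either fail to resolve or converge on one IP address matches the profile described above. Real traffic would need thresholds tuned against a baseline of normal hosts.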

Figure 4 DGA domain characteristics

Figure 5 DGA domain behavior mapped to IP address

DGA AI modules use multiple naming techniques to bypass security detection and exfiltrate data. Two variations of DGA Botnet domain naming mechanisms are shown in figure 6.

Figure 6 DGA domain naming variants

DGA Variant 1 generates second level domain names that are unreadable by humans. These domain names have distinct features, such as random combinations of letters and numbers in the domain names.

DGA Variant 2 uses human-readable words and may be easier for humans to detect, but the domains are single use and hidden among thousands of DNS requests each day. Cyber Command’s DGA Behavior Model can quickly construct a behavior model of the traffic and easily detect these types of domains.

Encrypted Communication Modeling & Attack Detection

After an attacker has entered the internal network and controls a host, they need to communicate with their command-and-control network. Generally, attackers will initiate an encrypted channel for hidden malicious communications disguised as normal traffic. The malicious data is encrypted to bypass firewall security policies, avoid security audit protocols, and go deep into the core network.

Figure 7 Encrypted communication detection

When an attacker compromises a border host to use as a jump point to attack an intranet host, Cyber Command monitors and analyzes both the inbound and outbound traffic of the jump host to determine if the encrypted traffic is malicious. Encrypted Communication Modeling analyzes the packet and byte flow characteristics of encrypted traffic, both incoming and outgoing at the jump server, to discover any malicious files being transferred without the need for decryption. Irregular bursts of encrypted traffic, or sustained encrypted traffic, are a good indication of large data transfers.
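
Volume-based analysis of this kind needs only packet metadata, never the decrypted payload. The sketch below shows one simple way to spot bursts, assuming packet records of (timestamp in seconds, direction, size in bytes) and a median-based baseline. The function names, the 60-second window, and the factor of 10 are illustrative assumptions, not the model Cyber Command uses.

```python
from collections import defaultdict
from statistics import median


def bytes_per_window(packets, window_seconds=60):
    """Sum bytes per time window and direction.

    packets: iterable of (timestamp_seconds, direction, size_bytes),
    where direction is "in" or "out".
    """
    buckets = defaultdict(lambda: {"in": 0, "out": 0})
    for timestamp, direction, size in packets:
        buckets[int(timestamp // window_seconds)][direction] += size
    return dict(buckets)


def burst_windows(packets, window_seconds=60, factor=10.0, direction="out"):
    """Return the window indexes whose volume exceeds factor x the median volume."""
    buckets = bytes_per_window(packets, window_seconds)
    if not buckets:
        return []
    # Floor the baseline at 1 byte so mostly idle links still produce a threshold.
    baseline = max(median(b[direction] for b in buckets.values()), 1)
    return sorted(w for w, b in buckets.items() if b[direction] > factor * baseline)


# Five quiet minutes followed by one large outbound transfer.
packets = [(i * 60 + 1, "out", 1000) for i in range(5)] + [(5 * 60 + 1, "out", 500_000)]
print(burst_windows(packets))
```

The median is used rather than the mean so that the burst itself does not inflate the baseline it is compared against.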

Figure 8 Encrypted traffic analysis

Encrypted Brute Force Attack Detection

Brute force attacks are a common method for initiating network intrusion and executing lateral propagation. Attackers overwhelm a system with a high rate of username and password combinations, sometimes forcing the system or network offline. To evade security detection, attackers often use more covert, distributed, (ultra) low-rate brute-force attacks. It is very difficult for most security devices to detect slow-moving encrypted brute force attacks, some as slow as one attempt a week.

Sangfor Cyber Command’s innovative Encrypted Brute Force Attack Detection AI model detects encryption protocols, slow brute force attacks, and abnormal login behaviors. A typical workflow includes:

Protocol identification: for encrypted traffic, initiate encrypted login authentication status detection, and then use the slow-speed brute-force identification engine for processing. For non-encrypted protocols, utilize the slow-speed brute-force identification engine.
Encryption protocol: AI analysis of traffic characteristics to determine the login authentication status and determine if the login was legitimate using characteristics like session length, protocol interaction and traffic pattern analysis.
Login protocol: Cyber Command uses a fingerprint-based, abnormal login detection engine to identify abnormal login behavior which might indicate a brute force attack or vulnerability in the system.
Detection protocol: After confirming the login status, Cyber Command applies the slow-speed brute force detection engine, which uses multi-scale time window serialization to identify very slow brute force behavior.
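
The idea behind examining failed logins at several time scales can be sketched as follows. This is a simplified illustration, not Sangfor's engine: the window lengths, the per-window limits, and the function names are assumptions. A low-rate attack stays under the short-window limits but accumulates past the long-window one.

```python
HOUR = 3600
DAY = 24 * HOUR

# (window length in seconds, failed logins tolerated inside one window).
# These limits are assumed values chosen only for the example.
SCALES = [(HOUR, 20), (DAY, 30), (30 * DAY, 40)]


def max_events_in_window(sorted_times, window):
    """Largest number of events that fall inside any sliding window of this length."""
    best = 0
    start = 0
    for end, timestamp in enumerate(sorted_times):
        # Advance the left edge until the span is shorter than the window.
        while timestamp - sorted_times[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best


def exceeded_scales(failed_login_times, scales=SCALES):
    """Return the window lengths at which failed logins exceed the limit."""
    times = sorted(failed_login_times)
    return [window for window, limit in scales
            if max_events_in_window(times, window) > limit]


# One failed login every 12 hours for 30 days: invisible hourly and daily,
# but visible in the 30-day window.
slow_attack = [i * 12 * HOUR for i in range(60)]
print(exceeded_scales(slow_attack) == [30 * DAY])
```

In practice the failures would be grouped per target account or per source before being passed to a check like this, since distributed attacks spread attempts across many source addresses.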

Figure 9 Protocol analysis engine flow

Malware & Ransomware Protection

AV-Test registers over 350,000 new pieces of malware daily². Next generation anti-virus (NGAV) products that rely on rules-based matching and threat intelligence engines have difficulty keeping signature databases updated to detect this new malware. Cyber Command instead uses purpose-built AI models to detect, identify, and classify these new malware variants.

Cyber Command’s Malware & Ransomware Protection Model builds a malware model library by learning the behavior, spread, and process characteristics of known malware families. The behavior model of new malware becomes a DNA fingerprint, identifying code and behavior that can be mapped to existing malware families making it easier and faster to identify and classify new malware strains.

Figure 10 Malware detection and classification engine

For example, when Cyber Command encounters ransomware, it dissects the structure of the PE file, learning the byte-level characteristics of the ransomware, including its IP connection, port connection, registry keys accessed and assembly instructions, and uses semantic analysis to build a DNA fingerprint and behavior model. The Malware & Ransomware Protection Model then extracts essential malicious behavior characteristics like network connection tests, file encryption, registry changes, and self-starting while filtering out programs where these characteristics are not found. Simultaneously, Engine Zero’s malware detection AI models are trained for future detection of similar ransomware.
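
Mapping a new sample to a known family can be pictured as a set-similarity lookup. The sketch below uses Jaccard similarity over behavior features as a stand-in for the fingerprint comparison. The family names, feature labels, and the 0.5 similarity floor are hypothetical, and the real model is described as working on far richer byte-level and semantic features.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def closest_family(sample_features, family_library, min_similarity=0.5):
    """Return (family_name, score) for the best match, or (None, score) if too weak."""
    best_name, best_score = None, 0.0
    for name, features in family_library.items():
        score = jaccard(sample_features, features)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_similarity:
        return None, best_score
    return best_name, best_score


# Hypothetical library built from previously analyzed families.
library = {
    "ransomware_family_a": {"network_connection_test", "file_encryption",
                            "registry_change", "self_start"},
    "downloader_family_b": {"network_connection_test", "payload_download"},
}
sample = {"network_connection_test", "file_encryption", "registry_change"}
print(closest_family(sample, library))  # ('ransomware_family_a', 0.75)
```

Samples that fall below the similarity floor are returned without a family, which corresponds to filtering out programs where the essential malicious characteristics are not found.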

User & Entity Behavior Analytics (UEBA)

When different users exhibit the same aberrant behavior, it is possible that they are infected by the same malware. Cyber Command anomaly detection continuously learns behavior by monitoring indicators such as account login time, login location, frequency, and applications accessed to determine if users or their systems become threats.
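
A per-user baseline of this kind can be sketched with simple frequency counts. In the example below, the class name, the indicator keys, and the 5% rarity threshold are assumptions for illustration only. The sketch flags any indicator value that the user has rarely or never shown before.

```python
from collections import Counter, defaultdict


class BehaviorBaseline:
    """Learns how often each indicator value appears for one user."""

    def __init__(self):
        self.seen = defaultdict(Counter)  # indicator name -> value -> count
        self.total = 0                    # number of events learned

    def learn(self, event):
        """event: dict such as {"hour": 9, "location": "office", "application": "crm"}."""
        for name, value in event.items():
            self.seen[name][value] += 1
        self.total += 1

    def unusual_indicators(self, event, min_share=0.05):
        """Return indicator names whose value appears in under min_share of history."""
        if self.total == 0:
            return []  # no history yet, nothing can be judged
        return [name for name, value in event.items()
                if self.seen[name][value] / self.total < min_share]


baseline = BehaviorBaseline()
for _ in range(20):
    baseline.learn({"hour": 9, "location": "office", "application": "crm"})

print(baseline.unusual_indicators({"hour": 3, "location": "office", "application": "crm"}))
# ['hour']
```

Comparing the flagged indicators across users is one way to notice several accounts showing the same aberrant behavior at once.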

Figure 11 User behavior cluster analysis

Figure 12 Behavior prediction model

When abnormal user behavior is found on the network, Cyber Command tracks movement to determine what assets are infected or at risk of infection. Cyber Command can automatically initiate a playbook response to isolate infected assets from the network and stop further spread of malware.

Cyber Command AI Scenarios

In addition to the use cases described above, Cyber Command has additional purpose-built AI models for a wide range of cyber threat scenarios including:

Summary

Attackers are embedding AI into malware and APTs so that they can choose what payload to deploy, whether to become active or dormant, and how to exfiltrate data based on the environment. It is difficult to detect this weaponized AI behavior without AI-based detection tools. Sangfor Cyber Command NDR capabilities use purpose-built AI models to detect AI-based behavior such as DGA domain traffic, attacks hidden within encrypted traffic, malware & APT infection, and abnormal user and entity behavior (UEBA). Using AI models dedicated to specific use cases increases detection while reducing false positives and false negatives.

About Sangfor

Founded in 2000 and publicly traded since 2018 (SANGFOR STOCK CODE: 300454 (CH)), Sangfor Technologies is a leading global vendor of IT infrastructure solutions specializing in Cyber Security & Cloud Computing.

For more information about Sangfor Cyber Command, please visit our website www.sangfor.com or send an email to marketing@sangfor.com.

¹Gartner Inc., Emerging Technologies: Emergence Cycle for AI in Security for Malware Detection, Nat Smith & Rustam Malik, G00735652, 27 October 2020
²https://www.av-test.org/en/statistics/malware/

Source: Sangfor
