Overview of MITRE ATT&CK Tactic: TA0009 - Collection

Introduction

During the lifecycle of an attack, gaining access to a target system or environment is only part of the adversary’s objective. To advance their goals, adversaries must gather and consolidate useful data, such as credentials, proprietary documents, or user interactions, before exfiltrating it or using it for further malicious operations.
The Collection tactic (TA0009) in the MITRE ATT&CK framework encompasses the techniques adversaries use to locate, collect, and stage this valuable information within a compromised environment.

This article explores the core techniques and sub-techniques under the Collection tactic, detailing how attackers leverage these methods in practice and how organizations may detect or mitigate them effectively.

Techniques and Sub-Techniques

T1557 – Adversary-in-the-Middle (AiTM)

Adversaries use Adversary-in-the-Middle (AiTM) techniques, also known as Man-in-the-Middle (MITM), to intercept or manipulate communications between two parties without their knowledge. By positioning themselves between the victim and the destination, attackers can eavesdrop, modify data, hijack sessions, or inject malicious content. This lets them capture sensitive information or disrupt normal operations, often without being noticed.

T1557.001 – LLMNR/NBT-NS Poisoning:
Attackers manipulate local network protocols like Link-Local Multicast Name Resolution (LLMNR) or NetBIOS Name Service (NBT-NS) to redirect network traffic, often capturing authentication credentials or other sensitive communications.

T1557.002 – ARP Cache Poisoning:
By poisoning the Address Resolution Protocol (ARP) caches of hosts on a network, attackers redirect traffic through their systems, enabling data interception or modification.

T1557.003 – DHCP Spoofing:
An attacker poses as a DHCP server and provides incorrect network configuration to redirect or intercept traffic from clients.

T1557.004 – Evil Twin:
Attackers set up fake access points mimicking legitimate Wi-Fi networks to trick users into connecting. This enables interception of all network traffic from connected clients.
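
One symptom of ARP cache poisoning (T1557.002) is a single MAC address answering for several IP addresses, typically the gateway and the attacker's own host. The following is a minimal detection sketch, not a complete detector. It assumes the ARP table has already been parsed into (ip, mac) pairs, and the function name and sample addresses are hypothetical.

```python
from collections import defaultdict


def find_arp_anomalies(entries):
    """Return MAC addresses that claim more than one IP address.

    `entries` is an iterable of (ip, mac) pairs, for example parsed from a
    host's ARP table. Parsing is out of scope for this sketch.
    """
    ips_by_mac = defaultdict(set)
    for ip, mac in entries:
        # Normalize case so "AA:BB" and "aa:bb" count as the same MAC.
        ips_by_mac[mac.lower()].add(ip)
    # A MAC mapped to several IPs can indicate poisoning, but routers,
    # proxy ARP, and virtual hosts produce the same pattern legitimately.
    return {mac: sorted(ips) for mac, ips in ips_by_mac.items() if len(ips) > 1}


# Hypothetical ARP table: the gateway IP and another host share one MAC.
arp_table = [
    ("192.168.1.1", "de:ad:be:ef:00:66"),
    ("192.168.1.50", "DE:AD:BE:EF:00:66"),
    ("192.168.1.20", "aa:bb:cc:00:00:14"),
]

print(find_arp_anomalies(arp_table))
# {'de:ad:be:ef:00:66': ['192.168.1.1', '192.168.1.50']}
```

Any hit should be treated as a lead to investigate rather than proof of an attack, since legitimate network designs can trigger it.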

T1560 – Archive Collected Data

This technique refers to adversaries compressing or packaging gathered information into archive files for easier storage, transfer, or concealment ahead of exfiltration. By archiving data, attackers can reduce file size, encrypt contents, and bundle multiple files into a single container, making detection and analysis more difficult.

T1560.001 – Archive via Utility
Attackers use common system or third-party compression utilities (such as zip, tar, or 7-Zip) to archive data. These tools are available on most operating systems and allow attackers to quickly bundle data for exfiltration.

T1560.002 – Archive via Library
Leveraging programming libraries (for example, Python’s zipfile or tarfile modules) in custom malware or scripts to programmatically create archive files without relying on external utilities.

T1560.003 – Archive via Custom Method
Employing custom or less-common compression or packaging methods, possibly designed to evade signature-based detection or to implement encryption schemes tailored to the attacker’s needs.
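
Archives created by common utilities or libraries can often be recognized by their leading bytes even when the file extension has been changed. The sketch below illustrates that idea with a small, non-exhaustive signature table. The function name is hypothetical, and custom formats (T1560.003) would not match any of these signatures.

```python
import gzip
import io
import zipfile

# Well-known leading bytes ("magic numbers") for a few archive formats.
ARCHIVE_SIGNATURES = {
    b"PK\x03\x04": "zip",
    b"\x1f\x8b": "gzip",
    b"7z\xbc\xaf\x27\x1c": "7z",
    b"Rar!\x1a\x07": "rar",
}


def identify_archive(header: bytes):
    """Return the archive type suggested by the first bytes, or None."""
    for magic, name in ARCHIVE_SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


# Build a small zip in memory, as a library-based archiver would.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("notes.txt", "example")

print(identify_archive(buffer.getvalue()[:8]))      # zip
print(identify_archive(gzip.compress(b"data")[:8]))  # gzip
print(identify_archive(b"plain text file"))          # None
```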

T1123 – Audio Capture

This technique covers how adversaries surreptitiously record audio from compromised devices. It often involves malware or remote access trojans (RATs) that activate microphones to capture conversations and ambient sounds without the user’s knowledge, turning audio input into a source of sensitive information.

T1119 – Automated Collection

With this technique, adversaries harvest files and data from compromised systems in an automated way, often via scripts and command-line tools. This approach lets attackers rapidly gather documents, credentials, or other useful files that meet specific criteria, such as type or location. Automation can extend to cloud environments by leveraging APIs or ETL (extract, transform, load) services. Attackers may deploy this technique through remote access tools and schedule data collection to run periodically.
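
Because automated collection tends to touch many files in a short time, one possible detection heuristic is to count file-access events per process inside a sliding time window. The sketch below assumes events arrive as (timestamp_seconds, process_name, path) tuples from some endpoint telemetry source. That event shape, the function name, and the thresholds are all illustrative assumptions.

```python
from collections import defaultdict, deque


def find_collection_bursts(events, max_events=100, window_seconds=60):
    """Return the set of processes exceeding max_events within the window.

    `events` is an iterable of (timestamp_seconds, process_name, path).
    """
    recent = defaultdict(deque)  # process -> timestamps inside the window
    flagged = set()
    for timestamp, process, _path in sorted(events):
        window = recent[process]
        window.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_events:
            flagged.add(process)
    return flagged


# Hypothetical telemetry: one process reads 150 files in 15 seconds,
# another opens 10 files over several minutes.
events = [(i * 0.1, "collector.exe", f"C:/docs/{i}.docx") for i in range(150)]
events += [(i * 30, "winword.exe", f"C:/docs/report{i}.docx") for i in range(10)]

print(find_collection_bursts(events))  # {'collector.exe'}
```

Backup agents, indexers, and antivirus scanners also read files in bursts, so a real deployment would need an allowlist or baseline.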

T1185 – Browser Session Hijacking

Adversaries exploit vulnerabilities or built-in browser functionality to take over a user’s active browsing session. This is often done by injecting software into the browser that inherits cookies, HTTP sessions, and SSL client certificates, enabling the attacker to impersonate the user and gain unauthorized access to internal resources behind authentication, such as intranets or webmail. It may also involve redirecting browser traffic through a proxy, which effectively lets an attacker browse with the victim’s session and permissions and potentially bypass security mechanisms like two-factor authentication.

T1530 – Data from Cloud Storage

This technique involves adversaries accessing sensitive data stored in cloud-based storage services such as AWS S3, Azure Blob, or Google Cloud Storage. Attackers may exploit misconfigurations, stolen credentials, or compromised API keys to list, read, or download files. Once inside, they can exfiltrate proprietary, personal, or operational data. Common indicators include unusual access patterns, data downloads from unfamiliar IPs, and API activity outside normal hours. Proper IAM policies, encryption, and monitoring are key defenses against this technique.
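
The indicators mentioned above (unfamiliar IPs and activity outside normal hours) can be checked against storage access logs. The following sketch assumes the logs have already been exported into dictionaries with "time", "source_ip", and "object" keys. That schema, the function name, and the business-hours defaults are hypothetical and would need to be mapped onto the actual log format of the cloud provider in use.

```python
from datetime import datetime


def flag_unusual_access(records, known_ips, start_hour=8, end_hour=18):
    """Return (object, reasons) pairs for records matching either indicator."""
    flagged = []
    for record in records:
        reasons = []
        if record["source_ip"] not in known_ips:
            reasons.append("unfamiliar IP")
        # Timestamps are assumed to be ISO 8601 strings in local time.
        hour = datetime.fromisoformat(record["time"]).hour
        if not (start_hour <= hour < end_hour):
            reasons.append("outside normal hours")
        if reasons:
            flagged.append((record["object"], reasons))
    return flagged


# Hypothetical access log using documentation-reserved IP ranges.
access_log = [
    {"time": "2024-05-01T10:15:00", "source_ip": "203.0.113.7", "object": "reports/q1.pdf"},
    {"time": "2024-05-01T03:12:00", "source_ip": "198.51.100.23", "object": "backups/db.dump"},
    {"time": "2024-05-01T22:40:00", "source_ip": "203.0.113.7", "object": "hr/payroll.xlsx"},
]

for obj, reasons in flag_unusual_access(access_log, known_ips={"203.0.113.7"}):
    print(obj, reasons)
# backups/db.dump ['unfamiliar IP', 'outside normal hours']
# hr/payroll.xlsx ['outside normal hours']
```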

T1602 – Data from Configuration Repository

Adversaries collect data about managed devices from configuration repositories, the stores that management systems use to configure, monitor, and control remote systems such as routers and switches. This data can expose device inventory, network layout, account details, and other secrets that enable lateral movement, privilege escalation, persistence, or wider data exfiltration. Detect unusual management-protocol queries or bulk configuration transfers from unexpected sources; mitigate with least-privilege access, strong authentication for management interfaces, network segmentation, and rapid credential rotation.

T1602.001 – SNMP (MIB Dump):
Attackers use SNMP to query Management Information Base (MIB) repositories, harvesting variable information such as device inventory, configuration details, software versions, and operational status. This strategy is used to build network maps and uncover system vulnerabilities.

T1602.002 – Network Device Configuration Dump:
Adversaries access and export network device configuration files, which reveal critical parameters about device operation, network layout, software, and legitimate accounts. Common management tools and protocols like SNMP and Smart Install may be abused for bulk configuration retrieval, facilitating deeper network compromise or persistence.
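
SNMPv1 and SNMPv2c community strings act as shared passwords for MIB queries, so one preventive check is to audit device configurations for well-known or read-write communities. The sketch below assumes Cisco IOS-style "snmp-server community <name> <RO|RW>" lines. Other vendors use different syntax, and the function name and sample configuration are hypothetical.

```python
import re

# Matches lines such as: snmp-server community public RO
COMMUNITY_RE = re.compile(
    r"^snmp-server community (\S+)(?:\s+(RO|RW))?",
    re.IGNORECASE | re.MULTILINE,
)
WELL_KNOWN = {"public", "private"}


def audit_snmp_communities(config_text):
    """Return (community, access, reasons) for risky community definitions."""
    findings = []
    for match in COMMUNITY_RE.finditer(config_text):
        name = match.group(1)
        access = (match.group(2) or "unspecified").upper()
        reasons = []
        if name.lower() in WELL_KNOWN:
            reasons.append("well-known community string")
        if access == "RW":
            reasons.append("read-write access")
        if reasons:
            findings.append((name, access, reasons))
    return findings


# Hypothetical device configuration excerpt.
sample_config = """hostname edge-sw1
snmp-server community public RO
snmp-server community n3tops-7741 RW
snmp-server community m0nitor-ro RO
"""

for finding in audit_snmp_communities(sample_config):
    print(finding)
# ('public', 'RO', ['well-known community string'])
# ('n3tops-7741', 'RW', ['read-write access'])
```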

T1213 – Data from Information Repositories

Adversaries access or mine valuable information from shared storage tools and platforms, such as SharePoint, Confluence, code repositories, and enterprise databases. These repositories support collaboration and information sharing, making them prime targets for attackers seeking sensitive documents, credentials, or business intelligence. Collection activity may include direct queries, abuse of external sharing features, or use of APIs for mass data retrieval. Information gained from these sources can support further stages of an attack, such as privilege escalation or exfiltration.

T1213.001 – Confluence:
Adversaries may extract sensitive content from Confluence wikis, including project documentation, internal processes, or credentials.

T1213.002 – SharePoint:
Attackers can target SharePoint to harvest documents, business records, or knowledge base content that aids further compromise.

T1213.003 – Code Repositories:
Code repositories like GitHub, Bitbucket, or internal version control systems may be mined for intellectual property, secrets, or internal tooling.

T1213.004 – Customer Relationship Management Software:
Adversaries may access CRM platforms to collect customer records, contact details, or purchase histories.

T1213.005 – Messaging Applications:
Attackers might collect data from messaging platforms such as Slack or Microsoft Teams to gather conversations, shared files, or credentials posted in chat.

T1213.006 – Databases:
Enterprise databases can be queried for large volumes of data, from customer information to proprietary business knowledge.
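
Since repositories are mined for secrets, defenders can scan their own content first and rotate anything that turns up. The following is a minimal sketch of pattern-based secret scanning. The three patterns are illustrative and far from complete, the function name is hypothetical, and the sample key is a documentation placeholder rather than a real credential.

```python
import re

# A few illustrative patterns; production scanners use many more rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "hardcoded_password": re.compile(r"(?i)password\s*[:=]\s*['\"][^'\"]{4,}['\"]"),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every line that matches."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((line_number, name))
    return hits


# Hypothetical file content with placeholder values.
sample = '''region = "us-east-1"
aws_key = "AKIAIOSFODNN7EXAMPLE"
db_password = "hunter2-example"
'''

print(scan_text(sample))
# [(2, 'aws_access_key_id'), (3, 'hardcoded_password')]
```

Pattern matching produces both false positives and misses, so it works best alongside a secrets vault and access reviews rather than as a replacement for them.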

T1005 – Data from Local System

Adversaries search for and collect files or sensitive information directly from a compromised device’s file system or local databases. Attackers may use built-in commands, scripts, or automated tools to identify valuable documents, credentials, configuration files, or business data prior to exfiltration. This technique provides immediate access to data without needing network interaction, allowing threat actors to efficiently gather information for use in further attacks or for eventual data theft.

T1039 – Data from Network Shared Drive

Adversaries search for and collect sensitive files from shared network drives (such as file servers or mapped directories) accessible within a compromised environment. Attackers may use built-in command-line tools or scripts to locate and copy documents, business records, or other valuable data from these network locations. This technique enables stealthy collection and staging of information prior to exfiltration because it relies on the organization’s legitimate file-sharing infrastructure.

T1025 – Data from Removable Media

Adversaries collect sensitive data from removable devices, such as USB drives, CDs, or external hard disks, connected to compromised systems. Attackers may search these media for files of interest and copy them for later exfiltration. This technique often precedes data theft and can leverage interactive command shells or automated scripts. Defending against it requires monitoring access to removable media and restricting usage wherever possible.

T1074 – Data Staged

Adversaries organize and prepare collected data in a central directory or location before exfiltrating it. This staging can involve copying files from various sources within the system into one place and may include combining or compressing files for easier transfer. Data staging helps attackers avoid detection and optimize exfiltration, as it minimizes the number of outbound connections needed.

T1074.001 – Local Data Staging:
Attackers gather and store data on the compromised local system, often using file operations or shell commands to consolidate information into staging folders or temporary directories before exfiltration. Common locations include /tmp, temp folders, or newly created directories. Detection can focus on monitoring for suspicious aggregation and compression activities.

T1074.002 – Remote Data Staging:
Adversaries collect and stage stolen data from multiple systems in a centralized location on a remote host within the target network. This method is often used to further evade detection and reduce direct communication with external infrastructure. Monitoring for file transfers, mounting operations, and inbound activities to staging locations on remote systems helps identify this behavior.
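
Monitoring for aggregation and compression in staging locations can start with something as simple as looking for large, recently modified archives. The sketch below walks a directory tree and reports such files. The function name, extension list, and thresholds are assumptions to tune per environment, and the extension check can be evaded by renaming files.

```python
import os
import time

ARCHIVE_EXTENSIONS = {".zip", ".7z", ".rar", ".tar", ".gz", ".tgz"}


def find_staged_archives(root, min_bytes=50 * 1024 * 1024, max_age_seconds=86400):
    """Return (path, size) for large archive files modified recently under root."""
    now = time.time()
    findings = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in ARCHIVE_EXTENSIONS:
                continue
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
            is_large = info.st_size >= min_bytes
            is_recent = now - info.st_mtime <= max_age_seconds
            if is_large and is_recent:
                findings.append((path, info.st_size))
    return findings
```

A call such as find_staged_archives(tempfile.gettempdir()) would check the current user's temporary directory. On a file server, the same function could be pointed at shares that commonly serve as remote staging locations.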

T1114 – Email Collection

Adversaries target user email for sensitive information such as trade secrets, personal details, or incident response communications. Attackers may collect emails directly from local system files, remotely access mailboxes through server APIs or webmail, or automate exfiltration by setting up persistent forwarding rules. Tools and malware exist specifically to harvest or search large volumes of email content. Successfully collecting email can give attackers valuable intelligence, credentials, and insight into organizational operations.

T1114.001 – Local Email Collection:
Adversaries obtain email data stored locally on endpoints, such as Outlook PST/OST files or maildir/mbox folders.

T1114.002 – Remote Email Collection:
Attackers use compromised credentials or tokens to access and gather emails directly from mail servers (e.g., Exchange, Office 365, or Google Workspace), often by querying for content or downloading entire mailboxes.

T1114.003 – Email Forwarding Rule:
Malicious auto-forwarding rules are configured on accounts so emails are covertly sent to adversary-controlled destinations for continuous monitoring and collection.
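
Auto-forwarding rules can be audited by comparing each rule's destination against the organization's own mail domains. The sketch below assumes the rules have been exported from the mail platform into dictionaries with "mailbox", "name", and "forward_to" keys. That schema, the function name, and the example domains are hypothetical.

```python
def find_external_forwarding(rules, internal_domains):
    """Return (mailbox, rule_name, address) for forwards leaving the organization."""
    internal = {domain.lower() for domain in internal_domains}
    flagged = []
    for rule in rules:
        for address in rule.get("forward_to", []):
            # Take everything after the last "@" as the destination domain.
            domain = address.rsplit("@", 1)[-1].lower()
            if domain not in internal:
                flagged.append((rule["mailbox"], rule["name"], address))
    return flagged


# Hypothetical exported rules.
rules = [
    {"mailbox": "cfo@example.com", "name": "Archive", "forward_to": ["archive@example.com"]},
    {"mailbox": "cfo@example.com", "name": "RSS Feeds", "forward_to": ["drop@mail.example.net"]},
    {"mailbox": "it@example.com", "name": "Move to folder"},
]

print(find_external_forwarding(rules, internal_domains={"example.com"}))
# [('cfo@example.com', 'RSS Feeds', 'drop@mail.example.net')]
```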

T1056 – Input Capture

Adversaries use input capture to intercept user input, especially to steal credentials or other sensitive information. Attackers may use keylogging to record usernames, passwords, and other typed input. They can also mimic system prompts to trick users (GUI input capture), inject code into web portals to intercept login data, or hook APIs to intercept authentication details passed between system functions. Input capture may be implemented through software running on a victim’s endpoint or, in some cases, via hardware or modified system images.

T1056.001 – Keylogging:
Captures keystrokes as users type, commonly via malicious software or API hooks.

T1056.002 – GUI Input Capture:
Fakes system prompts or windows to harvest data entered by users.

T1056.003 – Web Portal Capture:
Injects code into web login forms to steal credentials.

T1056.004 – Credential API Hooking:
Hooks system APIs to intercept authentication data directly.

T1113 – Screen Capture

Adversaries take screenshots of a victim’s desktop to gather sensitive information during or after a system compromise. This functionality is often included in remote access tools and can be executed using native utilities or API calls, such as CopyFromScreen (Windows), xwd (Linux), or screencapture (macOS). Attackers use screen captures to harvest credentials, confidential files, or information displayed in secure applications that might not be directly accessible otherwise. Numerous malware families and post-exploitation frameworks, including njRAT, PowerSploit, and Remcos, use this technique to collect data visually.

T1125 – Video Capture

Adversaries may use a system’s camera or video recording tools to gather visual information from a compromised environment. Attackers could leverage built-in webcams or external cameras to record video footage, potentially capturing sensitive activities, credentials being entered, or confidential conversations and meetings. This capability is sometimes included in remote access tools or custom malware and can be automated to stream or upload data in real time or on a schedule. Video capture can serve both intelligence gathering and extortion goals, often bypassing privacy safeguards if camera permissions are not tightly managed.

How can F5 help?

F5 security solutions such as WAF (Web Application Firewall), API security, and advanced bot management help defend applications and APIs across hybrid, cloud, and on-premises environments, and can mitigate several of the Collection techniques described in this article. F5 solutions, including Distributed Cloud, BIG-IP, and NGINX, safeguard against automated data scraping, unauthorized access, and information harvesting. For specific guidance on implementing these mitigations, refer to F5’s technical resources and product documentation tailored to MITRE ATT&CK techniques.

For more information, please contact your local F5 sales team.

Conclusion

The Collection tactic is essential in an adversary’s progression from access to exfiltration. By understanding and detecting these techniques, defenders can disrupt the data-gathering process and significantly reduce the impact of cyber intrusions. Applying layered defenses, including network gateways such as F5 BIG-IP and endpoint tools, strengthens the overall security posture.