Introduction to Incident Response

In this post, I am going to explain Incident Response. This is my personal opinion and does not reflect the views of F5.

First, I discuss what the term "incident" means.

Incident

Here, "incident" means a computer security incident, also called an information security incident. A security incident is an unauthorized or unexpected computer event that compromises, or has the potential to compromise, the confidentiality, integrity, or availability (CIA) of the victim's computer systems, network, or data. Security incidents include, but are not limited to, unauthorized access to systems and data, theft or loss of sensitive information, malware attacks (including computer viruses and bots), system crashes or malfunctions, insider threats, denial-of-service (DoS) attacks, and social engineering.

More formal definitions of an incident include the following:

RFC 2350 defines an incident as "any adverse event which compromises some aspect of computer or network security."

ISO 27001, the leading standard for information security management, states that "An information security incident is caused by an event that has the potential to affect the confidentiality, integrity, or availability of information."

Put more generally, an incident is any event, intentional or accidental, that affects computer security, including events suspected of causing a compromise.

Examples of incidents include scanning by human attackers or bots, system intrusion, denial of service (DoS), phishing, and theft of personal or sensitive information. ISO 27001 gives this example: "Theft or loss of equipment, such as a company laptop containing classified or sensitive information stolen from a bag or forgotten at an airport lounge, is an example of an information security incident."
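Scanning is often the first of these to show up in logs. As a minimal sketch, assuming you already have connection records as (source IP, destination port) pairs parsed from firewall logs, one simple heuristic flags sources that touch many distinct ports. The function name and the threshold of 20 ports are hypothetical choices for illustration, and the sketch ignores time windows, which a real detector would need.

```python
from collections import defaultdict

def find_port_scanners(connections, threshold=20):
    """Return source IPs that contacted at least `threshold` distinct ports.

    `connections` is an iterable of (source_ip, destination_port) pairs, for
    example parsed from firewall logs. The default threshold is an arbitrary
    assumption; tune it to your network's normal traffic.
    """
    ports_by_source = defaultdict(set)
    for src_ip, dst_port in connections:
        ports_by_source[src_ip].add(dst_port)
    return sorted(
        src for src, ports in ports_by_source.items() if len(ports) >= threshold
    )

if __name__ == "__main__":
    # Hypothetical records using documentation address ranges (RFC 5737).
    sample = [("203.0.113.5", port) for port in range(1, 101)]
    sample += [("198.51.100.7", 443), ("198.51.100.7", 80)]
    print(find_port_scanners(sample))  # ['203.0.113.5']
```

A flagged address is only an indicator; it still needs to be confirmed as an incident.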

Nothing humans build is perfect, so it is not possible to prevent every incident. Many factors can cause incidents, including unknown vulnerabilities in the system, software that has not been upgraded, and fixes that are not provided in time. There is always a chance that an incident will happen.

Once an incident happens, the response should be coordinated to prevent the incident's effect, in other words its impact, from spreading.

Impact

The impact of an incident differs from organization to organization. To respond effectively, an organization needs to define what must be protected and what is not approved.

Incidents may have serious consequences for organizations, including reputational damage, legal liabilities, compliance violations, and financial losses. Therefore, it is important for organizations to have a coordinated incident response plan and a computer security incident response team (CSIRT) to mitigate the impact of incidents and minimize the associated damage.

The worst-case scenario is that the network is breached, a system is intruded upon, sensitive information is stolen, and the compromised system is used as a jump host (or jump server) for further attacks.

To avoid this situation, it is important to detect incidents quickly and to communicate and coordinate with the appropriate team: the Computer Security Incident Response Team (CSIRT).

CSIRT

A CSIRT, also called a Computer Emergency Response Team (CERT, a registered trademark of CERT/CC), is the team that handles incident response. It has dedicated knowledge and experience and serves as the point of contact for incident response. It also needs the right tools and scripts to detect and respond to incidents quickly and efficiently.

For reference, see RFC 2350, "Expectations for Computer Security Incident Response."

Incident Response

Incident response starts when an incident occurs on your computer or network and you detect it. Incident response is the coordinated procedure to follow when an incident happens.

The incident response process includes identifying and assessing an incident, responding to it, and mitigating its impact. It is a coordinated, systematic approach to managing and minimizing the impact of an incident on an organization's systems, network, and data.

The aims of incident response are to contain the damage as quickly as possible; preserve data, logs, and other evidence related to the incident; investigate the cause and scope of the incident; and, finally, restore normal operations as quickly as possible while minimizing the overall impact.

To do all of this, you need to prepare in advance.

First, it is essential to formulate an incident response policy in line with the organization's security policy. The policy needs to define what is to be protected, what is to be prioritized, and how much damage is acceptable.

Along with the policy, the incident response process is defined and documented, and the CSIRT executes it. CSIRT members need a full understanding of the policy and process, adequate computer security knowledge and skills, the ability to respond to unexpected situations, and high morale.

Incident Response Procedure

Like incident response policies, incident response procedures vary from organization to organization. Therefore, the procedure presented here is only an example and does not represent any particular company's procedure.

A typical incident response process starts with Preparation. As discussed above, this means formulating an incident response policy and procedures, defining the roles and responsibilities of each organization, and identifying potential threats and vulnerabilities.
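As a rough sketch, the phases of this example procedure can be modeled as an ordered enumeration, which is handy if you track an incident's status in a script or ticketing tool. The `Phase` and `next_phase` names are hypothetical, and the fixed order is an assumption: as noted in this section, some policies preserve logs and data before containment, so a real tool may need to allow other transitions.

```python
from enum import Enum
from typing import Optional

class Phase(Enum):
    """Phases in the order used by this post's example procedure."""
    PREPARATION = 1
    IDENTIFICATION = 2
    CONTAINMENT = 3
    ANALYSIS = 4
    ERADICATION = 5
    COMMUNICATION = 6
    RECOVERY = 7
    POST_INCIDENT_REVIEW = 8

def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows `current`, or None after the final review."""
    phases = list(Phase)  # members in definition order
    index = phases.index(current)
    return phases[index + 1] if index + 1 < len(phases) else None
```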

When you notice or detect an incident, the next phase, Identification, starts. Identification means detecting and recognizing an incident, whether through automated systems or human observation. When you detect any sign of an incident or an indicator of compromise (IoC), you need to confirm whether it is really an incident; sometimes the activity turns out to be legitimate access. However, if you find anything suspicious and are not sure, treat it as an incident. If the incident involves a BIG-IP system, refer to K11438344.
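A common first check during identification is to search logs for known IoCs. The minimal sketch below assumes your IoCs are a set of known-bad IPv4 addresses (for example from a threat intelligence feed) and your logs are plain text lines; the function name and sample data are hypothetical. A match tells you where to look, not that an incident is confirmed.

```python
import re

# Matches dotted-quad IPv4 addresses; it does not validate octet ranges.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def match_iocs(log_lines, ioc_ips):
    """Return (line_number, ip) pairs for log lines containing a known-bad IP."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        for ip in IPV4_PATTERN.findall(line):
            if ip in ioc_ips:
                hits.append((number, ip))
    return hits

if __name__ == "__main__":
    # Hypothetical log lines and IoC list (documentation address ranges).
    lines = [
        "Jan 10 10:00:01 sshd: Accepted password for admin from 192.0.2.10",
        "Jan 10 10:05:12 sshd: Failed password for root from 203.0.113.99",
    ]
    print(match_iocs(lines, {"203.0.113.99"}))  # [(2, '203.0.113.99')]
```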

Before taking action, it is also a good idea to review the incident response policy and procedure, as well as any operational manuals you have.

Next, keep a record of your findings and actions. Each entry should include a brief description of the incident, the date and time, who and which computers are involved, and the actions you take.
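One lightweight way to keep such a record is an append-only file with one JSON object per entry. This is a sketch under assumptions: the `IncidentLogEntry` fields mirror the items listed above, but the names and the JSON Lines format are hypothetical choices. Appending only prevents accidental overwrites; it does not make the file tamper-proof.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class IncidentLogEntry:
    """One entry in an incident handling record (hypothetical field names)."""
    description: str
    people: List[str]   # who is involved
    hosts: List[str]    # which computers are involved
    action: str         # the action you took
    # Default to the current time in UTC so entries from different
    # systems and time zones can be compared directly.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def append_entry(path, entry):
    """Append one entry as a JSON line; earlier lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(asdict(entry)) + "\n")
```

For example, `append_entry("incident-notes.jsonl", IncidentLogEntry("Unexpected admin login", ["alice"], ["web01"], "Reported to CSIRT"))` adds one line to a hypothetical notes file.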

Then contact the responsible person(s) and the person in charge of the affected system. If you are not a CSIRT member, you also need to contact the CSIRT.

The next phase is Containment. Depending on the organization's security policy, however, the next action may be either containment or preservation of logs and data. Containment means taking immediate action to contain the incident and prevent it from spreading and causing further damage, for example by isolating the affected systems or disconnecting them from the network (if the network itself is compromised, disconnect it from other networks). A large part of the purpose is to halt information and data leakage. At the same time, you need to preserve the logs for analysis, for forensics, and for submission to legal authorities. Because attackers often erase a system's logs and data once they have breached it, logs and data need to be preserved as quickly as possible. Disconnecting the network also helps preserve the evidence.
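When preserving logs, it helps to record a cryptographic hash of each copy so you can later show the evidence has not changed. The sketch below assumes the logs are ordinary files you can read; `preserve_evidence` and `sha256_of` are hypothetical helpers built on Python's standard `hashlib` and `shutil`. It is not a substitute for proper forensic imaging tools, and an actively written log may legitimately change during the copy, which the sketch reports as an error.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def preserve_evidence(source, evidence_dir):
    """Copy a log file into evidence_dir and return (copy_path, sha256).

    copy2 keeps file timestamps where the platform allows. The digest of
    the copy is compared with the source to confirm they were identical
    when the copy was taken.
    """
    evidence_dir = Path(evidence_dir)
    evidence_dir.mkdir(parents=True, exist_ok=True)
    copy_path = evidence_dir / Path(source).name
    shutil.copy2(source, copy_path)
    copy_hash = sha256_of(copy_path)
    if copy_hash != sha256_of(source):
        raise RuntimeError(f"{source} changed while it was being copied")
    return copy_path, copy_hash
```

Store the returned digest in your incident record, separately from the evidence copy itself.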

Even without an attacker's involvement, logs can be lost to log rotation, system malfunction, or intentional deletion or modification by legitimate users, so it is important to stop the system to preserve the logs. Whether to shut the system down depends on the security policy; note that shutting down erases the data in memory.

Once containment and preservation of logs and data are done, the next phase is Analysis: investigating the logs and data to determine the cause and scope of the incident. The analysis results are used to assess the incident's impact on the organization. To identify the scope of the impact, consider the confidentiality, integrity, and availability (CIA) of the compromised services or systems.
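One way to turn the CIA assessment into a single rating per system is the "high-water mark" used in NIST FIPS 199: rate confidentiality, integrity, and availability impact as low, moderate, or high, and take the highest of the three. The sketch below borrows that scheme as an assumption; your organization may use a different scale, and the function names and system names are hypothetical.

```python
from enum import IntEnum

class Impact(IntEnum):
    """Impact levels, mirroring the low/moderate/high scale of NIST FIPS 199."""
    LOW = 1
    MODERATE = 2
    HIGH = 3

def overall_impact(confidentiality, integrity, availability):
    """Return the highest of the three CIA impact levels (the high-water mark)."""
    return max(confidentiality, integrity, availability)

def scope_impact(affected):
    """Map each affected system name to its overall impact level.

    `affected` maps a system name to a (C, I, A) tuple of Impact values.
    """
    return {name: overall_impact(*cia) for name, cia in affected.items()}
```

For example, a database whose data was read but not modified might be rated `(Impact.HIGH, Impact.LOW, Impact.LOW)`, which gives an overall rating of `Impact.HIGH`.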

The next phase is Eradication: removing the cause of the incident, typically malware or vulnerabilities.

Based on the analysis of the incident's scope and impact, decide when to reactivate the system, when to bring it back online, how to mitigate, and what to communicate publicly.

For BIG-IP mitigation methods, refer to K30534815.

Review the mitigation to check whether the vulnerabilities have actually been patched.
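Part of that review can be automated by comparing each system's installed software version with the first fixed version listed in the relevant security advisory. This is a minimal sketch under two assumptions: versions are purely numeric and dotted, and a single fixed version applies to every host. Real advisories often list a different fixed version per release branch, so treat the result as a starting point for manual checks. The helper names and host names are hypothetical.

```python
def parse_version(version):
    """Turn a dotted numeric version such as '15.1.8' into (15, 1, 8).

    Assumes purely numeric components; strings with suffixes such as
    '-rc1' would need a fuller parser.
    """
    return tuple(int(part) for part in version.split("."))

def is_older(version, reference):
    """Return True if `version` is older than `reference`.

    Missing trailing components count as zero, so '15.1' equals '15.1.0'.
    """
    a, b = parse_version(version), parse_version(reference)
    length = max(len(a), len(b))
    a += (0,) * (length - len(a))
    b += (0,) * (length - len(b))
    return a < b

def unpatched_hosts(installed, fixed_version):
    """Return hosts whose installed version is older than fixed_version.

    `installed` maps a (hypothetical) host name to its software version.
    """
    return sorted(
        host for host, version in installed.items()
        if is_older(version, fixed_version)
    )
```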

The next phase is Communication. Once mitigation is done, contact the related departments and provide the information they need (only what is necessary, with no sensitive information included). After that, decide which information should be publicly disclosed and which should not; management, the public relations department, and executives may be involved in that decision.

Be careful about how much information you disclose to the public. Also consider notifying your national CSIRT coordination center, if one exists.
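To share "only necessary information," an allowlist approach works well: each audience gets only the fields explicitly approved for it, so anything new or unexpected in the incident record stays private by default. The sketch below assumes incident records are simple dictionaries; the field names, the two audiences, and `prepare_shared_summary` are hypothetical examples, and the actual allowlists should come from your policy and the people who approve disclosure.

```python
# Hypothetical allowlists: which fields of an incident record each audience may see.
ALLOWED_FOR_PUBLIC = ("incident_id", "summary", "status")
ALLOWED_FOR_DEPARTMENTS = ALLOWED_FOR_PUBLIC + ("affected_services", "actions_required")

def prepare_shared_summary(record, allowed_fields):
    """Return only the allowlisted fields of an incident record.

    Fields that are not explicitly allowed, including ones added later,
    are left out, so the default is to share less rather than more.
    """
    return {key: record[key] for key in allowed_fields if key in record}

if __name__ == "__main__":
    record = {
        "incident_id": "IR-0001",
        "summary": "Unauthorized access to a web server",
        "status": "contained",
        "affected_services": ["customer portal"],
        "actions_required": "Reset portal passwords",
        "attacker_ips": ["203.0.113.99"],
        "compromised_accounts": ["svc-web"],
    }
    print(prepare_shared_summary(record, ALLOWED_FOR_PUBLIC))
```

A denylist (removing known-sensitive fields) is simpler to write but fails open when someone adds a new sensitive field, which is why the sketch uses an allowlist.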

The last phase is Recovery: restoring affected systems to normal operations and ensuring that any necessary mitigations are in place to prevent future incidents. Faster restoration reduces the impact of the incident; however, make sure the same attack method will not work again afterward.

Typically, the IT department brings the system back online, restoring it from backup media. Keep monitoring the system throughout recovery.
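Before restoring from backup media, it is worth confirming that the backup files have not been altered since they were made. The sketch assumes a manifest of SHA-256 digests was recorded when the backup was created; the manifest format and `verify_backup` are hypothetical. Note the limits: a match proves only that the backup is unchanged since it was made, not that it predates the compromise. The sketch also reads each file fully into memory, so very large backups would need chunked hashing.

```python
import hashlib
from pathlib import Path

def file_sha256(path):
    """Return the hex SHA-256 digest of a file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def verify_backup(manifest, backup_root):
    """Check backup files against digests recorded when the backup was made.

    `manifest` maps relative file paths to expected SHA-256 digests.
    Returns the sorted paths that are missing or whose contents differ.
    """
    problems = []
    for rel_path, expected in manifest.items():
        path = Path(backup_root) / rel_path
        if not path.is_file() or file_sha256(path) != expected:
            problems.append(rel_path)
    return sorted(problems)
```

An empty result means every listed file is present and unchanged; anything else should be investigated before the restore goes ahead.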

One more step follows: the post-incident review. Document the incident and the response process, hold a lessons-learned session, implement any necessary improvements to the incident response procedure, and report back to upper management. If some of the information can be disclosed, consider reporting it to other CSIRT coordination centers.
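A lessons-learned review is easier when it includes a few simple timing figures taken from the incident record. The sketch below assumes the record contains ISO 8601 timestamps for four milestones; the milestone and metric names are hypothetical, not a standard, and the "occurred" time is often only an estimate derived during analysis.

```python
from datetime import datetime

def response_durations(timeline):
    """Compute elapsed time between key milestones of an incident.

    `timeline` maps milestone names ('occurred', 'detected', 'contained',
    'recovered') to ISO 8601 timestamps with a UTC offset.
    """
    t = {name: datetime.fromisoformat(value) for name, value in timeline.items()}
    return {
        "time_to_detect": t["detected"] - t["occurred"],
        "time_to_contain": t["contained"] - t["detected"],
        "time_to_recover": t["recovered"] - t["detected"],
    }

if __name__ == "__main__":
    # Hypothetical timeline for illustration only.
    timeline = {
        "occurred": "2024-03-01T08:00:00+00:00",
        "detected": "2024-03-01T10:30:00+00:00",
        "contained": "2024-03-01T12:00:00+00:00",
        "recovered": "2024-03-02T09:00:00+00:00",
    }
    for metric, duration in response_durations(timeline).items():
        print(metric, duration)
```

Tracking these figures across incidents shows whether changes made after each review actually shorten detection and response.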