Making Incident Response a Security Program Enabler
Sunday, April 26, 2015
12
comments
Incident response is frequently viewed as a reactive process. As soon as something bad happens that is when the incident response process is activated to respond to what occurred. This view is similar to insurance. Every month we spend money on buying insurance so it is available when we need it. It doesn’t matter if the insurance gets used once in a year or not at all; money is still spent on a monthly basis to buy it. In a way, it’s easy to see the similarity to the incident response process. Resources - such as staffing and technology - are invested in the incident response process. In some organizations there is a sizable investment while in others very little. The hope is something is available when the organizations need it. How can one change an organization’s view of incident response? How can you take a traditional reactive process and make it in to a proactive process that’s an enabler for the organization’s information security program? This post discusses one approach to make incident response a security enabler by addressing: continuous incident response, incident response metrics, root cause analysis, and data analytics.
The traditional incident response models resemble the incident response lifecycle illustrated below that was obtained from the NIST Computer Security Incident Handling Guide.
The first phase involves the organization preparing for future incidents. The next phase is when an incident is detected and analyzed. This is followed by the containment, eradication, and recovery activities. At times when trying to remediate an incident, activities cycle back to detection and analysis to determine if the incident was resolved. After the incident is eradicated and the organization returns to normal operations a post incident activity is performed to see what did and didn't work out as planned. The lifecycle represents the traditional approach to incident response: incident detected -> organization responds -> incident eradicated -> organization returns to normal operations. This is the tradition reactive incident response process where the assumption is that nothing is going on until an incident is detected and after the incident is resolved it goes back to assuming nothing is going on.
To take a traditional reactive incident response process and make it in to a proactive process requires incident response to be seen in a different light. Organizations are under constant attack from daily malware infections to daily probing to daily exploit attempts to daily potential unauthorized access attempts. The model is no longer linear where an organization is waiting to detect an incident and then returning to normal operations. The new normal is being under constant attack and being at different stages in the incident response process concurrently. Richard Bejtlich stated in his book The Practice of Network Security Monitoring on page 188 regarding the model below:
“the workflow in Figure 9-2 appears orderly and linear, but that’s typically not the case in real life. In fact, all phases of the detection and response processes may occur at the same time. Sometimes, multiple incidents are occurring; other times, the same incident occupies all four stages at once.”
Richard expressed incident response is not a linear process with a start and end; on the contrary the process can be at different phases at the same time dealing with different incidents. Anton Chuvakin also touched on the non-linear incident response process in his post Incident Response: The Death of a Straight Line. Not only did he say that the “ “normal -> incident -> back to normal” is no more” but he summed up the situation organizations find themselves in.
“While some will try to draw a clear line between monitoring (before/after the incident) and incident response (during the incident), the line is getting much blurrier than many think. Ongoing indicator scans (based on external and internal sources), malware and artifact reversing, network forensics “hunting”, etc all blur the line and become continuous incident response activities.”
The light incident response needs to be seen in is that it is a continuous process instead of a linear one. Incident response is not something that starts and ends but is an ongoing cyclical process where an organization is constantly detecting and responding to incidents. A process similar to David Bianco's the Intel-Driven Operations Cycle model shown below and was obtained from his The Pyramid of Pain Intel-Driven Detection and Response to Increase Your Adversary's Cost of Operations presentation.
Seeing incident response as a continuous process is one that everyone must see from security practitioners to incident responders to management. Changing people’s perspectives on incident response will take time and every opportunity to sell it will need to be seized (don’t sell FUD but layout the actual threat environment we find ourselves in.) In time the conversation will go from viewing incident response as insurance that may or may not be needed to viewing incident response as continuous where people are detecting and responding to the daily security incidents. The conversation will go from “do we really need to invest in this since we only had a few incidents last year” to “we are continuing seeing these incidents due to this security weakness so how can we address it since it’s an area of concern.”
Changing the view of incident response from a linear process to a continuous one is not enough to make it a security program enabler. To be a security program enabler incident response needs to contribute to the organization’s security strategy to help influence where security resources are focused. Too often incident response tries to influence the security strategy in a reactive manner. The reactive process resembles the following: incident detected -> organization responds -> incident eradicated -> organization returns to normal operations -> incident response recommendations provided. The attempts to influence the security strategy is based on the most recent incident. In essence, recommendations are being made based on a single event instead being made based on trends from numerous events. Don’t get me wrong, there are times when recommendations from a single event do influence the security strategy but to make incident response a security program enabler there needs to be more.
To re-enforce this point, a story about a local credit union that happened years ago may help. The credit union happened to be located at a busy intersection; its location was very accessible from buses, cars, bikes, and walking. One day a person walked in to the credit union, handed the teller a note, and then walked out with money. As an outsider looking at this single event, there was nothing drastic implemented from any recommendations based on this single robbery. The next week a similar event occurred again with someone handing the teller a note and walking out with cash. This occurred a few more times and each robbery was very similar. The robbery involved a person handing a note to the teller without any visible weapons shown. The credit union looked at all the robberies and they must have seen this pattern. In response, the credit union implemented a compensating control and this control was double doors to trap any individual as they try to exit the bank. After this control was implemented the robberies stopped. This story shows how incident response can become a security program enabler. The first robbery was a single event and the recommendation may had been to install trap doors. However, installing trap doors takes essential resources from other areas and this may not be in the best interest of the organization. As more data is collected from different events it causes a pattern to emerge. Now taking essential resources from other areas is an easier decision since the data analysis shows installing trap doors is not addressing a single event but a re-occurring issue.
The continuous incident response process needs to move from only providing reactive recommendations to producing intelligence by operationalizing the information produced by enterprise incident response and detection processes. To accomplish this, data and information needs to be captured from the ongoing detection and response activities. Then this data and information is analyzed to produce intelligence to be used by the security program. Some intelligence is used by the response and detection processes themselves but other intelligence (especially ones developed through trend analysis) is reported to appropriate parties to influence the organization’s security strategy. Operationalizing incident response information results in creating intelligence at various levels in the intelligence pyramid.
The book Building an Intelligence-Led Security Program authored by Allan Liska describes the pyramid levels as follows:
"Strategic intelligence is concerned with long-term trends surrounding threats, or potential threats to an organization. Strategic intelligence is forward thinking and relies heavily on estimation – anticipating future behavior based on past actions or expected capabilities."
"Tactical intelligence is an assessment of the immediate capabilities of an adversary. It focuses on the weaknesses, strengths, and the intentions of an enemy. An honest tactical assessment of an adversary allows those in the field to allocate resources in the most effective manner and engage the adversary at the appropriate time and with the right battle plan."
"Operational intelligence is real time, or near real-time intelligence, often derived from technical means, and delivered to ground troops engaged in activity against the adversary. Operational intelligence is immediate, and has a short time to live (TTL). The immediacy of operational intelligence requires that analysts have instant access to the collection systems and be able to put together FINTEL in a high-pressure environment."
As it relates to making incident response process a security program enabler, the focus needs to be on making the process contribute to the organization’s security strategy by producing tactical and strategic intelligence. Tactical intelligence can highlight the organization’s weaknesses and strengths then show where security resources can be used more effectively. Strategic intelligence can influence the direction of the organization’s long term security strategy. Incident response starts to move from being viewed as a reactive process to a proactive one once it starts adding value to other areas in an organization’s security program.
Before one can start to operationalize incident response information to produce intelligence at various levels in the intelligence pyramid they must first improve their root cause analysis capabilities. Root cause analysis is trying to determine how an attacker went after another system or network by identifying and understanding the remnants they left on the systems involved during the attack. This is a necessary activity for one to discover information during a security incident that can be operationalized. The Verizon Data Breach Investigations Report is an excellent example about the type of information one can discover by performing root cause analysis. The report highlights trends from “time to incident discovery” to “time to compromise” to exploited vulnerabilities to frequency of attack types to hacking actions. None of this data would had been available for analysis if root cause analysis wasn’t completed on these incidents.
Take the hypothetical scenario of a malware infected system. Root cause analysis discovered the attacker compromised the system using a phishing email containing a malicious Word document. At this point there is various data one can then turn in to intelligence. At the operationally level, the email’s subject line, content, from address, and Word document attachment name can all be documented and then turned in to intelligence for response and detection activities. The same can occur for the URL inside the Word document and the malware it downloads. Doing root cause analysis on all infections can then make data available to do trend analysis. Is it a pattern for the organization employees to be socially engineered through Word documents? Can resources be applied in other areas such as security awareness training to combat this threat? In time, more and more data can be collected to reveal other trends to help drive security. Performing root cause analysis on each incident is needed to operationalize incident response information to produce intelligence in this manner. The Compromised Root Cause Analysis Model is one model to use and it is described in the post Compromised Root Cause Analysis Model Revisited.
The outcome from performing root cause analysis on each incident is discoverable information. It’s not enough to consistently do root cause analysis to discover information; the information needs to be documented and analyzed to make it into intelligence. Different options are available to document security incident information but in my opinion the best available schema is the VERIS Framework. The “Vocabulary for Event Recording and Incident Sharing (VERIS) is a set of metrics designed to provide a common language for describing security incidents in a structured and repeatable manner.” The VERIS Framework is open and can be modified to meet an organization’s needs.
The schema is well designed but to support an internal incident response process some modifications may be needed. This post won’t go in to great detail about the needed modifications but I will mention a few to make the schema better support internal incident response. In the Incident Tracking section, to make it easier to track security incidents the following can be added: Incident Severity (to match the incident response process severity for incidents), Hostname (of the targeted system), IP Address (of the targeted system), Username (involved in the incident), and Source IP Address (of the attacker’s system). In the Victim Demographics section, these may or may not apply for an internal incident response process. Personally, I don’t see the need for tracking this information if the incident response process supports the same entity. In the Incident Description section, the biggest change is outlining the expected values for the vectors and vulnerabilities. For example, for the vulnerabilities list out each possible vulnerable application - such as Java vulnerability - instead of allowing for specific CVEs. This reduces the amount of work needed on doing root cause analysis without losing too much on the metrics side. The last changes I’ll discuss are for the Discovery and Response section. In this section make sure to account for the various discovery methods the organization may use to detect incidents as well as the intelligence sources behind those methods. This slight change enables an organization to measure how they are detecting security incidents and to evaluate the return on investment for different intelligence sources.
Information that is documented is only data and does not become intelligence until it is analyzed and refined so it is useful to others. There are different options available for organizations to produce intelligence from the information discovered during root cause analysis. The book Data-Driven Security: Analysis, Visualization and Dashboards goes in to detail about how one can do data analysis with free and/or open source tools. The route I initially took was to allow me to focus on the incident response process without getting bogged down trying to create visualizations to identify trends. At my company (this is the only item in this post directly tied to my employer and I only mention it in hopes it helps my readers) we went with a license for Tableau Desktop and I bought a personal copy of the book Tableau Your Data!: Fast and Easy Visual Analysis with Tableau Software. The combination of Tableau Desktop and the VERIS Framework makes it very effective at producing strategic and tactical intelligence that can be consumed by the security program. In minutes, you can create visualizations to highlight what departments in an organization is most susceptible to phishing attacks or to quickly identify the trends explaining how malware is entering the organization. The answers and intelligence one can gain from the incident response data is only limited by one’s creativity and the ability of those consuming the intelligence.
The approach an organization can take to take incident response from a reactive process to proactive one involves the following steps:
- Improving an organization's incident response capabilities
- Improving an organization's root cause analysis capabilities
- Improving an organization’s security monitoring capabilities
- Influencing others to see incident response as a continuous process
- Operationalizing incident response information
- Collecting and documenting data for the organization’s incident response metrics
- Analyzing the organization’s incident response metrics to produce intelligence
- Presenting the intelligence to appropriate stakeholders
Making incident response a security program enabler is a gradual process requiring organization buy-in and resources to make it happen. As DFIR practitioners, we can only be the voice in the wilderness telling others incident response can be more than a reactive process. It can be more than an insurance policy. It can be a continuous process enabling an organization’s security strategy and helping guide how security resources are used. A voice hoping to influence others to make the right decision to better protect their organization.
Continuous Incident Response
The traditional incident response models resemble the incident response lifecycle illustrated below that was obtained from the NIST Computer Security Incident Handling Guide.
Image obtained from http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-61r2.pdf |
The first phase involves the organization preparing for future incidents. The next phase is when an incident is detected and analyzed. This is followed by the containment, eradication, and recovery activities. At times when trying to remediate an incident, activities cycle back to detection and analysis to determine if the incident was resolved. After the incident is eradicated and the organization returns to normal operations a post incident activity is performed to see what did and didn't work out as planned. The lifecycle represents the traditional approach to incident response: incident detected -> organization responds -> incident eradicated -> organization returns to normal operations. This is the tradition reactive incident response process where the assumption is that nothing is going on until an incident is detected and after the incident is resolved it goes back to assuming nothing is going on.
To take a traditional reactive incident response process and make it in to a proactive process requires incident response to be seen in a different light. Organizations are under constant attack from daily malware infections to daily probing to daily exploit attempts to daily potential unauthorized access attempts. The model is no longer linear where an organization is waiting to detect an incident and then returning to normal operations. The new normal is being under constant attack and being at different stages in the incident response process concurrently. Richard Bejtlich stated in his book The Practice of Network Security Monitoring on page 188 regarding the model below:
“the workflow in Figure 9-2 appears orderly and linear, but that’s typically not the case in real life. In fact, all phases of the detection and response processes may occur at the same time. Sometimes, multiple incidents are occurring; other times, the same incident occupies all four stages at once.”
Image obtained from http://www.nostarch.com/nsm |
Richard expressed incident response is not a linear process with a start and end; on the contrary the process can be at different phases at the same time dealing with different incidents. Anton Chuvakin also touched on the non-linear incident response process in his post Incident Response: The Death of a Straight Line. Not only did he say that the “ “normal -> incident -> back to normal” is no more” but he summed up the situation organizations find themselves in.
“While some will try to draw a clear line between monitoring (before/after the incident) and incident response (during the incident), the line is getting much blurrier than many think. Ongoing indicator scans (based on external and internal sources), malware and artifact reversing, network forensics “hunting”, etc all blur the line and become continuous incident response activities.”
The light incident response needs to be seen in is that it is a continuous process instead of a linear one. Incident response is not something that starts and ends but is an ongoing cyclical process where an organization is constantly detecting and responding to incidents. A process similar to David Bianco's the Intel-Driven Operations Cycle model shown below and was obtained from his The Pyramid of Pain Intel-Driven Detection and Response to Increase Your Adversary's Cost of Operations presentation.
Image obtained from https://speakerdeck.com/davidjbianco/the-pyramid-of-pain-intel-driven-detection-and-response-to-increase-your-adversarys-cost-of-operations |
Seeing incident response as a continuous process is one that everyone must see from security practitioners to incident responders to management. Changing people’s perspectives on incident response will take time and every opportunity to sell it will need to be seized (don’t sell FUD but layout the actual threat environment we find ourselves in.) In time the conversation will go from viewing incident response as insurance that may or may not be needed to viewing incident response as continuous where people are detecting and responding to the daily security incidents. The conversation will go from “do we really need to invest in this since we only had a few incidents last year” to “we are continuing seeing these incidents due to this security weakness so how can we address it since it’s an area of concern.”
Operationalize Incident Response Information
Changing the view of incident response from a linear process to a continuous one is not enough to make it a security program enabler. To be a security program enabler incident response needs to contribute to the organization’s security strategy to help influence where security resources are focused. Too often incident response tries to influence the security strategy in a reactive manner. The reactive process resembles the following: incident detected -> organization responds -> incident eradicated -> organization returns to normal operations -> incident response recommendations provided. The attempts to influence the security strategy is based on the most recent incident. In essence, recommendations are being made based on a single event instead being made based on trends from numerous events. Don’t get me wrong, there are times when recommendations from a single event do influence the security strategy but to make incident response a security program enabler there needs to be more.
To re-enforce this point, a story about a local credit union that happened years ago may help. The credit union happened to be located at a busy intersection; its location was very accessible from buses, cars, bikes, and walking. One day a person walked in to the credit union, handed the teller a note, and then walked out with money. As an outsider looking at this single event, there was nothing drastic implemented from any recommendations based on this single robbery. The next week a similar event occurred again with someone handing the teller a note and walking out with cash. This occurred a few more times and each robbery was very similar. The robbery involved a person handing a note to the teller without any visible weapons shown. The credit union looked at all the robberies and they must have seen this pattern. In response, the credit union implemented a compensating control and this control was double doors to trap any individual as they try to exit the bank. After this control was implemented the robberies stopped. This story shows how incident response can become a security program enabler. The first robbery was a single event and the recommendation may had been to install trap doors. However, installing trap doors takes essential resources from other areas and this may not be in the best interest of the organization. As more data is collected from different events it causes a pattern to emerge. Now taking essential resources from other areas is an easier decision since the data analysis shows installing trap doors is not addressing a single event but a re-occurring issue.
The continuous incident response process needs to move from only providing reactive recommendations to producing intelligence by operationalizing the information produced by enterprise incident response and detection processes. To accomplish this, data and information needs to be captured from the ongoing detection and response activities. Then this data and information is analyzed to produce intelligence to be used by the security program. Some intelligence is used by the response and detection processes themselves but other intelligence (especially ones developed through trend analysis) is reported to appropriate parties to influence the organization’s security strategy. Operationalizing incident response information results in creating intelligence at various levels in the intelligence pyramid.
The book Building an Intelligence-Led Security Program authored by Allan Liska describes the pyramid levels as follows:
"Strategic intelligence is concerned with long-term trends surrounding threats, or potential threats to an organization. Strategic intelligence is forward thinking and relies heavily on estimation – anticipating future behavior based on past actions or expected capabilities."
"Tactical intelligence is an assessment of the immediate capabilities of an adversary. It focuses on the weaknesses, strengths, and the intentions of an enemy. An honest tactical assessment of an adversary allows those in the field to allocate resources in the most effective manner and engage the adversary at the appropriate time and with the right battle plan."
"Operational intelligence is real time, or near real-time intelligence, often derived from technical means, and delivered to ground troops engaged in activity against the adversary. Operational intelligence is immediate, and has a short time to live (TTL). The immediacy of operational intelligence requires that analysts have instant access to the collection systems and be able to put together FINTEL in a high-pressure environment."
As it relates to making incident response process a security program enabler, the focus needs to be on making the process contribute to the organization’s security strategy by producing tactical and strategic intelligence. Tactical intelligence can highlight the organization’s weaknesses and strengths then show where security resources can be used more effectively. Strategic intelligence can influence the direction of the organization’s long term security strategy. Incident response starts to move from being viewed as a reactive process to a proactive one once it starts adding value to other areas in an organization’s security program.
Improve Root Cause Analysis Capabilities
Before one can start to operationalize incident response information to produce intelligence at various levels in the intelligence pyramid they must first improve their root cause analysis capabilities. Root cause analysis is trying to determine how an attacker went after another system or network by identifying and understanding the remnants they left on the systems involved during the attack. This is a necessary activity for one to discover information during a security incident that can be operationalized. The Verizon Data Breach Investigations Report is an excellent example about the type of information one can discover by performing root cause analysis. The report highlights trends from “time to incident discovery” to “time to compromise” to exploited vulnerabilities to frequency of attack types to hacking actions. None of this data would had been available for analysis if root cause analysis wasn’t completed on these incidents.
Take the hypothetical scenario of a malware infected system. Root cause analysis discovered the attacker compromised the system using a phishing email containing a malicious Word document. At this point there is various data one can then turn in to intelligence. At the operationally level, the email’s subject line, content, from address, and Word document attachment name can all be documented and then turned in to intelligence for response and detection activities. The same can occur for the URL inside the Word document and the malware it downloads. Doing root cause analysis on all infections can then make data available to do trend analysis. Is it a pattern for the organization employees to be socially engineered through Word documents? Can resources be applied in other areas such as security awareness training to combat this threat? In time, more and more data can be collected to reveal other trends to help drive security. Performing root cause analysis on each incident is needed to operationalize incident response information to produce intelligence in this manner. The Compromised Root Cause Analysis Model is one model to use and it is described in the post Compromised Root Cause Analysis Model Revisited.
Incident Response Metrics
The outcome from performing root cause analysis on each incident is discoverable information. It’s not enough to consistently do root cause analysis to discover information; the information needs to be documented and analyzed to make it into intelligence. Different options are available to document security incident information but in my opinion the best available schema is the VERIS Framework. The “Vocabulary for Event Recording and Incident Sharing (VERIS) is a set of metrics designed to provide a common language for describing security incidents in a structured and repeatable manner.” The VERIS Framework is open and can be modified to meet an organization’s needs.
The schema is well designed but to support an internal incident response process some modifications may be needed. This post won’t go in to great detail about the needed modifications but I will mention a few to make the schema better support internal incident response. In the Incident Tracking section, to make it easier to track security incidents the following can be added: Incident Severity (to match the incident response process severity for incidents), Hostname (of the targeted system), IP Address (of the targeted system), Username (involved in the incident), and Source IP Address (of the attacker’s system). In the Victim Demographics section, these may or may not apply for an internal incident response process. Personally, I don’t see the need for tracking this information if the incident response process supports the same entity. In the Incident Description section, the biggest change is outlining the expected values for the vectors and vulnerabilities. For example, for the vulnerabilities list out each possible vulnerable application - such as Java vulnerability - instead of allowing for specific CVEs. This reduces the amount of work needed on doing root cause analysis without losing too much on the metrics side. The last changes I’ll discuss are for the Discovery and Response section. In this section make sure to account for the various discovery methods the organization may use to detect incidents as well as the intelligence sources behind those methods. This slight change enables an organization to measure how they are detecting security incidents and to evaluate the return on investment for different intelligence sources.
Data Analysis
Information that is documented is only data and does not become intelligence until it is analyzed and refined so it is useful to others. There are different options available for organizations to produce intelligence from the information discovered during root cause analysis. The book Data-Driven Security: Analysis, Visualization and Dashboards goes in to detail about how one can do data analysis with free and/or open source tools. The route I initially took was to allow me to focus on the incident response process without getting bogged down trying to create visualizations to identify trends. At my company (this is the only item in this post directly tied to my employer and I only mention it in hopes it helps my readers) we went with a license for Tableau Desktop and I bought a personal copy of the book Tableau Your Data!: Fast and Easy Visual Analysis with Tableau Software. The combination of Tableau Desktop and the VERIS Framework makes it very effective at producing strategic and tactical intelligence that can be consumed by the security program. In minutes, you can create visualizations to highlight what departments in an organization is most susceptible to phishing attacks or to quickly identify the trends explaining how malware is entering the organization. The answers and intelligence one can gain from the incident response data is only limited by one’s creativity and the ability of those consuming the intelligence.
Making Incident Response a Security Program Enabler
The approach an organization can take to take incident response from a reactive process to proactive one involves the following steps:
- Improving an organization's incident response capabilities
- Improving an organization's root cause analysis capabilities
- Improving an organization’s security monitoring capabilities
- Influencing others to see incident response as a continuous process
- Operationalizing incident response information
- Collecting and documenting data for the organization’s incident response metrics
- Analyzing the organization’s incident response metrics to produce intelligence
- Presenting the intelligence to appropriate stakeholders
Making incident response a security program enabler is a gradual process requiring organization buy-in and resources to make it happen. As DFIR practitioners, we can only be the voice in the wilderness telling others incident response can be more than a reactive process. It can be more than an insurance policy. It can be a continuous process enabling an organization’s security strategy and helping guide how security resources are used. A voice hoping to influence others to make the right decision to better protect their organization.
Labels:
IR