CSIRT Request Tracker Installation Guide

Sunday, September 28, 2014 Posted by Corey Harrell
In this post I'm releasing an installation guide to build a custom ticketing system to track and document security incidents. The guide contains nothing groundbreaking; just instructions on how to install and configure Request Tracker on CentOS with a PostgreSQL database and Apache web server. A significant portion of the work was compiled from others, who are referenced at the beginning of the guide (some instructions were copied and pasted from these sources). The guide is released as is and my intention is not to be someone's technical support. I've seen people ask numerous times what ticketing systems others use to track security incidents, and the responses they receive are very limited. I'm releasing this for those interested in incident response (IR) ticketing systems so at least there will be another option to reference.

Why Request Tracker and Not Request Tracker for Incident Response


Request Tracker (RT) is an open source tracking system that organizations leverage for a range of uses. As written on the RT website, the uses include: "bug tracking, help desk ticketing, customer service, workflow processes, change management, network operations, and youth counseling." By design RT is very customizable making it an awesome choice as the platform for an IR ticketing system.

I know a common question will be why I chose to use Request Tracker and not Request Tracker for Incident Response (RTIR). RTIR is, after all, a purpose-built ticketing system for incident response. As I was exploring ticketing systems for incident response I spoke to two different people whose IR teams leveraged Request Tracker for their ticketing systems. I asked both of them the exact same question and they both had the same response: RTIR is not updated as frequently as RT, so going with RT enables them to use newer functionality. After looking into both RT and RTIR I agreed with them. However, my decision was not solely based on frequent updates. RT allowed me to implement the workflow I wanted instead of being forced to use someone else's workflow in RTIR. My decision to use RT was for the more frequent updates and the ability to implement my own workflow.

CSIRT Request Tracker Workflow


The image below shows the incident response workflow implemented in the ticketing system and the following subsections describe each area.



Incident Reported



One of my requirements for any ticketing system was the ability to automate documentation and communication amongst the Computer Security Incident Response Team (CSIRT) members. This is an area where RT excels, and it does so using email. The Incident Reported area is where a security event is reported to the CSIRT. The report can come in through email and be automatically processed. The report can also be manual (e.g., by telephone) and either be converted into an email or typed directly into RT.
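For reference, the piece of RT that automates this is its stock email gateway, rt-mailgate, which turns an inbound message into a new ticket or a ticket update. With the sendmail route it is typically wired up through a mail alias along these lines (a sketch only; the install path, queue name, and URL are assumptions based on a default RT 4 install, not taken from the guide):

    # /etc/aliases -- pipe mail for the CSIRT address into RT's General queue
    csirt: "|/opt/rt4/bin/rt-mailgate --queue General --action correspond --url http://rt.example.com/"
    csirt-comment: "|/opt/rt4/bin/rt-mailgate --queue General --action comment --url http://rt.example.com/"

After editing /etc/aliases, newaliases has to be run for the change to take effect.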

The ticketing system can be used even if email is not an option. RT has a nice web interface for managing and updating tickets.

Queues



The Queues area is where the ticket for the reported security event ends up. In the diagram there is only a General queue, but RT supports having numerous queues. CSIRT members monitor the queue and take ownership of any new tickets. The ticket's status changes from new to triage once a member claims it.

Triage Activity



The Triage Activity area is where the reported security event is validated and escalated. The first decision made by the CSIRT member is determining if the rest of the CSIRT needs to be activated. Regardless of whether the CSIRT is activated, the reported event is triaged to determine if it meets the requirements to be declared an incident. If the reported event doesn't meet the security incident requirements then the CSIRT member who owns the ticket completes triaging it, resolves the event, and documents any IR metrics. Finally, the ticket's status is changed to closing. If the reported event does meet the requirements to be declared a security incident then the ticket's status is changed to incident.

Incident Activity



The Incident Activity area is where all of the activities related to responding to, containing, and resolving the security incident occur. All CSIRT members involved document their work in the ticket. The ticketing system sends out emails for every update to the ticket, ensuring the whole CSIRT is aware of what is occurring and what other members are doing. Automating communication makes the response more efficient since time is not wasted holding unnecessary meetings. The ticket's status changes to closing once the incident is resolved.

Closing Activity



The Closing Activity area is for quality assurance and all tickets are forced through this area prior to being resolved. This is where the CSIRT lead reviews each ticket to verify all work has been completed and all metrics captured. Capturing metrics is critical for internal CSIRTs since it's one of the best ways to show value to management. The CSIRT lead also identifies in the ticket if there is a need to implement lessons learned or security monitoring improvements. If there isn't then the ticket is resolved. If there is then the ticket's status is changed to post incident.

Post Incident Activity



The Post Incident Activity area is where the lessons learned and security monitoring improvements are implemented. The work is appended to the same incident ticket to make it easier to document and refer back to in the future. After the post incident work is completed the ticket is finally resolved.

 

CSIRT Request Tracker Lifecycles


RT implements a workflow using something called a lifecycle. The lifecycle outlines ticket statuses and their behavior; specifically, which statuses a ticket in a given status is allowed to change to. The diagram below shows the lifecycle that implements the workflow I described above. As can be seen in the diagram, the new and triage statuses have the ability to exit the workflow, but once a ticket is changed to incident it is forced through the remaining workflow.
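For those curious what this looks like in practice, lifecycles are defined in RT's RT_SiteConfig.pm. Below is a minimal sketch of such a definition using status names taken from the workflow above; the exact transition lists are assumptions and should be adjusted to match your own lifecycle diagram:

    # RT_SiteConfig.pm -- sketch of a CSIRT lifecycle (adjust transitions to suit)
    Set(%Lifecycles,
        csirt => {
            initial  => [ 'new' ],
            active   => [ 'triage', 'incident', 'closing', 'post incident' ],
            inactive => [ 'resolved', 'rejected' ],
            transitions => {
                ''              => [ 'new' ],
                new             => [ 'triage', 'rejected' ],
                triage          => [ 'incident', 'closing', 'rejected' ],
                incident        => [ 'closing' ],
                closing         => [ 'post incident', 'resolved' ],
                'post incident' => [ 'resolved' ],
            },
        },
    );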


CSIRT Request Tracker Installation Guide


As I mentioned previously, this guide is released as is. I did test the installation procedure numerous times and believe I captured everything in the documentation. However, one item I didn't fully test is the email portion of the ticketing system since I didn't have a working email gateway for testing at the time.

This link points to the guides' download location. The two guides are pretty much the same except one uses fetchmail to retrieve email while the other uses sendmail to receive email directly. The latter makes the ticketing system itself into an email gateway. Due to this, my preference is for the fetchmail route since it's the easier path.
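To give a feel for the fetchmail route, the .fetchmailrc on the RT server essentially polls a mailbox and hands each message to rt-mailgate. A minimal sketch, where the mail server, credentials, install path, and queue name are all placeholders:

    # .fetchmailrc -- pull the CSIRT mailbox and deliver into RT's General queue
    poll mail.example.com protocol IMAP
        username "csirt" password "changeme" ssl
        mda "/opt/rt4/bin/rt-mailgate --queue General --action correspond --url http://rt.example.com/"

The sendmail route instead delivers mail directly to an alias like the one shown earlier in the Incident Reported section.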

SIEM Use Case Implementation Mind Map

Monday, September 1, 2014 Posted by Corey Harrell
Building out an organization's security detection capability can be a daunting task. The complexity of the network, the number of applications/servers/clients, the sheer number of potential threats, and the unlimited attack avenues those threats can use are only a few of the challenges. To tackle this daunting task there are different ways to build out the detection capability. One of those approaches is to leverage use cases. Use cases are "a logical, actionable and reportable component of an Event Management system." The event management system I kept in mind for this mind map is a SIEM, but it may apply to other types of systems. InfoSec Nirvana's post SIEM Use Cases – What you need to know? and Anton Chuvakin's post Detailed SIEM Use Case Example demonstrate how to build a use case and what it should entail. My previous post Linkz for SIEM links to a few more and this paper does as well. In this post I'm walking through how one can take a documented use case and translate it into something actionable to improve an organization's security detection capability.

SIEM Use Case Implementation Mind Map


The process to translate a use case into something actionable can be broken down into four distinct areas: Log Exploration, Custom Rules, Default Rules, and Detect Better Respond Faster. Each area has different activities to complete, but there is at least a minimum set of activities to accomplish. This process is illustrated in the mind map below:



Log Exploration


Identify Logs


The first activity is to take a detailed look at the use case and determine all of the log sources needed to detect the risk outlined in the use case. This may have been done when the use case was documented, but it is still a good activity to repeat to ensure all logs are identified. This involves looking at the risk, the path the risk takes through the network including applications and devices, and determining which devices/applications contain logs of interest. For example, if the use case is to detect web server attacks then the path is from the threats to the application itself. The devices the threats pass through may include routers, switches, firewalls, IDS systems, proxy servers, the web application, and the web service. All of these may contain logs of interest.

Identify Required Log Events


After the logs have been identified the next activity is to identify what events in those logs are needed. A log can contain a range of events recording numerous items, but only some are specific to the use case at hand. This involves doing research on the device/application and possibly setting up a testing environment. For example, if the use case is to detect lateral movement using remote desktop then the log source would be the Windows Security event logs (which contain authentication events) and the events of interest are the event IDs specific to remote desktop usage.
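To make the remote desktop example concrete: the Windows Security log records successful and failed logons as event IDs 4624 and 4625, remote desktop logons carry logon type 10 (RemoteInteractive), and events 4778/4779 record session reconnects and disconnects. A small sketch of filtering for these, assuming events have already been parsed into dictionaries with EventID and LogonType fields:

    # Sketch: pick out remote desktop related events from parsed Windows Security logs
    RDP_LOGON_IDS = {4624, 4625}     # successful / failed logon
    RDP_SESSION_IDS = {4778, 4779}   # session reconnected / disconnected
    REMOTE_INTERACTIVE = 10          # logon type used by remote desktop

    def is_rdp_event(event):
        event_id = int(event.get("EventID", 0))
        if event_id in RDP_SESSION_IDS:
            return True
        if event_id in RDP_LOGON_IDS:
            return int(event.get("LogonType", -1)) == REMOTE_INTERACTIVE
        return False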

Confirm Logging Configuration


The actual devices'/applications' logging configurations are reviewed to ensure they record the events needed for the use case. Keep in mind, turning on auditing or changing configurations impacts performance, so this needs to be tested prior to rolling it out production wide. If performance is significantly impacted then find an alternative method or a happy medium everyone is agreeable to.
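Sticking with the remote desktop example, on Windows the relevant audit policy can be checked and enabled from an elevated prompt; a sketch only, since subcategory names can vary slightly between Windows versions:

    auditpol /get /subcategory:"Logon"
    auditpol /set /subcategory:"Logon" /success:enable /failure:enable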

Bring In The Logs


Now it is time to make the required configuration changes to bring the logs into the event management system. How this is done depends on the event management system and the source the logs are coming from. At times logs are pushed to the event management system (e.g., syslog), while at other times they are pulled into the event management system (e.g., Windows event logs collected through WMI).
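As a small illustration of the push model, a Linux host running rsyslog can forward its logs to the event management system with a one-line rule (siem.example.com and the port are placeholders; @@ forwards over TCP, a single @ would use UDP):

    # /etc/rsyslog.d/50-forward.conf -- forward all facilities and severities
    *.* @@siem.example.com:514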

Custom Rules


Explore Logs


After the log(s) are flowing into the event management system it's time to start exploring the logs. Look through the collected logs not only to see what is there and how the events are structured, but also to see what in the log(s) can be used in a detection rule.

Create Custom Rules


Most event management systems come with default rules and my guess is people start with those. However, I think the better option is to first create custom rule(s) for the use case. The custom rule(s) can incorporate all of the research completed, information from discussions with others, and experience and indicators from previous responses to security incidents. The custom rule(s) are more tailored to the organization and have greater success in detecting the risk outlined in the use case compared to the default rules/signatures. What custom rule to create is solely dependent on the use case. Make sure to leverage all information available and ensure the rule will hit on the item located in the events from the devices'/applications' log(s). After creation, the rule is implemented.
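Continuing the remote desktop lateral movement example, one possible custom rule is to alert when a single account opens remote desktop sessions to an unusual number of distinct hosts within a short window. The sketch below expresses the idea in Python; the window, threshold, and field names are assumptions, and in practice the logic would be written in the event management system's own rule language:

    from collections import defaultdict, deque

    WINDOW_SECONDS = 3600   # look-back window (assumption: one hour)
    HOST_THRESHOLD = 5      # distinct destination hosts before alerting (assumption)

    _recent = defaultdict(deque)   # account -> recent (timestamp, destination host) pairs

    def check_lateral_movement(account, dest_host, timestamp):
        """Call once per remote desktop logon (e.g., event 4624 with logon type 10).
        Returns an alert message when one account reaches too many distinct hosts
        within the window, otherwise None."""
        history = _recent[account]
        history.append((timestamp, dest_host))
        while history and timestamp - history[0][0] > WINDOW_SECONDS:
            history.popleft()
        hosts = {host for _, host in history}
        if len(hosts) >= HOST_THRESHOLD:
            return "Possible lateral movement: %s reached %d hosts within the window" % (account, len(hosts))
        return None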

Monitor, Improve, and Tune Custom Rules


Monitor the implemented custom rule(s) to verify they produce the desired results. If a rule doesn't hit on any correlated events then test the rule by simulating the activity to make it fire. The custom rule(s) need to provide the exact desired results; if they don't, identify how to improve them. After the rule(s) are updated, monitor again to verify they produce the desired results. Furthermore, the rule(s) need to be tuned to the environment to reduce false positives. At times rule(s) may fire on normal behavior, so adjusting the rule(s) to not fire on that activity in the future minimizes the noise.

Establish and Document Triage Process


Building out an organization's security detection capability results in activity being detected; thus a response is needed to what is detected. Based on the custom rule(s), establish a triage process to outline how the alert needs to be evaluated to determine whether it's valid and how critical it is. First, evaluate any existing triage processes to see if any apply to these new rules. If there isn't an applicable triage process then create one. The goal is to minimize the number of different triage processes while ensuring there are sufficient triage processes to handle the alerts generated by the rules.

In my opinion establishing triage processes is the second most critical step (detection rules are the first). Triage is what determines what is accepted as "good" behavior, what needs to be addressed, and what needs to be escalated. After the custom rule(s) are implemented take some time reviewing the rule(s) that fired. Evaluate the activity that triggered the rule and try out different triage techniques. This is repeated until there is a repeatable triage process for the custom rule(s). Continue testing the repeatable triage process to make it more efficient and faster. Look at the false positives and determine if there is a way to identify them sooner in the process. Look at the techniques that require more in-depth analysis and move them to later in the process. The triage process walks a fine line between being as fast as possible and using resources as efficiently as possible. Remember, the more time spent on one alarm the less time is spent on others, while the less time spent on an alarm the greater the chance malicious activity is missed.

The final triage process is documented so it is repeatable by the entire team.

Train the Team


The final activity is to train the rest of the security detection team on the custom rule(s), how they work, and the triage process to use when they alert on activity. The team members are the ones who manage the parts of the use case that have been put in place, allowing the remaining activities to be completed.

Default Rules


Identify Default Rules for Use Case


At this time the default rules in the event management system are reviewed. The only default rules to be concerned about are the ones triggering on activity for the use case of interest. Identify these rules and review their properties to see how they work.

Explore Correlated Default Rules


The event management system may have had the default rules enabled but did not alert on them. Depending on the event management system, the default rules may need to be enabled. However, ensure the triggered rules do not generate alerts; there is no need to distract the rest of the security detection team with alerts they will just ignore for the time being. Run queries in the event management system to identify any of the default rules that triggered on activity. Explore the triggered rules to see what the activity is and how the activity matches what the rule is looking for. There may be numerous rules which don't trigger on anything; these are addressed in the future as they occur.

Tune Default Rules


Explore the triggered rules to see what the activity is, how it matches what the rule is looking for, and how many generate false positives. Identifying false positives may require triaging a few. Default rules can be very noisy and need to be tuned to the environment. Look at the noisy rules and figure out what can be adjusted to reduce false positives. Make the adjustments and monitor the rules to see if the false positives are reduced. If not, continue making adjustments and monitoring to eliminate the false positives. Some default rules are just too noisy and no amount of tuning will change that; these rules are disabled.

Keep in mind, when tuning rules ensure all the activity from other logs around the time of interest are taken into account. At times one data source may indicate something happened while another shows the activity was blocked.
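As one illustration of a common tuning adjustment, noisy rules are often adjusted to ignore known-good sources such as internal vulnerability scanners. A tiny sketch of the idea (the exclusion list and field name are assumptions; most event management systems express this directly in their own filter language):

    # Hosts whose activity is expected to trip the rule (assumption: internal scanners)
    KNOWN_SCANNERS = {"10.0.5.20", "10.0.5.21"}

    def should_alert(event):
        """Suppress matches generated by known vulnerability scanners."""
        return event.get("SourceIP") not in KNOWN_SCANNERS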

Establish and Document Triage Process


Establishing and documenting the triage process works the same as it did in the custom rules section. Remember, the more time spent on one alarm the less time is spent on others, while the less time spent on an alarm the greater the chance malicious activity is missed. First, evaluate any existing triage processes to see if any apply to these default rules. If there isn't an applicable triage process then create one. The goal is to minimize the number of different triage processes while ensuring there are sufficient triage processes to handle every alert. The final triage process is documented so it is repeatable by the entire team.

Train the Team


The final activity is to train the rest of the security detection team on the default rules, how they work, and the triage process to use when they alert on activity. The team members are the ones who manage the parts of the use case that have been put in place, allowing the remaining activities to be completed.

Detect Better Respond Faster


Measure Detection in Depth


Use cases range from having a single rule to numerous rules. Monitor and evaluate the quality of these rules and the coverage they provide for the use case. There are very few models or methods to accomplish this task. Pick a model/method to use or develop one to meet the organization's needs.

The few thought processes I've seen on measuring detection in depth are those by David Bianco. His Pyramid of Pain model is a way to determine the quality of the rules: the higher an indicator sits in the pyramid, the better its quality. Another item to help with determining the quality of rules is a chart provided by Anton Chuvakin in his post SIEM and Badness Detection. Finally, in time the rules that are more accurate at detecting activity will start to stand out from the rest. These are the high quality rules for the use case in question.

The second part of measuring detection in depth is tracking the rules' coverage for the use case. David's bed of nails concept ties together the Pyramid of Pain with the kill chain model for detection. David tweeted links to a talk where he discusses this and I'm including them in this post. The video for the Pyramid of Pain: Intel-Driven Detection/Response to Increase Adversary's Cost is located here while the slides are located here.
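One simple way to track that coverage is a matrix of kill chain phases against Pyramid of Pain indicator levels, marking which combinations the use case's rules actually cover. A sketch below; the phases, levels, and example rule mappings are purely illustrative:

    KILL_CHAIN = ["Delivery", "Exploitation", "Installation", "C2", "Actions on Objectives"]
    PAIN_LEVELS = ["Hash", "IP", "Domain", "Artifact", "Tool", "TTP"]

    # rule name -> (kill chain phase, Pyramid of Pain level) it covers (illustrative)
    coverage = {
        "rdp-lateral-movement": ("Actions on Objectives", "TTP"),
        "known-bad-domain-beacon": ("C2", "Domain"),
    }

    def coverage_gaps(coverage):
        """Kill chain phases with no rule coverage for this use case."""
        covered = {phase for phase, _ in coverage.values()}
        return [phase for phase in KILL_CHAIN if phase not in covered]

    print(coverage_gaps(coverage))   # phases where the use case has no detection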

Continuously Tune Rules


Over time the organization's network, servers, clients and applications change. These changes will impact the event management system and may produce false positives. Tuning the rules to the environment is an ongoing process, so continue to make adjustments to rules as needed.

Continuously Improve & Add Rules Based on Response


Rules constantly evolve, with existing ones getting updates and new ones being implemented; all in an effort to continuously improve an organization's security detection capability. There are two sources of information to use for improvement, and one of them is what is learned from triaging and responding to alerts. After each validated alert and security incident the question to ask is: what can be improved upon to make detection better? Was activity missed, can rules be more focused on the activity, is a new rule required, etc. Each alert is an opportunity for improvement and each day strive to be better than the previous. In my opinion, the best source of intelligence to improve one's detection capabilities is the information gained through response.

Continuously Improve & Add Rules Based on Intel


The other source of information to use for improvement is intelligence produced by others. This includes a range of items, from papers on the latest techniques used by threats, to blog posts about what someone is seeing, to information shared by others. Some of the information won't apply, but the items that do need to be implemented in the event management system. Again, the goal is to strive to be better than the previous day.

Continuously Improve Triage


Striving to be better each day is not limited to detection. The mantra needs to be: Detect Better Respond Faster. After each validated alert and security incident the question to ask is: what can be improved upon to make response faster? Can the triage process be more efficient, are the triage tools adequate, what can make the process faster, etc. Each time a triage process is completed it's a learning opportunity for improvement. Remember, the more time spent on one alarm the less time is spent on others, while the less time spent on an alarm the greater the chance malicious activity is missed. Walk the fine line between speed and efficiency.

Ensure Logging Configuration


Over time the organization's network, server, client and application configurations change. Some implemented rules in the use case are dependent upon certain events being present. A simple configuration change can render a rule ineffective, thus impacting an organization's security detection capability. It's imperative to periodically review the correlated events in the event management system to see if anything has drastically changed. This is especially true for any custom rules implemented.

SIEM Use Case Implementation Mind Map Wrap-up


Use cases are an effective approach to build out an organization's security detection capability. I walked through how one can take a documented use case and translate it into something actionable to improve an organization's security detection capability. The activities are not all-inclusive, but they are a decent set of minimum activities to accomplish.