Timeline Analysis by Categories

Tuesday, October 21, 2014 Posted by Corey Harrell 5 comments
Organizing is what you do before you do something, so that when you do it, it is not all mixed up.

~ A. A. Milne

"Corey, at times our auditors find fraud and when they do sometimes they need help collecting and analyzing the data on the computers and network. Could you look into this digital forensic thing just in case if something comes up?" This simple request - about seven years ago - is what lead me into the digital forensic and incident response field. One of the biggest challenges I faced was getting a handle on the process one should use. At the time there was a wealth of information about tools and artifacts but there was not as much outlining a process one could use. In hindsight, my approach to the problem was simplistic but very effective. I first documented the examination steps and then organized every artifact and piece of information I learned about beneath the appropriate step. Organizing my process in this manner enabled me to carry it out without getting lost in the data. In this post I'm highlighting how this type of organization is applied to timeline analysis leveraging Plaso.

Examinations Leveraging Categories


Organizing the digital forensic process by documenting the examination steps and organizing artifacts beneath them ensured I didn't get "all mixed up" when working cases. To do examination step "X", examine artifacts A, B, C, and D. Approaching cases this way is far more effective. Not only did it prevent jumping around but it helped to minimize overlooking or forgetting about artifacts of interest. Contrast this to a popular cheat sheet at the time I came into the field (links to cryptome). The cheat sheet was a great reference but the listed items are not in an order one can use to complete an examination. What ends up happening is that you jump around the document based on the examination step being completed. This is what tends to happen without any organization: jumping around in various references depending on what step is being completed. Not an effective method.

Contrast this to the approach I took: organizing every artifact and piece of information I learned about beneath the appropriate step. I have provided glimpses of this approach in my posts Obtaining Information about the Operating System and It Is All About Program Execution. I wrote the "information about the operating system" post within the first year of starting this blog; in it I outlined a few different examination steps and then listed some of the items beneath them. I did a similar post for program execution; beneath this step I listed various program execution artifacts. I was able to write the auto_rip program because of this organization, where every registry artifact was organized beneath the appropriate examination step.

Examination Steps + Artifacts = Categories


What exactly are categories? I see them as the following: Examination Steps + Artifacts = Categories. I outlined my examination steps on the jIIr methodology webpage and below are some of the steps listed for system examinations.

        * Profile the System
              - General Operating System Information
              - User Account Information
              - Software Information
              - Networking Information
              - Storage Locations Information
        * Examine the Programs Ran on the System
        * Examine the Auto-start Locations
        * Examine Host Based Logs for Activity of Interest
        * Examine Web Browsing
        * Examine User Profiles of Interest
              - User Account Configuration Information
              - User Account General Activity
              - User Account Network Activity
              - User Account File/Folder Access Activity
              - User Account Virtualization Access Activity
        * Examine Communications


In a previous post, I said "taking a closer look at the above examination steps it’s easier to see how artifacts can be organized beneath them. Take for example the step Examine the programs ran on the system. Beneath this step you can organize different artifacts such as: application compatibility cache, userassist, and muicache. The same concept applies to every step and artifact." In essence, each examination step becomes a category containing artifacts. In the same post I continued by saying "when you start looking at all the artifacts within a category you get a more accurate picture and avoid overlooking artifacts when processing a case."

This level of organization is how categories can be leveraged during examinations.

Timeline Analysis Leveraging Categories


Organizing the digital forensic process by documenting the examination steps and organizing artifacts is not limited to completing the examination steps or registry analysis. The same approach works for timeline analysis. If I'm looking to build a timeline of a system I don't want everything (aka the kitchen sink approach.) I only want certain types of artifacts that I layer together into a timeline.

To illustrate let's use a system infected with commodity malware. The last thing I want to do is to create a supertimeline using the kitchen sink approach. First, it takes way too long to generate (I'd rather start analyzing than wait.) Second, the end result has a ton of data that is really not needed. The better route is to select the categories of artifacts I want such as: file system metadata, program execution, browser history, and windows event logs. Approaching timelines in this manner makes them faster to create and easier to analyze.

The way I created timelines in this manner was not as efficient as I wanted it to be. Basically, I looked at the timeline tools I used and what artifacts they supported. Then I organized the supported artifacts beneath the examination steps to make them into categories. When I created a timeline I would use different tools to parse the categorized artifacts I wanted and then combine the output into a single timeline in the same format. It sounds like a lot but it didn't take that long to do. Hell, it was even a lot faster than the kitchen sink approach.

This is where Plaso comes into the picture, making it a lot easier to leverage categories in timeline analysis.

Plasm


Plaso is a collection of timeline analysis tools; one of which is plasm. Plasm is capable of "tagging events inside the plaso storage file." Translation: plasm organizes the artifacts into categories by tagging the events in the plaso storage file. At the time of this post the tagging occurs after the timeline data has already been parsed (as opposed to specifying the categories and only parsing those during timeline generation.) The plasm user guide webpage provides a wealth of information about the tool and how to use it. I won't rehash the basics since I couldn't do justice to what is already written. Instead I'll jump right into how plasm makes leveraging categories in timeline analysis easier.

The plasm switch "--tagfile" enables a tag file to be used to define the events to tag. Plaso provides an example tag file named tag_windows.txt. This is the feature in Plaso that makes it easier to leverage categories in timeline analysis. The artifacts supported by Plaso are organized beneath the examination steps in the tag file. The image below is a portion of the tag_jiir.txt tag file I created showing the organization:


tag_jiir.txt is still a work in progress. As can be seen in the above image, the Plaso supported artifacts are organized beneath the "Examine the Programs Ran on the System" (program execution) and "Examine the Auto-start Locations" (autostart info) examination steps. The rest of the tag file is not shown but the same organization was completed for the remaining examination steps. After the tag_jiir.txt tag file is applied to the plaso storage file, timelines can be created containing only the artifacts within select categories.
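In case the screenshot is hard to read, the snippet below is a rough sketch of what one entry in such a tag file could look like. The tag name mirrors a category used later in this post, but the data_type values are assumptions on my part and need to be verified against the version of Plaso being used (the sample tag_windows.txt is the best reference for the exact syntax).

Program Execution
  data_type is 'windows:prefetch:execution'
  data_type is 'windows:registry:appcompatcache'

The remaining categories follow the same pattern: a tag name on its own line followed by indented filter expressions describing which events receive that tag.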


Plasm in Action


The best way to fully explore how plasm helps with using categories in timeline analysis is to see it in action. The test system for this demonstration is one found lying around on my portable drive; it might be new material or something I previously blogged about. For demonstration purposes I'm running log2timeline against a forensic image instead of a mounted drive. Log2timeline works against a mounted drive but the timeline will lack creation dates (and at the time of this post there is not a way to bring $MFT data into log2timeline.)

The first step is taking a quick look at the system for any indications that it may be compromised. When triaging a potentially malware-infected system the program execution artifacts excel, and in this case the prefetch files revealed a suspicious file as shown below.


The file stands out for a few reasons. It's a program executing from a temp folder inside a user profile. The file handle to the zone.identifier file indicates the file came from the Internet. Lastly, the program last ran on 1/16/14 12:32:14 UTC.

Now we move on to creating the timeline with the command below (the -o switch specifies the partition I want to parse.) FYI, the command below creates a kitchen sink timeline with everything being parsed. To avoid the kitchen sink use  plaso filters. Creating my filter is on my to-do list after I complete the tag_jiir.txt file.

log2timeline.exe -o 2048 C:\Atad\test-data\test.dump C:\Atad\test-data\test.E01

The image below shows the "information that have been collected and stored inside the storage container." The Plaso tool pinfo was used to show this information.
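For those following along, the command to display that information should look something like the one below (pinfo simply takes the storage file as its argument):

pinfo.exe C:\Atad\test-data\test.dump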


Now it is time to see plasm in action by tagging the events in the storage container. The command below shows how to do this using my tag_jiir.txt file.

plasm.exe tag --tagfile=C:\Atad\test-data\tag_jiir.txt C:\Atad\test-data\test.dump

The storage container information now shows the tagged events as shown below. Notice how the events are actually the categories for my examination steps (keep in mind some categories are not present due to the image not containing the artifacts.)


Now a timeline can be exported from the storage container based on the categories I want. The program execution indicator revealed there may be malware and it came from the internet. The categories I would want for a quick look are the following:

        - Filesystem: to see what files were created and/or modified
        - Web browsing: to correlate web browsing activity with malware
        - Program execution: to see what programs executed (includes files such as prefetch as well as specific event logs)

The command below creates my timeline based on the categories I want (the -o switch specifies the l2tcsv output format, the -w switch outputs to a file, and the filter uses "tag contains".) FYI, a timeslice was not used since I wanted to focus on the tagging capability.

psort.exe -o L2tcsv -w C:\Atad\test-data\test-timeline.csv C:\Atad\test-data\test.dump "tag contains 'Program Execution' or tag contains 'Filesystem Info' or tag contains 'Web Browsing Info'"

The image below shows the portion of the timeline around the time of interest, which was 1/16/14 12:32:14 UTC. The timeline shows the lab user account accessing their Yahoo web email then a file named invoice.83842.zip being downloaded from the internet. The user then proceeded to execute the archive's contents by clicking the invoice.83842.exe executable. This infection vector is clear as day since the timeline does not contain a lot of unneeded data.


Conclusion


Plaso's tagging capability makes it easier to leverage categories in timeline analysis by producing a timeline containing only the categories one wants, in order to view the timeline data for select artifacts. This type of organization helps minimize getting "all mixed up" and lost in the data when conducting timeline analysis.

Plaso is great and the tagging feature rocks but as I mentioned before I used a combination of tools to create timelines. Not every tool has the capabilities I need so combining them provides better results. In the past I had excellent results leveraging the Perl log2timeline and RegRipper to create timelines. At this point those tools are not compatible: Plaso doesn't have a TLN parser (it can't read the output of RegRipper's TLN plug-ins) and RegRipper only outputs to TLN. Based on Harlan's upcoming presentation my statement may not be valid for long.

In the end, leveraging categories in timeline analysis is very powerful. This train of thought is not unique to me. Others (who happen to be tool developers) are looking into this as well. Kristinn is, as you can see in Plaso's support for tagging, and Harlan wrote about this in his latest Windows forensic book.


Side note: the purpose of this post was to highlight Plaso's tagging capability. However, for the best results the filtering capability should be used to reduce what items get parsed in the first place. The kitchen sink approach just takes way too long; why wait when you can analyze.  
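For anyone wanting a head start on that route, a collection filter is just a text file listing the paths to collect, one per line. The paths below are only a sketch in the style of Plaso's sample filter_windows.txt; the exact entries and the log2timeline switch used to pass the filter file should be confirmed against the Plaso documentation for the version in use.

/(Users|Documents And Settings)/.+/NTUSER.DAT
/Windows/System32/config/(SAM|SOFTWARE|SYSTEM|SECURITY)
/Windows/System32/winevt/Logs/.+.evtx
/Windows/Prefetch/.+.pf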

CSIRT Request Tracker Installation Guide

Sunday, September 28, 2014 Posted by Corey Harrell 1 comments
In this post I'm releasing an installation guide to build a custom ticketing system to track and document security incidents. The guide contains nothing groundbreaking; just instructions on how to install and configure Request Tracker in CentOS with a PostgreSQL database and Apache web server. A significant portion of the work I compiled from others who are referenced in the beginning of the guide (some instructions were copied and pasted from these sources). The guide is released as is and my intention is not to be someone's technical support. I've seen numerous times people asking what ticketing systems others use to track security incidents and the responses they receive are very limited. I'm releasing this for those interested in incident response (IR) ticketing systems so at least there will be another option to reference.

Why Request Tracker and Not Request Tracker for Incident Response


Request Tracker (RT) is an open source tracking system that organizations leverage for a range of uses. As written on the RT website, the uses include: "bug tracking, help desk ticketing, customer service, workflow processes, change management, network operations, and youth counseling." By design RT is very customizable making it an awesome choice as the platform for an IR ticketing system.

I know a common question will be why I chose to use Request Tracker and not Request Tracker for Incident Response (RTIR). RTIR is, after all, a ticketing system purposely built for incident response. As I was exploring ticketing systems for incident response I spoke to two different people whose IR teams leveraged Request Tracker for their ticketing systems. I asked both of them the exact same question and they both had the same response: RTIR is not updated as frequently as RT so going with RT enables them to use newer functionality. After looking into both RT and RTIR I agreed with them. However, my decision was not solely based on frequent updates. RT allowed me to implement the workflow I wanted instead of being forced to use someone else's workflow in RTIR. My decision to use RT was for the more frequent updates and the ability to implement my own workflow.

CSIRT Request Tracker Workflow


The image below shows the incident response workflow implemented in the ticketing system and the following subsections describe each area.



Incident Reported



One of my requirements for any ticketing system was the ability to automate documentation and communication amongst the Computer Security Incident Response Team (CSIRT) members. This is an area where RT excels and it does so using email. The Incident Reported area is where a security event is reported to the CSIRT. The report can come in through email and it will be automatically processed. The report can also be manual (i.e. by telephone) and either be converted into an email or typed directly into RT.

The ticketing system can be used even if email is not an option. RT has a nice web interface for managing and updating tickets.

Queues



The Queues area is where the ticket for the reported security event ends up. In the diagram there is only a General queue but RT supports having numerous queues. CSIRT members monitor the queue and take ownership of any new tickets. The ticket's status changes from new to triage once a member claims it.

Triage Activity



The Triage Activity area is where the reported security event is validated and escalated. The first decision made by the CSIRT member is determining if the rest of the CSIRT needs to be activated. Regardless of whether the CSIRT is activated or not, the reported event is triaged to determine if it meets the requirements to be declared an incident. If the reported event doesn't meet the security incident requirements then the CSIRT member who owns the ticket completes triaging it, resolves the event, and documents any IR metrics. Finally, the ticket's status is changed to closing. If the reported event does meet the requirements to be declared a security incident then the ticket's status is changed to incident.

Incident Activity



The Incident Activity area is where all of the activities related to responding to, containing, and resolving the security incident occur. All CSIRT members involved document their work in the ticket. The ticketing system sends out emails for every update to the ticket ensuring the whole CSIRT is aware of what is occurring and what other members are doing. Automating communication makes the response more efficient since time is not wasted holding unnecessary meetings. The ticket's status changes to closing once the incident is resolved.

Closing Activity



The Closing Activity area is for quality assurance and all tickets are forced through this area prior to being resolved. This is where the CSIRT lead reviews each ticket to verify all work has been completed and all metrics captured. Capturing metrics is critical for internal CSIRTs since it's one of the best ways to show value to management. The CSIRT lead also identifies in the ticket if there is a need to implement lessons learned or security monitoring improvements. If there isn't then the ticket is resolved. If there is then the ticket's status is changed to post incident.

Post Incident Activity



The Post Incident Activity area is where the lessons learned and security monitoring improvements are implemented. The work is appended to the same incident ticket to make it easier to document and refer back to in the future. After the post incident work is completed the ticket is finally resolved.

 

CSIRT Request Tracker Lifecycles


RT implements a workflow using something called a lifecycle. The lifecycle outlines ticket statuses and their behavior; specifically, what a current status is allowed to be changed to. The diagram below shows the lifecycle that implements the workflow I described above. As can be seen in the diagram, the new and triage statuses have the ability to exit the workflow but once a ticket is changed to incident it is forced through the remaining workflow.
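For those curious what this looks like under the hood, RT lifecycles are defined in RT_SiteConfig.pm. The snippet below is only a rough sketch of how the workflow above could be expressed, assuming the RT 4.x lifecycle syntax; the lifecycle and status names are placeholders and the guide's actual configuration may differ.

Set(%Lifecycles,
    csirt => {
        initial  => [ 'new' ],
        active   => [ 'triage', 'incident', 'closing', 'post incident' ],
        inactive => [ 'resolved', 'rejected' ],
        # allowed status changes implementing the workflow above
        transitions => {
            ''              => [ 'new' ],
            new             => [ 'triage', 'rejected' ],
            triage          => [ 'incident', 'closing', 'rejected' ],
            incident        => [ 'closing' ],
            closing         => [ 'post incident', 'resolved' ],
            'post incident' => [ 'resolved' ],
        },
    },
);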


CSIRT Request Tracker Installation Guide


As I mentioned previously, this guide is released as is. I did test the installation procedure numerous times and believe I captured everything in the documentation. However, one item I didn't fully test is the email portion of the ticketing system since I didn't have a working email gateway for testing at the time.

This link points to the guides' download location. The two guides are pretty much the same except one uses fetchmail to retrieve email while the other uses sendmail. The latter makes the ticketing system into an email gateway. Due to this, my preference is for the fetchmail route since it's the easier path.

SIEM Use Case Implementation Mind Map

Monday, September 1, 2014 Posted by Corey Harrell 1 comments
Building out an organization's security detection capability can be a daunting task. The complexity of the network, number of applications/servers/clients, the sheer number of potential threats, and the unlimited attack avenues those threats can use are only a few of the challenges. To tackle this daunting task there are different ways to build out the detection capability. One of those approaches is to do so leveraging use cases. Use cases are "a logical, actionable and reportable component of an Event Management system." The event management system I kept in mind for this mind map is a SIEM but it may apply to other types of systems. InfoSec Nirvana's post SIEM Use Cases – What you need to know? and Anton Chuvakin's post Detailed SIEM Use Case Example demonstrate how to build a use case and what it should entail. My previous post Linkz for SIEM links to a few more and this paper does as well. In this post I'm walking through how one can take a documented use case and translate that into something actionable to improve an organization's security detection capability.

SIEM Use Case Implementation Mind Map


The process to translate a use case into something actionable can be broken down into four distinct areas: Log Exploration, Custom Rules, Default Rules, and Detect Better Respond Faster. Each area has different activities to complete, but the activities outlined below are a minimum set to accomplish. This process is illustrated in the mind map below:



Log Exploration


Identify Logs


The first activity is to take a detailed look at the use case and to determine all of the log sources needed to detect the risk outlined in the use case. This may have been done when the use case was documented but it is still a good activity to repeat to ensure all logs are identified. This involves looking at the risk, the path the risk takes through the network including applications and devices, and determining which device/application contains logs of interest. For example, if the use case is to detect web server attacks then the path is from the threats to the application itself. The devices the threats pass through may include routers, switches, firewalls, IDS systems, proxy servers, the web application, and the web service. All of which may contain logs of interest.

Identify Required Log Events


After the logs have been identified the next activity is to identify what events in those logs are needed. A log can contain a range of events recording numerous items but only some are specific to the use case at hand. This involves doing research on the device/application and possibly setting up a testing environment. For example, if the use case is to detect lateral movement using remote desktop then the log source would be the Windows security event logs (which contain authentication events) and the events of interest are the event IDs specific to remote desktop usage.
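To stick with the remote desktop example, the events of interest could include something along these lines (a non-exhaustive sketch; the exact IDs to collect should be confirmed through testing):

        - Security event ID 4624 with logon type 10 (RemoteInteractive logon)
        - Security event IDs 4778 and 4779 (session reconnect and disconnect)
        - TerminalServices-LocalSessionManager/Operational event IDs 21, 24, and 25 (session logon, disconnect, and reconnect)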

Confirm Logging Configuration


The actual devices'/applications' logging configurations are reviewed to ensure they record the events needed for the use case. Keep in mind, turning on auditing or changing configurations impacts performance so this needs to be tested prior to rolling it out production-wide. If performance is significantly impacted then find an alternative method or a happy medium everyone is agreeable to.
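For example, if the remote desktop use case needs logon events that are not currently being recorded, the audit policy on a Windows test system can be checked and adjusted with something like the following (illustrative; substitute the subcategory the use case actually requires):

auditpol /get /subcategory:"Logon"
auditpol /set /subcategory:"Logon" /success:enable /failure:enable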

Bring In The Logs


Now it is time to make the required configuration changes to bring the logs into the event management system. How this is done depends on the event management system and the source the logs are coming from. At times logs are pushed to the event management system (such as syslog) while at other times they are pulled into the event management system (such as Windows event logs collected through WMI).
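For instance, on a Linux source that pushes its logs via syslog, forwarding to the event management system can be as simple as a single rsyslog line (a minimal sketch assuming rsyslog and a hypothetical collector named siem.example.com):

# /etc/rsyslog.d/50-siem.conf - forward everything to the collector over TCP
*.* @@siem.example.com:514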

Custom Rules


Explore Logs


After the log(s) are flowing into the event management system it's time to start exploring the logs. Look through the collected logs to not only see what is there and how the events are structured but to see what in the log(s) can be used in a detection rule.

Create Custom Rules


Most event management systems come with default rules and my guess is people start with those. However, I think the better option is to first create custom rule(s) for the use case. The custom rule(s) can incorporate all of the research completed, information from discussions with others, and experience and indicators from previous responses to security incidents. The custom rule(s) are more tailored to the organization and have greater success in detecting the risk outlined in the use case compared to the default rules/signatures. What custom rule to create is solely dependent on the use case. Make sure to leverage all information available and ensure the rule will hit on the item located in the events from the devices'/appliances' log(s). After creation, the rule is implemented.

Monitor, Improve, and Tune Custom Rules


Monitor the implemented custom rule(s) to verify they produce the desired results. If a rule doesn't hit on any correlated events then test it by simulating the activity to make it fire. The custom rule(s) need to provide the exact desired results; if they don't then identify how to improve the rules. After rule(s) are updated then monitor again to verify they produce the desired results. Furthermore, the rule(s) need to be tuned to the environment to reduce false positives. At times rule(s) may fire on normal behavior so adjusting rule(s) to not fire on future activity minimizes the noise.

Establish and Document Triage Process


Building out an organization's security detection capability results in activity being detected; thus a response to what is detected is needed. Based on the custom rule(s), establish a triage process to outline how the alert needs to be evaluated to determine if it's valid and how critical it is. First, evaluate any existing triage processes to see if any apply to these new rules. If there isn't an applicable triage process then create one. The goal is to minimize the number of different triage processes while ensuring there are sufficient triage processes to handle the alerts generated by the rules.

In my opinion establishing triage processes is the second most critical step (detection rules are the first). Triage is what determines what is accepted as "good" behavior, what needs to be addressed, and what needs to be escalated. After the custom rule(s) are implemented take some time reviewing the rule(s) that fired. Evaluate the activity that triggered the rule and try out different triage techniques. This is repeated until there is a repeatable triage process for the custom rule(s). Continue testing the repeatable triage process to make it more efficient and faster. Look at the false positives and determine if there is a way to identify them sooner in the process. Look at the techniques that require more in-depth analysis and consider moving them to later in the process. The triage process walks a fine line between being as fast as possible and using resources as efficiently as possible. Remember, the more time spent on one alarm the less time is spent on others, but the less time spent on each alarm the greater the chance malicious activity is missed.

The final triage process is documented so it is repeatable by the entire team.

Train the Team


The final activity is to train the rest of the security detection team on the custom rule(s), how they work, and the triage process to use when they alert on activity. The team members are the ones who manage the parts of the use case that have been put in place, allowing the remaining activities to be completed.

Default Rules


Identify Default Rules for Use Case


At this time the default rules in the event management system are reviewed. The only default rules to be concerned about are the ones triggering on activity for the use case of interest. Identify these rules and review their properties to see how they work.

Explore Correlated Default Rules


The event management system may have had the default rules enabled but did not alert on them. Depending on the event management system the default rules may need to be enabled. However, ensure the triggered rules do not generate alerts. There is no need to distract the rest of the security detection team with alerts they will just ignore for the time being. Run queries in the event management system to identify any of the default rules that triggered on activity. Explore the triggered rules to see what the activity is and how the activity matches what the rule is looking for. There may be numerous rules which don't trigger on anything; these are addressed in the future as they occur.

Tune Default Rules


Explore the triggered rules to see what the activity is, how it matches what the rule is looking for, and how many generate false positives. Identifying false positives may require triaging a few. Default rules can be very noisy and need to be tuned to the environment. Look at the noisy rules and figure out what can be adjusted to reduce false positives.  Make the adjustments and monitor the rules to see if the false positives are reduced. If not, continue making adjustments and monitoring to eliminate the false positives. Some default rules are just too noisy and no amount of tuning will change it; these rules are disabled.

Keep in mind, when tuning rules ensure all the activity from other logs around the time of interest is taken into account. At times one data source may indicate something happened while another shows the activity was blocked.

Establish and Document Triage Process


Establishing and documenting the triage process works the same as it did in the custom rules section. Remember, the more time spent on one alarm the less time is spent on others, but the less time spent on each alarm the greater the chance malicious activity is missed. First, evaluate any existing triage processes to see if any apply to these default rules. If there isn't an applicable triage process then create one. The goal is to minimize the number of different triage processes while ensuring there are sufficient triage processes to handle every alert. The final triage process is documented so it is repeatable by the entire team.

Train the Team


The final activity is to train the rest of the security detection team on the default rules, how they work, and the triage process to use when they alert on activity. The team members are the ones who manage the parts of the use case that have been put in place, allowing the remaining activities to be completed.

Detect Better Respond Faster


Measure Detection in Depth


Use cases range from having a single rule to numerous rules. Monitor and evaluate the quality of these rules and the coverage they apply to the use case. There are very few models or methods to accomplish this task. Pick a model/method to use or develop one to meet the organization's needs.

The few thought processes I've seen on measuring detection in depth are those by David Bianco. His Pyramid of Pain model is a way to determine the quality of the rules; the higher in the pyramid, the higher the quality. Another item to help with determining the quality of rules is a chart provided by Anton Chuvakin in his post SIEM and Badness Detection. Finally, in time the rules that are more accurate at detecting activity will start to stand out from the rest. These are the high quality rules for the use case in question.

The second part of measuring detection in depth is tracking the rules' coverage for the use case. David's bed of nails concept ties together the pyramid of pain with the kill chain model for detection. David tweeted links to a talk where he discusses this and I'm including them in this post. The video of the Pyramid of Pain: Intel-Driven Detection/Response to Increase Adversary's Cost talk is located here while the slides are located here.

Continuously Tune Rules


Over time the organization's network, servers, clients and applications change. These changes will impact the event management system and may produce false positives. Tuning the rules to the environment is an ongoing process so continue to make adjustments to rules as needed.

Continuously Improve & Add Rules Based on Response


Rules constantly evolve with existing ones getting updates and new ones implemented; all in an effort to continuously improve an organization's security detection capability. There are two sources of information to use for improvement, and one of them is what is learned from triaging and responding to alerts. After each validated alert and security incident the question to ask is: what can be improved upon to make detection better? Was activity missed, can rules be more focused on the activity, is a new rule required, etc. Each alert is an opportunity for improvement and each day strive to be better than the previous. In my opinion, the best source of intelligence to improve one's detection capabilities is the information gained through response.

Continuously Improve & Add Rules Based on Intel


The other source of information to use for improvement is intelligence produced by others. This includes a range of items from papers on the latest techniques used by threats to blog posts about what someone is seeing to information shared by others. Some of the information won't apply but the items that do apply need to be implemented in the event management system. Again, the goal is to strive to be better than the previous day.

Continuously Improve Triage


Striving to be better each day is not limited to detection only. The mantra needs to be: Detect Better Respond Faster. After each validated alert and security incident the question to ask is: what can be improved upon to make response faster? Can the triage process be more efficient, are the triage tools adequate, what can make the process faster, etc. Each time a triage process is completed it's a learning opportunity for improvement. Remember, the more time spent on one alarm the less time is spent on others, but the less time spent on each alarm the greater the chance malicious activity is missed. Walk the fine line between speed and efficiency.

Ensure Logging Configuration


Over time the organization's network, servers, clients and applications configurations change. Some implemented rules in the use case are dependent upon those events being present. A simple configuration change can render a rule ineffective thus impacting an organization's security detection capability. It's imperative to periodically review the correlated events in the event management system to see if anything has drastically changed. This is especially true for any custom rules implemented.

SIEM Use Case Implementation Mind Map Wrap-up


Use cases are an effective approach to build out an organization's security detection capability. I walked through how one can take a documented use case and translate that into something actionable to improve an organization's security detection capability. The activities are not all inclusive but they are a decent set of minimum activities to accomplish.

auto_rip, tr3secure_collection & DFS updates

Tuesday, August 19, 2014 Posted by Corey Harrell 0 comments
This post is a quick update about a few things I've been working on over the years.

auto_rip updates


auto_rip is a wrapper script for Harlan Carvey's RegRipper and the script has a few updates. For those unfamiliar with the program please refer to my post Unleashing auto_rip. The script's home has always been on the RegRipper Google Code site but Google dropped support for adding new downloads. As a result, I thought it might be helpful to make newer versions available at different places since Google Code can no longer be used. One of the download locations where the script will be available is Google Drive. The link to the download folder is located here. In addition, along the right side of this blog is a link to the script's download location.

Harlan has put in a ton of work on RegRipper and in the spring he released some updates to the program. Inside the downloadable archive he made available a file named updates.txt that outlines all of the work he did: new plug-ins, combined plug-ins, retired plug-ins, etc. Needless to say, an outstanding tool is now even better. After Harlan released the updates others asked if I was going to update auto_rip to match. Things have been busy so it took longer than I wanted. However, I finally updated auto_rip to account for the new RegRipper updates.

The latest auto_rip download works with the RegRipper rr_20140414.zip download. All changes are noted at the top of the code. The changes include: adding plug-ins to parse, removing plug-ins no longer supported, adding the malware category (not all malware plug-ins run), and parsing the AMcache.hve registry hive with a new switch (Harlan, awesome job making this plug-in). I also renamed the executable to reflect that it is 64-bit and won't work on 32-bit systems. Harlan, again thanks for all the work you put into maintaining the project.

tr3secure_collection_script


Another script I released is the tr3secure_collection_script. This script automates the collection of volatile and non-volatile data from systems to support incident response activities. For more information about the script refer to my posts: Dual Purpose Volatile Data Collection Script and Tr3Secure Data Collection Script Reloaded. This script was another Google Code casualty and had to find a new home (Google Drive again.) The link to the download folder is located here. In addition, along the right side of this blog is a link to the script's download location.

Besides getting a new home there is only one small change in this version. I dropped support for pv.exe since it is no longer downloadable. At some point in the future there may be another update on this front.

Digital Forensic Search Update


I have been keeping track of the various blogs and websites to add to the Digital Forensic Search over the past six months. I finally added the new sites to the index. To access the custom Google search you can use this link directly. To see the full list of what is included in the index refer to the post: Introducing the Digital Forensics Search.

Where's the IR in DFIR Training?

Sunday, August 10, 2014 Posted by Corey Harrell 10 comments
I'm writing this post to voice a concern about trainings for incident response. I am painting this picture with a broad stroke. The picture does not apply to every $vendor nor does it apply to every training course. I'm not trying to lump everyone into the same boat. I'm painting with a broad stroke to not single out any specific entity or course but to highlight areas for improvement as well as opportunities for future training offerings. I started seeing this issue a while ago when I was looking at various incident response trainings for people brand new to our field. However, a recent event prompted me to paint this picture. A picture showing: traditional digital forensic training does not equal incident response training.

Sketching The Picture


As I start to sketch I hope the picture becomes more clear. Our field is one referred to as the Digital Forensic and Incident Response field. Digital forensics and incident response are closely related; some say one is a subset of the other. Both use common tools and techniques, and are interested in the same artifacts on systems. Despite the similarities, the two have drastically different objectives, and those objectives are what drive how different the training needs to be for incident response and digital forensics.

Defining Digital Forensics


Digital forensics has numerous definitions depending on who is the one defining it. The NIST Guide to Integrating Forensic Techniques into Incident Response states the following:

"it is considered the application of science to the identification, collection, examination, and analysis of data while preserving the integrity of the information and maintaining a strict chain of custody for the data"

The guide further explains digital forensics by laying out the process it follows as shown below. The collection includes: identifying, recording, and acquiring data from possible sources while ensuring data preservation. The examination includes: forensically processing the collected data to identify data of interest while ensuring data preservation. The analysis includes: "analyzing the results of the examination, using legally justifiable methods and techniques, to derive useful information that addresses the questions that were impetus for performing the collection and examination." Lastly, the reporting includes: reporting the results of the analysis.


The types of cases where I've personally seen the above digital forensic process used varies from: acceptable use policy violations (internal investigations), financial fraud investigations, civil proceedings (one entity suing another), and divorce proceedings. The types of cases where I heard this process is used but never participated in is criminal proceedings such as murder, robberies, white collar, and child pornography.

Defining Incident Response


Digital forensic techniques are leveraged in the incident response process but the processes are not the same. The phrase incident response is typically used to describe two separate activities organizations perform. The two activities (incident handling and incident response) are defined in the document Defining Incident Management Processes for CSIRTs: A Work in Process. Incident Handling is the activity "that involves all the processes or tasks associated with “handling” events and incidents." Incident Response are the "actions taken to resolve or mitigate an incident, coordinate and disseminate information, and implement follow-up strategies to prevent the incident from happening again."

There are different incident response workflows - such as those listed in Enisa's Good Practice Guide for Incident Management. The NIST Computer Security Incident Handling Guide also outlines an incident response process workflow as shown below. The preparation includes: establishing an incident response capability and preventing incidents. The detection and analysis includes: accurately detecting possible security events, triaging those events to confirm if they are an incident, analyzing the incident, determining its scope, determining how it occurred, and what originated the incident. The containment, eradication, and recovery includes: developing a remediation strategy, carrying out the containment strategy, eliminating the components of the incident, and restoring systems to normal operations.


The types of cases where the incident response process is used varies from: malicious network behavior, malicious code, website compromise, unauthorized access, denial of service, or account compromise.

Painting the Sketch with Color


The digital forensic process differs greatly from the incident response process. Digital forensics is the "application of science to the identification, collection, examination, and analysis of data" typically in support of investigations. Incident response on the other hand is meant to effectively detect, investigate, contain, and remediate security incidents. The training needs of each process are significantly different even though forensic techniques are used in both. To illustrate this point it's necessary to explore some of the concepts in DFIR trainings and show how they are not sufficient for incident response.

The topics listed below are the ones I noted from some entry level DFIR training courses:

     - Recover deleted partitions
     - Introduction to NTFS
     - Deleted data recovery
     - Web browser history
     - Print spooler recovery
     - Collection techniques
     - Windows registry analysis to include: USB device analysis, file/folder access, and program execution
     - Email forensics
     - Windows artifacts to include: jump lists, VSCs, and link files
     - Windows event log analysis

Those topics are all outstanding for a DFIR training. As it relates to digital forensics, these definitely need to be covered due to examiners frequently encountering them on all cases. As it relates to incident response, these techniques and artifacts may be relevant to the security event at hand but there are even more relevant incident response topics that are not covered. In my opinion, these trainings are meant more for those doing digital forensics than for those doing incident response. This is because the curriculum in these trainings is not sufficient for training people on how to do incident response. I'll elaborate with two examples.

The incident response work flow consists of: detecting the security event, triaging the security event, analyzing the security incident, containing the incident, eradicating the incident and recovering from the incident. Now let's say someone attended a training that covered all of the digital forensic topics listed above. As soon as they return to their organization they are faced with a potential web server compromise. That analyst will not have learned the skills to do the following:

- Detecting the web server attacks in the first place. Entry level DFIR trainings barely mention detection, how to improve detection, and how to leverage different detection techniques.

- Triaging the potential security event. Entry level DFIR trainings are mostly focused on the digital forensic case examples I listed previously. The few incident response cases covered in the trainings are slanted towards malware or advanced threats with very little mention of compromised web servers.

- Analyzing the web server compromise. Entry level DFIR trainings barely cover web server compromises and almost all are focused on the Windows OS. A good percentage of web servers are Linux based so Windows focused trainings don't add much value in this case.

- Scoping the incident. Practically no DFIR trainings discuss how to identify and extract indicators and how they can be used to scope an incident in an environment.

- Containing the incident. This is not addressed at all in most trainings.

- Eradicating the incident. Again a topic not even addressed.

In essence, that analyst would be incapable of handling the web server compromise even if they attended the DFIR training. Let's explore another type of common security event: multiple systems hit with a piece of malware. That same analyst would be incapable of dealing with this event since they wouldn't have learned the skills to do the following:

- Detecting all the systems compromised with malware. Most DFIR trainings are single system focused and don't cover methods to detect all of the systems involved with a security event.

- Triaging the event. DFIR trainings don't cover how one should do forensics remotely over the wire (with both free and paid options) to triage an event. Plus, the trainings don't go into detail about how to extract indicators and use them to detect other compromised systems.

- Analyzing the system(s) impacted with malware. Entry level DFIR trainings don't go into detail about how to perform malware root cause analysis or how to explore the malware to determine its capabilities.

- Scoping, containing, and eradicating the incident. Again, topics not covered

The shortcomings of the available DFIR trainings are not limited to web server compromises or malware incidents. The same could be said for the other types of security events: account compromise, malicious network behavior, and unauthorized access. The reason - in my opinion - is that those DFIR trainings are more geared towards traditional digital forensics than they are towards incident response. Case in point, that same analyst could be successful in the following digital forensic cases: acceptable use policy violations (internal investigations), financial fraud investigations, civil proceedings (one entity suing another), and divorce proceedings. This shows that the current DFIR trainings are actually digital forensic trainings with very little incident response.

Framing the picture


The picture has been painted. Digital forensics and incident response are two different processes with different objectives and different training needs. The current entry level DFIR trainings mostly satisfy the digital forensic needs without even addressing the incident response needs. At this point there is still one outstanding option: most $vendors have multiple training courses available, so that analyst needs to take multiple DFIR courses. Before I pick apart this argument I suggest the reader take a hard look at the DFIR trainings available, and not just the entry level ones. Again, this does not apply to every $vendor nor does it apply to every training course, but how many truly address the needs of incident response? How many really instruct on how to detect, triage, analyze, and contain security events?

Economics of Incident Response


The reflection about the available DFIR trainings should have shed some light on the lack of choices for those looking for incident response focused trainings. For the sake of argument, let's say the needs of incident response were addressed but one would have to take numerous courses. To me, this is not a feasible option for most places due to the costs involved.

It's been a well reported fact that in most organizations information security is a very small percentage of the organization's overall budget. Incident response typically falls within information security in organizations so the place where we are starting  is already underfunded with very little money available. Now, it has also been widely reported that within information security most resources are dedicated to prevention with small percentages applied towards detection and response. The small slice of the pie is now even smaller.

That analyst already has the odds stacked against them with most organizations applying very little resources towards incident response. Most of the DFIR trainings range from $3,000 to $5,000 per course. On top of that an organization has to pay travel, lodging, and per diem. Let's say the trainings are on the lower end of $3,000 per course. The travel includes the plane ticket and transportation to and from the hotel; let's say this is $1,000. The hotels vary based on location but most DFIR trainings last for five days; let's say the room costs $200 per night for a total of another $1,000. The per diem rate varies by location; let's use the federal per diem rate for upstate New York even though trainings never come up this way. The per diem rate is $110 per day for a total of $660 (six days with the extra day for travel). The true cost for this analyst to attend a single training is $5,660.

Remember, the slice of the budget pie is already small to begin with. The analyst could justify to the organization to get sent to training for $5,660 to improve their capability to perform incident response. However, for that same analyst to say "for me to have all the skills I need to do incident response I'll need to attend these three or four trainings at a cost of about $17,000 to $22,000" is a very, very hard sell in most organizations, especially if their first investment of $5,660 does not even enable their staff to handle the commonly faced security events. Now this organization may not want to send just one person but multiple to build out their incident response capability. The costs go from about $17,000/$22,000 for one person to $34,000/$44,000 for two people to $51,000/$66,000 for three people. As can be seen, the costs add up quickly and this cost is only for the training. It doesn't include the other required resources such as equipment. The multiple training courses option is not a feasible option for most organizations so they are left with the current training offerings, which don't address incident response.

Don't get me wrong, there are some companies who can afford to follow this training model by sending multiple people to multiple trainings per year. I think these companies are the exception though since most organizations  only have a small piece of a small slice of the budget pie allocated for detection and response.

Wanting a New Incident Response Picture


The picture I painted is not a masterpiece but it does show that the current traditional digital forensic training does not equal incident response training. This is not a picture I want on my wall nor is it a picture I want to show to those who are brand new to the incident response field. I would rather have a new picture; a picture where an investment of $5,660 provides instant results to an organization wanting to improve their incident response capability. Instantly showing results will actually encourage organizations to invest even more resources into their detection and response capability. A picture where a single training addresses numerous commonly faced security events such as: malware incidents, web server compromises, and an advanced threat compromise. A training that covers how to perform the incident response process (at least detection, triage, analysis, and containment) for each one of those security events. A training that does not regurgitate the high level incident response process stuff - which can be read online - but jumps right into the practical content showing how to do this work within an enterprise. This is the picture I would prefer; this is the picture I want to show to those new to our field.

Where's the IR in DFIR Training?


I wrote this post to voice a concern I see with the various DFIR trainings for people brand new to our field. A good portion of the current trainings are geared towards digital forensics and they are not incident response trainings. This is an issue organizations are faced with and one I even see within my lab. The way I worked around this issue is also not a suitable option for most organizations who lack access to a person with the DFIR skillset. We developed an extensive in-house training to ensure our incident response needs are met. However, at some point we will incorporate third party training, but there are few options I see that will add instant value since the trainings don't address the incident response process. For other organizations, the digital forensics focused training is the only option left on the table: send people new to the field to a DFIR training and have them return to their organization capable of doing digital forensics but not incident response, the capability the organization was trying to improve in the first place.


Review of Penetration Testing A Hands-On Introduction to Hacking

Wednesday, July 30, 2014 Posted by Corey Harrell 1 comments
Helping train a computer security incident response team (CSIRT) comes with the territory when building out an enterprise incident response process. As I was reading No Starch's recently released Penetration Testing: A Hands-On Introduction to Hacking book by Georgia Weidman I saw an opportunity. The book's target audience is beginners. In the author's words "I thought an introduction to pentesting would make the biggest impact on the audience I most wanted to reach." The opportunity I saw was that the book can also be leveraged by people new to incident response. Specifically, new to responding to systems hit with either client-side or web application attacks. To see how a beginner pen testing book can become a beginner incident response book read on.

Practical Approach


No Starch Press has a series of books with the word "practical" in their title. These books are the equivalent of an entire training for the cost of a book. The reason is that they take a practical approach: they provide supplemental material which allows the reader to do the same hands-on activities they are reading about throughout the book. Not only does the reader learn through reading but they learn by doing, using the same tools and data described in the books. The Penetration Testing book may not have "practical" in its title but it follows the same approach, which is why it is a great book for beginners.

The book starts out by having the reader configure four different virtual machines (VMs). One VM is a Kali box, which is the platform the reader launches attacks from. The three other VMs (Windows XP, Windows 7, and Ubuntu) are configured to be very vulnerable, making them easier to attack. As the reader progresses through the book they read about performing various attacks followed by actually doing the attacks against the vulnerable VMs. It's an excellent approach to expose beginners to pen testing. On the flip side, it's also an excellent approach to expose beginners to incident response. Not only does the reader get familiar with different attacks, vulnerabilities, and payloads but they are creating compromised vulnerable VMs to examine after the fact. So the reader can compromise a system then examine it to see what attack artifacts are present.

The book does cover numerous vulnerabilities and exploits. However, there were two attack categories I thought excelled at helping incident response beginners. Over the years, attacks targeting client-side and web applications have been a significant issue for organizations. These attacks are so common that sooner or later they will be encountered by most incident responders. As a result, exploring these attacks is of the utmost importance for people new to the IR field, and the book addresses these attacks throughout different chapters. The reader may gain a thing or two from the other attacks (e.g. MS08-067 exploitation) but the exploits used in those attacks are something they may not see outside of training environments due to their age.

One Configuration Change for XP VM


One step to perform in addition to what is outlined in the Setting Up Your Virtual Lab chapter is to enable logging for FileZilla (FTP service) on the XP VM. To do so, access FileZilla's admin interface then select Edit -> Settings to get to the configuration window. In the logging section check the Enable Logging option.



Where to Look


Before diving into the client-side or web application attacks I need to take a moment to outline where the reader needs to look for attack artifacts. I won't go into detail about all of the artifacts; I'll only cover the ones pertinent to exploring the attacks against the Windows VMs.

Windows XP


One of the vulnerable applications installed on Windows XP is XAMPP. XAMPP is an "Apache distribution containing MySQL, PHP and Perl"; this provides the VM with vulnerable web services. The following are the logs and artifacts of interest the reader would want to review after the attacks:

Apache logs location:  C:\xampp\apache\logs
FileZilla log location: C:\xampp\FileZillaFTP\Logs
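
For readers who like to script their review, below is a minimal sketch of pulling suspicious lines out of the XAMPP Apache access log. The log path comes from the list above; the keywords are only illustrative assumptions and should be adjusted to match the attacks actually performed.

    # Minimal sketch: flag suspicious requests in the XAMPP Apache access log.
    # The log path is from the list above; the keywords are illustrative
    # assumptions - adjust them to the attacks you actually performed.
    from pathlib import Path

    ACCESS_LOG = Path(r"C:\xampp\apache\logs\access.log")
    KEYWORDS = ["shell.php", "phpmyadmin", "union select", "into outfile", "cmd="]

    for line in ACCESS_LOG.read_text(errors="replace").splitlines():
        lowered = line.lower()
        if any(keyword in lowered for keyword in KEYWORDS):
            print(line)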


The Windows XP operating system artifacts of interest are:

NTFS Master File Table ($MFT)
NTFS Logfile ($Logfile)
Internet Explorer browser artifacts


Windows 7


One of the applications installed on the Windows 7 VM is the web service IIS. The book provides a vulnerable web application (called BookApp) that runs on top of IIS. The IIS logging settings can be left in their default configuration. The following are the logs and artifacts of interest the reader would want to review after the attacks:

IIS logs location: C:\inetpub\logs\LogFiles\W3SVC1
Windows HTTP.sys error logging location: C:\Windows\System32\LogFiles\HTTPERR


The Windows 7 operating system artifacts of interest are:

NTFS Master File Table ($MFT)
NTFS Logfile ($Logfile)
NTFS Change Journal ($UsnJrnl)


Stay Organized


The last tidbit I'll cover before diving into the attacks is the need to stay organized. As I worked my way through the book I took notes recording when and what attack I performed against which vulnerable VM. This reference made things easier when examining the artifacts; it was simpler to match attacks against the activity they left behind. The screenshot below shows a portion of the notes I kept.


Keeping notes is not a necessity but something I highly recommend for beginners.

Client side attacks


Artifacts left by client-side attacks have been well documented on this blog. To name a few: Java was documented here and here, Silverlight was discussed here and here, while Adobe Reader was mentioned here. Even though some artifacts have already been documented, replicating client-side attacks is more beneficial than only reading research done by others. Actually performing the attack against a vulnerable VM provides a completely different perspective and one that can't be attained by just reading about artifacts. The image below shows the Windows XP parsed MFT around the time the system was compromised with the Aurora Internet Explorer exploit (the exploit is the .htm page).
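
To line the notes up with the filesystem activity, it helps to filter a parsed $MFT down to the minutes around each attack. The sketch below assumes the $MFT was already parsed to CSV (for example with a tool like analyzeMFT); the column names and timestamp format are assumptions that will vary by tool.

    # Minimal sketch: filter a parsed $MFT CSV to a window around an attack time.
    # Assumes the $MFT was already parsed to CSV by another tool; the column
    # names and timestamp format below are assumptions and vary by tool.
    import csv
    from datetime import datetime, timedelta

    CSV_PATH = "mft_parsed.csv"                    # assumed parser output
    TIME_COLUMN = "Std Info Creation date"         # assumed column name
    NAME_COLUMN = "Filename"                       # assumed column name
    ATTACK_TIME = datetime(2014, 7, 20, 21, 17)    # from your attack notes
    WINDOW = timedelta(minutes=5)

    with open(CSV_PATH, newline="", encoding="utf-8", errors="replace") as handle:
        for row in csv.DictReader(handle):
            try:
                created = datetime.strptime(row[TIME_COLUMN], "%Y-%m-%d %H:%M:%S")
            except (KeyError, ValueError):
                continue                           # skip rows that don't parse
            if abs(created - ATTACK_TIME) <= WINDOW:
                print(created, row.get(NAME_COLUMN, ""))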


Web Application attacks


The book covers numerous attacks against web applications on the Windows XP, Windows 7, and Ubuntu VMs. I thought the variety of attacks provided a good starting point for exploring the different ways web servers and applications are compromised. In this post I'm only discussing a few items to demonstrate the traces left in certain artifacts showing these attacks took place.

Vulnerability and Nmap Script Scan Against Windows 7


Chapter 6 walks through a process for finding vulnerabilities on a VM. The chapter has the reader use the following methods: Nessus (vulnerability scanner), Nmap scripting engine (port scanner), Metasploit (exploitation framework), Nikto (web application scanner), and one manual technique. Exploring this aspect is important for incident response since, for a system to be compromised, a vulnerability had to be exploited. Seeing the traces left by the different methods provides an idea about what one may come across during examinations.

The image below shows the Nessus vulnerability scan activity in the Windows 7 IIS logs. The image is only a small portion of the logs but it illustrates how noisy vulnerability scanners are. This is obvious activity that most monitoring tools will detect fairly quickly. It's also activity that will most likely be tied to people authorized to perform this work for an organization and not an attacker.
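
A quick way to quantify that noise, without scrolling through a log viewer, is to tally requests per client IP. The sketch below makes a rough pass over an IIS W3C log, reading the #Fields directive to find the c-ip column; the log file name is just an example. One address generating thousands of requests in a short span is the kind of footprint a scanner leaves.

    # Minimal sketch: tally requests per client IP in an IIS W3C extended log.
    # The directory comes from the earlier list; the file name is an example.
    from collections import Counter
    from pathlib import Path

    LOG_PATH = Path(r"C:\inetpub\logs\LogFiles\W3SVC1\u_ex140720.log")
    hits = Counter()
    fields = []

    for line in LOG_PATH.read_text(errors="replace").splitlines():
        if line.startswith("#Fields:"):
            fields = line.split()[1:]      # column names follow the directive
        elif line and not line.startswith("#") and fields:
            values = dict(zip(fields, line.split()))
            hits[values.get("c-ip", "unknown")] += 1

    for ip, count in hits.most_common(10):
        print(f"{count:8d}  {ip}")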


In addition to the IIS logs in the W3SVC1 folder, the HTTP.sys error logging provides additional clues about the vulnerability scanning activity as shown below.


The Nmap script scan activity leaves traces in the logs as well but nothing close to the amount left by a vulnerability scanner. The image below shows the Windows 7 IIS logs where the Nmap script scan activity was located (note: the activity depends on the scripts run and won't always appear as it does in the image).


Command Execution Vulnerability Against Windows 7 IIS


Part III of the book has eight different chapters focused on different types of attacks. I won't rehash what all of the attacks are since the book's table of contents is adequate. One of the attacks was against a command execution vulnerability, which allows a person to execute commands on the system through the web application. In order to see the command's output - since the web application doesn't display it - the attacker (aka the reader) has to redirect it to a file on the server and then access that file.

The image below shows the command being executed through the "subscribe to a newsletter" functionality (line in the middle). It's not too apparent what is taking place since the HTTP method used is a POST to send data to the server; the data is not reflected in the URL. However, shortly after the POST there is a GET request for a file on the server named test.txt, and the 200 status code means OK (the file is there). The POST may not be too revealing but the GET reveals a file is on the server.
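
One way to surface that pattern in the IIS logs is to print the POST requests along with any GETs for text files that follow them. The sketch below does just that; the log file name is an example and the .txt filter matches the test.txt file from the attack described above.

    # Minimal sketch: print POSTs and any GETs for .txt files so the
    # POST-then-fetch pattern stands out. The file name is an example; the
    # .txt filter matches the test.txt file from the attack described above.
    from pathlib import Path

    LOG_PATH = Path(r"C:\inetpub\logs\LogFiles\W3SVC1\u_ex140720.log")

    for line in LOG_PATH.read_text(errors="replace").splitlines():
        if line.startswith("#"):
            continue
        if " POST " in line or (" GET " in line and ".txt" in line.lower()):
            print(line)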


The server's MFT (parsed into a timeline) provides more clarity about the attack as shown below. The ipconfig.exe program executed at the same time as a new user (georgia) subscribed to a newsletter. Looking at the contents of test.txt shows it contains the ipconfig command output.


phpMyAdmin SQL Command Against Windows XP


Another attack that was discussed is leveraging the phpMyAdmin panel to execute SQL commands directly against a backend database. The attack involved executing a SQL command that creates a simple web shell on the system; the shell only lets you execute commands supplied in a URL.

The Apache logs reveal the attack as seen in the image below. The line for the GET request at 21:17:01 (EDT) shows the URL containing the exact SQL command that creates the file called shell.php. The other highlighted lines show a request for the shell.php file followed by the various commands that were executed through the shell.
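
The same lines can be pulled out of the Apache log with a short script. The sketch below looks for the file-dropping SQL and for requests to the dropped web shell, decoding and printing the command passed to it; the shell.php name and the cmd parameter are assumptions based on the attack walked through above.

    # Minimal sketch: pull the file-dropping SQL and the web shell requests out
    # of the Apache access log. The shell.php name and cmd parameter are
    # assumptions based on the attack described above.
    import re
    from pathlib import Path
    from urllib.parse import unquote

    ACCESS_LOG = Path(r"C:\xampp\apache\logs\access.log")

    for line in ACCESS_LOG.read_text(errors="replace").splitlines():
        decoded = unquote(line)
        if "into outfile" in decoded.lower():
            print("[drop]  ", decoded)
        elif "shell.php" in decoded.lower():
            match = re.search(r'cmd=([^ "&]+)', decoded)
            print("[shell] ", match.group(1) if match else decoded)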


The server's MFT parsed into a timeline provides more clarity about the attack as shown below (time is in UTC).


Conclusion


As described in the book's introduction, it is for beginners - those who are new to the pen testing field. I also see value in the book for incident response: for those who are either new to incident response without a background in offensive security or those who haven't been exposed to compromised web servers and applications. One shouldn't expect to know everything about investigating systems impacted by client-side and web attacks after working their way through the book. Heck, the reader may never see these exact attacks against production systems (outside of pen tests). However, the reader will have a better understanding of attack vectors - the means by which an attacker gains access to a system in order to deliver a payload or malicious outcome - and an even better understanding of the attack vector artifacts left on those systems. Anyone fitting that description should definitely pick up this book, build the VMs, attack the VMs, and then discover what traces they can find. It won't be the last step in their journey but it will be a good first one.

Linkz for SIEM

Sunday, July 13, 2014 Posted by Corey Harrell 2 comments
Security information and event management (SIEM) has been an area where I have spent considerable time researching. My research started out as curiosity to see if the technology could solve some problems, continued as I worked to get organizational buy-in, and ended with going all in to architect, implement, and manage a SIEM for my organization. Needless to say, I did my homework to ensure our organization would not follow in the footsteps of others who either botched their SIEM deployments or ended up with a SIEM solution that doesn't meet expectations. In this linkz post I'm sharing my bookmarks for all things SIEM.

We Need a Response Team

In the movie The Avengers, after everything else failed, Nick Fury was there saying "we need a response team." Regardless of what the World Security Council said, Nick kept saying "we need a response team." The Avengers is a great movie with many parallels to incident response (yes, I cut up the movie to use for in-house training and it's on deck for my next presentation). Deploying a SIEM - as with any detection technology - will result in things being detected. After things are detected, someone will need to respond and investigate. As a result, my initial focus for my SIEM research was on designing and implementing an enterprise-scale incident response (IR) process. For a bunch of IR linkz see my post Linkz for Incident Response. My initial focus on IR wasn't solely because things will be detected; I also see IR activities merging with security monitoring activities. To see more about this thought process refer to one of the links I shared in that post, Anton Chuvakin's Fusion of Incident Response and Security Monitoring?

General SIEM Information

The first few linkz provide general information about SIEM technology. Securosis did a series about SIEMs and three of their posts provide an overview about what they are. The posts are: Understanding and Selecting SIEM/Log Management: Introduction, Understanding and Selecting SIEM/LM: Data Collection, and Understanding and Selecting SIEM/LM: Aggregation, Normalization, and Enrichment.

Hands down the best content I found about SIEM was written by Anton Chuvakin, a Gartner analyst. This post links to a lot of his material and the following posts are self-explanatory: SIEM Analytics Histories and Lessons and SIEM Resourcing or How Much the Friggin’ Thing Would REALLY Cost Me?

Rounding out my general links is another one by Securosis. In their series this post actually came a lot later (after the articles I listed in the Planning the SIEM Project section) but I think the content is more important to have up front. To get anywhere with a SIEM in an organization, someone has to agree to it - someone who has the ability to purchase it. This is where Securosis's next article comes into play since it provides examples of the justifications one could use. For more see the post Understanding and Selecting SIEM/LM: Business Justification.

Planning the SIEM Project

The next round of links is what I found to be gold when designing my organization's SIEM solution. One thing I didn't want to happen was to follow in the footsteps of so many companies before me. They acquire a SIEM and then end up with a solution that doesn't solve any of their problems. Bumbling a SIEM project is not something I wanted in my rear view mirror. To avoid this I spent considerable time researching how to be successful in SIEM deployments so I could avoid the pitfalls that others have fallen into. Sitting where I am today and reflecting back, I'm really glad I did my homework upfront as our SIEM project continues along addressing our use cases.

The best reference I found to help architect a SIEM solution is a slide deck by Anton Chuvakin. The presentation is Five Best and Five Worst Practices for SIEM and it outlines the major areas to include in your SIEM project (16 to be exact). It may not cover everything - such as building rules, alarms, and establishing triage processes - but it does an outstanding job outlining how to avoid the pitfalls others have fallen into. If anyone is considering a SIEM deployment or is in the midst of one then this is the one link they will want to read.

Continuing on with links from Anton that provide additional details are the following: On Broken SIEM Deployments, Detailed SIEM Use Case Example, On Large-scale SIEM Architecture, On SIEM Deployment Evolution, and Popular SIEM Starter Use Cases. All of these posts are worth taking the time to read.

Similar to the amount of information Anton makes public Securosis also has a wealth of great SIEM posts. The following posts are great since they discuss use cases: Understanding and Selecting SIEM/LM: Use Cases, Part 1 and Understanding and Selecting SIEM/LM: Use Cases, Part 2.

Selecting a SIEM

At some point a SIEM may be bought and it is helpful to know what should be taken into consideration. Again, Anton and Securosis have posts addressing this as well. Anton has the two posts Top 10 Criteria for a SIEM? and On Choosing SIEM while Securosis has their white paper Understanding and Selecting SIEM/Log Management.

The last reference to use for SIEM selection is the analysis done by Gartner. Regardless of what people may think about how Gartner comes to their conclusions, the company does publish quality research. The SIEM Magic Quadrant analyzes the various SIEM products, ranks them, and discusses their pros/cons. To get the quadrant you need to download it from a SIEM vendor and, yes, that vendor will start contacting you. To find where to download it just google "SIEM Magic Quadrant 2014" for this year's report.

Managing a SIEM

Up to this point there are a lot of references one can use to help in exploring SIEM technology, selecting a SIEM, and designing a SIEM solution. However, I started to see a drop in information for things that occur after a SIEM is up and running. There appears to be very little related to SIEM operations and how others are leveraging SIEM to solve security issues. I only have two linkz, both by Anton: On SIEM Processes/Practices and On SIEM Tool and Operation Metrics.

Due to the lack of SIEM operational literature I branched out to look at resources related to security monitoring. To a certain extent this was helpful but again not exactly what I was looking for. What I've been looking for is literature focused on detection, intelligence, and response. What I came across was more general information as opposed to operational information. One slide deck I found helpful for identifying operational areas to consider was Brandie Anderson's 2013 SANS DFIR Summit slide deck Building, Maturing & Rocking a Security Operations Center.

Keep Waiting for a Decent SIEM Book

The book Security Information and Event Management (SIEM) Implementation is the only SIEM book on the market. I saw the poor Amazon reviews but I opted to take a chance on the book. I was designing and implementing a SIEM so I wanted to read anything I could on the subject. I gave this book two chances but in the end it was a waste of my money. The book doesn't address how to design and implement a SIEM solution nor does it properly cover SIEM operational processes. For those looking for a SIEM book I'd keep waiting and in the meantime read the linkz in this post. I wanted to give others considering this book a heads up.
