Nate French
Final Project
Internship
Prof. Van Slyke

Analyzing Digital Forensics Tools in a Virtualized Environment
Analyzing Digital Forensics Tools in a Virtualized Environment

The purpose of this paper is to research the viability of using digital forensic software in a virtual environment. The first topic to be addressed is whether or not specific tools will perform competently within a virtual machine. This form of virtualization, platform virtualization, has been available since the 1990s. Virtualization has progressed immensely since it was first released; however, it is still a work in progress (Sperling, 2010). The second area of focus will be on providing a benchmark of the speed a virtualized environment provides compared to a non-virtualized environment. As the price per unit of storage continues to fall, the speed at which digital forensics frameworks can operate will continue to be a significant factor. A third area of focus will be to assess the accuracy of this software when run from a virtual machine compared to a non-virtualized environment. The acceptance of digital forensics depends solely on the accuracy of the tools and environment used. With the increasing acceptance of virtualization and cloud computing for both cost and performance gains, the advent of distributed forensic frameworks is inevitable (Roussev, 2004). Therefore, determining the accuracy of these environments is crucial for the continued acceptance of digital forensics.

Through testing, this research will provide statistical data to be used for determining the best approach to performing digital forensics. This testing will be by no means conclusive or definitive on this topic, as there is a plethora of variables that could be tweaked to determine an optimal environment for performing virtualized digital forensics. However, this research will provide a base for continued research in this area. This research paper will first explain the key terms and ideas that will be expressed throughout this paper. Then this paper will firmly establish the importance of performing the research. After that, the framework for performing the research, along with the hardware and software used, will be discussed.
Then it will be possible to discuss and evaluate the results gathered from the experimentation.

A. Introduce Topic and Define Key Terms

1. Virtualization

The main aspect of understanding this research is the topic of virtualization: what it is and what it does. Virtualization refers to the method of creating a virtual object that acts and behaves like the real object. With platform virtualization, it is possible to create a second instance of an operating system running on the same hardware as the initial instance. For example, it is possible to run an instance of a Linux operating system on a computer that natively runs Microsoft Windows. In this instance, the Linux operating system is referred to as the guest, while the Windows instance is called the host. The guest instance is a logically segregated system that shares the resources of the host system. The two instances can share the processing time of the central processing unit (CPU), or if more than one CPU is present, they can be allocated to each instance. A logical partition of the disk space available on the hard drive will be segregated for use by the guest system. The guest will also receive a dedicated portion of the random access memory (RAM), which is used for quick access to important data that are used frequently.

2. Accuracy

The second aspect of this research project is testing the accuracy and speed of the forensic tools. The accuracy portion of this research refers to file hashing and the number of files discovered. File hashing is the process of creating a unique identifying string for a file, similar to the idea of a fingerprint. Even a minuscule change in a file results in a wildly different file hash. A file hash is used in a court of law to prove that one file is an exact replica of another file, or that a forensic image of a hard drive is a replica of the original.
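As a concrete illustration of this idea, the short sketch below computes such a fingerprint with Python's standard hashlib library; the block size and algorithm name are arbitrary choices, and the example file name is hypothetical rather than taken from this experiment.

```python
import hashlib

def file_hash(path, algorithm="sha256", block_size=1 << 20):
    """Hash a file in fixed-size blocks so even very large evidence files fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

# The same routine can confirm that a forensic image matches its source:
# identical digests mean the image is a bit-for-bit replica.
# print(file_hash("evidence_image.dd"))
```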
The second portion of this area is determining whether forensic tools run in a virtual environment will discover the same number of files on a computer as an examination performed in a non-virtualized environment. This is important to ensure that the examiner will discover all files of interest to the case. If the virtualized environment cannot discover all files, then the validity of performing virtualized investigations is negated. This is related to the rate of error: does the virtualized environment experience the same rate of error as a non-virtualized environment (hopefully 0%)?

3. Speed

The speed of the different environments is not a critical factor from a legal perspective, as the law is only concerned with the accuracy of the results. However, speed is important when determining what type of environment to use. If the speed of the case creation process is severely hampered by the use of a virtualized environment, then that fact may negate any of the benefits of using the virtualized environment. When evaluating the speed of this process, we are concerned with the processing and indexing features of the Forensic Tool Kit (FTK). In order to create a case, FTK must process and index an image file of a drive. Processing is the act of determining where a file starts and stops, the file type, and the contents of the file. That file is then indexed into a case database file. This process is essentially enumerating the data in the evidence so that they are searchable by FTK. This form of indexing allows the investigator to quickly search the image for evidence with different methods.

The first method that will be tested is keyword searching. In this method the investigator can search the entire drive for a specific keyword that may be of evidentiary value. The second search method, which can be a very powerful method for returning specific evidence, is regex searching. In this method, the investigator can search the image for a specific pattern of data. An example of this would be searching the image for credit card numbers. To perform this search manually would be very time intensive, especially if the suspect attempted to hide the data in any manner. Since credit card numbers typically follow a specific pattern, a regex search would allow the investigator to find all credit cards of a certain pattern with one search.
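A minimal sketch of that kind of pattern search is shown below using Python's re module. The pattern is deliberately simplified (16 digits with optional separators), the file name is hypothetical, and a real tool would validate candidates (for example with the Luhn check) and handle matches that straddle read boundaries.

```python
import re

CARD_PATTERN = re.compile(rb"\b(?:\d[ -]?){15}\d\b")  # 16 digits, optional spaces or dashes

def find_card_candidates(image_path, chunk_size=64 * 1024 * 1024):
    """Scan a raw image in chunks and yield byte runs that look like card numbers."""
    with open(image_path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            # A candidate split across two chunks would be missed here;
            # a production scanner would overlap consecutive chunks.
            for match in CARD_PATTERN.finditer(chunk):
                yield match.group().decode("ascii", "replace")

# for hit in find_card_candidates("evidence_image.dd"):
#     print(hit)
```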
During the testing phase of this research, the speed of keyword searching and regex searching will be documented for each environment. The last forensic ability to be tested is the ability of FTK to perform file carving. File carving is the act of carving deleted files out of unallocated space. File carving is an important part of digital forensics, as the most critical evidence is typically deleted by a suspect. Proving that a virtualized environment can carve files as competently as a non-virtualized environment is paramount for supporting a virtualized forensic framework. One of the main contributing factors to speed, as reported in the FTK whitepaper, is the input/output (I/O) channels on the computer. These channels consist of cables and busses between the various hardware components of the computer. There are optimal configurations explained in that document that will be used in the setup of the experiment and discussed more thoroughly later in this paper.

As discussed previously, virtualization and distributed frameworks are becoming more commonplace in professional environments. Both have their pros and cons. One of the main pros of virtualized environments is the use of snapshots, the ability to create a restore point of a virtual machine at any given time. Generally this is done once the optimal configuration of the virtual machine is complete. Over time, computers generally get bloated with unnecessary data and slow down. With the use of a snapshot, the machine can almost instantaneously be reverted back to its pristine condition. This is important for maintaining an efficient setup. The main con of a virtual environment is that the virtual machine will always have fewer resources than the machine hosting it.
This means a virtual machine generally takes longer to perform tasks than the host machine would, as it has fewer resources available to perform the operation. With cloud computing this is generally a non-issue, because the customer can purchase the same amount of resources as their host computers (Roussev, 2004). However, cloud computing means those resources are located in a remote setting that is not under the direct control of the user. The beneficial aspect of a distributed framework is that one job can be parsed out to multiple machines, generally improving the speed at which the operation is completed (Golden, 2004). However, a distributed framework requires a competent network between devices, a protocol to share data, and a method of organizing the work distribution. The negative attribute of a distributed framework is that the user is introducing more variables into the system. With more variables comes an increased chance that something can go wrong.

B. Establish the Importance of Testing Forensic Tools

Testing forensic tools is important for numerous reasons. Most importantly, testing forensic tools before using them in a production environment is a necessary and critical step in gaining legal acceptance. In order for a tool to produce legally accepted evidence pertinent to a criminal case, there has to exist research that shows conclusively what the tool is and is not capable of. The importance of digital forensics software will continue to grow as the importance of digital technology grows in society. As technology, the Internet, and E-commerce continue to grow, so will the prevalence of cybercrime. A second reason for the increase of cybercrime is the rise of “bring your own device” (BYOD) in professional settings. In general, corporate networks are more secure than home-based networks and devices. With the rise of BYOD, employees are frequently taking devices in and out of corporate networks. This increases the vectors of attack, as these devices leave protected networks for environments where the chance of infection is much greater. These infections can then migrate onto corporate networks, where the rewards of cybercrime will generally be greater.
Digital forensic tools are critical in performing cybercrime investigations. E-commerce is estimated to grow at a rate of 12%-15% every year. In the United States alone, E-commerce sales for the year 2012 reached $225 billion. By the year 2017, E-commerce is expected to hit $435 billion in yearly revenue (Trends and Data, 2012). In contrast, it is much more difficult to come up with accurate statistics on the global cost of cybercrime (McAfee, 2013). Current research places an upper limit on cybercrime of around 0.5%-1% of national income. Extrapolating from these data leads to a lower estimate of $25 billion and an upper estimate of $140 billion for the United States. Regardless of the actual numbers, investigators need competent tools to investigate cybercrime.

This issue becomes even more important when viewed from a national security perspective. Cyber-warfare and cyber-espionage are the new domain for inter-country competition. For example, in 2012, the U.S. Navy was experiencing on average 110,000 cyber-attacks per hour (Worth, 2012). Cyber-attacks, when viewed from a national security perspective, can compromise intelligence, the security of troops around the globe, national secrets, and classified technology. When investigating such incidents, it is critical to know exactly what happened, what may have been taken, and what sensitive data may have been compromised. It is no longer the act of recovering fraudulent funds, but of protecting the lives and security of our country. Forensic tools are the backbone of the military’s process of “cyber-attack recovery, reaction, and response functions” (Giordano, 2002).

The second main reason for performing this experiment is the growth of magnetic storage devices. This research will attempt to benchmark the speed of forensic software in a virtual environment. This is important due to the fact that the growth of magnetic storage is exponential and is expected to continue at this rate for a long time.
The price per megabyte is expected to shrink at a rate of 48% each year (Webb, 2003). As the size of storage media continues to grow, speed will continue to become a critical factor in performing digital forensics. Consider the fact that when commercial magnetic storage was introduced in 1956, the cost per megabyte was $10,000. The 2013 cost per megabyte for magnetic storage is now $.00006 (citation needed). New methods for data storage are continually being researched, with many producing spectacular results. One outstanding example is the research performed by IBM that has found the current atomic limit to magnetic data storage. The experimental storage device is “at least 100 times denser than today’s hard disk drives” (Loth, 2010).
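As a rough consistency check on these figures, and nothing more, the short sketch below works through the compounding they imply; the dates and dollar values are simply the ones quoted above.

```python
# Figures quoted above: $10,000 per MB in 1956, $0.00006 per MB in 2013,
# and Webb's (2003) forecast of a 48% annual decline in price per megabyte.
start_cost, end_cost, years = 10_000.0, 0.00006, 2013 - 1956

implied_decline = 1 - (end_cost / start_cost) ** (1 / years)
print(f"Average annual decline implied by the 1956-2013 figures: {implied_decline:.1%}")  # roughly 28%

# Projecting ten further years at the forecast 48% annual decline:
projected = end_cost * (1 - 0.48) ** 10
print(f"Projected cost per MB after ten more years at 48%/yr: ${projected:.2e}")
```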
In contrast to the growth of magnetic storage, the growth of processors, the devices that process all of this data, is expected to end within the decade. This is known as the death of Moore’s Law, which has accurately described the growth of processing speed from the 1960s and is expected to hold only until about 2020. On average, the processing speed of digital devices has doubled every 18 months. However, this process of increasing the density of transistors will reach its physical limit in 2020, when transistors reach the 7nm or 5nm mark. At that point, the explosive growth of processing speed will end, while magnetic storage continues to grow. DARPA tracks projects to replace complementary metal-oxide-semiconductor (CMOS) technology; however, only three of these replacements are potential candidates, and even then they are not very promising (Merritt, 2013). Investigators will then be faced with the knowledge that evidence drives will likely continue to grow exponentially while processing power grows linearly. Due to this, the speed at which evidence can be processed will be a critical factor in performing digital forensics. It is critical that investigators use optimal environments when time is a factor.

The diminishing growth of processing speed will in turn create the necessity for cloud computing and distributed frameworks. Cloud computing is the concept of using multiple computers, geographically separated but able to communicate through networks, that work in tandem. Typical commercial cloud computing options, such as EC2 from Amazon, allow the customer to use a set amount of processing power and space. The cloud computing instances are generally virtual machines located on a much larger commercial server, which is one reason this research will investigate the use of forensic tools in a virtual environment. A distributed framework is a method of parsing the work of one job between multiple computers in an organized fashion. FTK already has a Distributed Forensic Framework (DFF) that allows three remote computers to assist a fourth workstation. This DFF will likely be expanded upon in the future to allow the support of more machines. One further avenue of investigation for this line of research is combining this DFF with the use of virtual machines.
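The scheduling idea behind such a framework can be illustrated with a very small sketch. This is not FTK's actual DFF protocol, only a hypothetical round-robin assignment of evidence items across one workstation and three remote helpers.

```python
def assign_evidence(items, workers=("workstation", "remote-1", "remote-2", "remote-3")):
    """Assign evidence items round-robin across the coordinating workstation and remote machines."""
    plan = {name: [] for name in workers}
    for index, item in enumerate(items):
        plan[workers[index % len(workers)]].append(item)
    return plan

# Hypothetical usage: distribute several evidence images for processing.
print(assign_evidence(["image_50gb", "image_250gb", "image_500gb", "laptop_hdd"]))
```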
In order to determine these optimal environments, further testing must be performed. Testing provides numerous benefits that enhance the state of digital forensics. Through testing, researchers can verify that a process works in a non-production environment. It is not feasible to test software in a production environment, in which the results of criminal investigations rely on the results found. Testing needs to be performed before production environments are utilized. This testing advances the requirement of having evidence accepted in a court of law. Research is needed to prove whether or not a tool behaves in a particular fashion and whether the data produced are forensically sound.

Testing will also help investigators make intelligent decisions when deciding what tools to use for a particular case. Some tools may perform better than others when investigating certain aspects of a cybercrime. In some cases, such as incident response investigations for national security purposes, time may be the most important factor to consider. Investigators will need to analyze computer logs to determine what did or did not happen. Other cases may be more focused on retrieving a large number of financial records and documents, while still other cases may only be concerned with what pictures reside on a computer. By testing different software in different scenarios, it will be possible to determine the best tools for each, depending on what type of evidence the investigator is interested in. Finally, proper testing should create a reproducible experiment that can repeatedly be verified by other researchers. This factor helps to gain court acceptance by providing scientifically sound and verifiable results.

II. Background Context

A. History of Digital Forensics

Digital forensics is a sub-category of the forensic sciences that deals with the examination of evidence on digital media. Digital mediums include, but are not limited to, computers, removable storage, network traffic, and mobile phones. Digital forensics became a national concern in the 1980s as digital devices began to make their way into the corporate world. In 1984, the FBI created the Computer Analysis and Response Team (CART), the first federal group organized to deal with cybercrime and digital forensics (FBI, n.d.). From this humble beginning, the digital forensics industry has grown into a billion-dollar-a-year industry with annual growth rates of 11% (“Digital Forensic Services in the U.S. Market Research”, n.d.).

Digital forensic cases can encompass many different types of crimes, such as hacking or possession of contraband. However, each case will have a similar work flow. Cases begin when a crime has been detected in which evidence may reside on a digital device. After detection, the evidence is seized or acquired, following chain-of-custody protocols. The chain of custody protocol requires that the transfer of evidence is documented whenever possession of the evidence changes. By following this protocol, investigators can prove that unauthorized personnel did not have access to the evidence or the chance to modify it.
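A minimal sketch of how such a record might be structured is shown below; the field names are hypothetical and are not taken from any particular lab's evidence-management system.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class CustodyTransfer:
    timestamp: datetime
    released_by: str
    received_by: str
    purpose: str

@dataclass
class EvidenceItem:
    case_number: str
    item_number: int
    make: str
    model: str
    serial: str
    transfers: List[CustodyTransfer] = field(default_factory=list)

    def transfer(self, released_by: str, received_by: str, purpose: str) -> None:
        """Document every change of possession so the chain of custody remains unbroken."""
        self.transfers.append(
            CustodyTransfer(datetime.now(), released_by, received_by, purpose))
```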
Once the evidence mediums are brought to the forensics lab, the lab must take an inventory of all evidence in its possession. This would include the number of devices, the make and model of each device, and which case they belong to. Once the inventory is complete, the lab can begin to make images of each device. This is accomplished by attaching the digital device to a write blocker that prevents any changes from being made to the evidence. The imaging hardware or software will then make a byte-by-byte copy of the device to produce an exact replica. This is important because investigators cannot perform the investigation on the original evidence, as the process would introduce changes to the evidence and call its validity into question. Once an image of the evidence has been created, the original evidence is returned to secured storage. The investigators can then use their tools of choice to perform the investigation. This includes searching the device for any relevant evidence, documenting the steps and procedures used to acquire the evidence, and organizing the evidence into a package. This is required so that, if necessary, a third party can verify the results of the investigation.

The work flow and examination procedures for digital investigations were designed around pertinent case law and requirements for acceptance of evidence in a court of law. In order to establish guilt, the prosecution must provide evidence that proves criminality beyond a reasonable doubt (Commonwealth v. Webster, 1850). In order for evidence to be accepted by the court, the evidence must pass admissibility tests. To be admissible, evidence must be relevant to the case, must be material (not hearsay), and must not be precluded by an exclusionary rule. The prosecution must be able to prove the authenticity of the evidence, meaning the evidence is what it is represented to be. This relates to the chain of custody mentioned earlier, as the location and possession of the evidence can be accounted for and testified to throughout its life cycle. This ensures that the evidence has not been tampered with.
Evidence must also pass the Frye test, established in 1923 after Frye v. United States. The Frye test requires that technical evidence be acquired through a scientifically proven method that has gained acceptance in that particular field of science.

III. The Experiment

1. Providing an Environment Identical to Current Investigations

The purpose of this experiment and benchmarking is to provide reference information to benefit the Northeast Cyber Forensics Center (NCFC) in determining the best method for performing digital forensics. The results of these experiments will also benefit the forensics community at large by providing concrete evidence and statistics on performing digital forensics in a virtualized environment. The main goal, however, will be benefiting the NCFC with its setup. Due to this, the experiment will be formatted to provide as similar an environment as possible to what is currently being used. This will determine the use of hardware, software, and procedures during the experiment.

The NCFC currently uses specialized hardware in its operations called the Forensic Recovery of Evidence Device (FRED). FREDs are high-powered workstations with unusually high system specs specifically designed for performing digital forensic work. The FRED used in this experiment has the following specifications. The central processing unit is composed of eight Intel i7 cores. To increase performance through hyper-threading, cores are virtualized, meaning that for every physical core present on the system, the operating system addresses two logical cores (“Intel Hyper-Threading Technology”, n.d.). Hyper-threading increases performance by allowing the operating system to schedule two threads or processes to a single core. Therefore, the FRED logically has 16 cores, 8 of which are virtualized. This specific FRED uses 16 GB of random access memory (RAM), which provides a location for fast data storage and transfer compared to the much slower physical hard drives.
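The physical/logical split can be confirmed from the operating system's own view of the hardware. The snippet below is a small sketch of that check; it assumes the optional third-party psutil package is available for the physical count.

```python
import os

logical = os.cpu_count()  # logical processors visible to the OS (16 on the FRED described above)
print(f"Logical cores: {logical}")

try:
    import psutil  # third-party; only needed to distinguish physical from logical cores
    print(f"Physical cores: {psutil.cpu_count(logical=False)}")
except ImportError:
    print("psutil not installed; physical core count unavailable")
```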
FREDs also contain six hard drive bays, which provide the potential for a large amount of local storage. Four of the hard drive bays are hot-swap bays connected to the system through FireWire. These bays can have hard drives installed or removed at will, even while the system is running. FireWire is capable of transferring data at a rate of 50 MB per second. The remaining two bays are connected to the machine through Serial Advanced Technology Attachment (SATA). These bays are capable of transferring data at a rate of 375 MB per second. It is important to note that the SATA bays are not hot-swap, meaning the device must be installed before the system is turned on and cannot be removed during operation. The FREDs also house a native write-blocking bay that can be used to image a hard drive.

The current operating system used at the NCFC is Windows 7 Professional 64-bit, which will be used in this experiment as a host (Environment one). A second Windows 7 operating system will be deployed as a virtual guest on top of the Windows 7 host (Environment two). The third operating system that will be used is the Ubuntu flavor of Linux (version 12.04 LTS). Linux will be used as a host operating system upon which a third Windows 7 virtual guest environment will be located (Environment three). In order to maintain the integrity of the experiment, no operating system updates or software updates will be applied to the environments during the testing phase. In order to test the virtualization aspect of this experiment, the software VirtualBox will be used to create the environments. VirtualBox is a free program and is capable of running on Windows or Linux. The forensic tool that will be tested is Forensic Tool Kit (FTK) 4.0.2, provided by AccessData. FTK is a commonly used forensics program with wide acceptance within the community. This tool is also one of the main programs utilized at the NCFC for forensic investigation. In order to provide the best environment possible, this experiment will utilize the I/O channel optimization (“Quantifying Hardware Selection,” n.d.).
This setup uses four drive bays to store different components of the investigation. Drive one (SATA) is a 500 GB Seagate 7200 RPM drive, which contains the operating systems, virtual machines, and the forensic tools. Drive two (SATA) is a 1000 GB Seagate 7200 RPM drive, which contains the output database of results discovered during the investigation. As discussed in Quantifying Hardware Selection, the most effective method of increasing performance is housing the output on a separate drive with the highest throughput and RPM. Drive three (FireWire) is a 2000 GB Seagate 7200 RPM drive, which contains the evidence images to be examined or processed. Drive four is a 1000 GB Seagate 7200 RPM drive, which contains the evidence computer. Drive four is imaged with FTK Imager, and the output goes onto drive three.

The experiment will be comprised of three different environments and three different evidence packages. In order to provide typical evidence scenarios that relate to current operations at the NCFC, the three evidence packages will consist of a 50, a 250, and a 500 GB evidence image. To maintain the integrity of the experiment, one 1000 GB Seagate 7200 RPM hard drive will be used to store these packages. All testing will be completed on each package before testing the next package. When proceeding to the next testing package, additional evidence will be added to the current package until the size requirements are met. This was decided for multiple reasons. This method reduces the amount of time required to set up the evidence package. Had we decided to use completely different evidence packages, we would have had to ensure that any data from the previous package were securely erased from the medium; otherwise the forensic tools could detect data from a previous package, which would contaminate the evidence environment. Additionally, processing the same evidence multiple times provides repeatability and exercises certain features such as hashing and identifying evidence. The evidence medium will also have a Windows 7 operating system installed so that imaging, processing, indexing, and search times will reflect a real current investigation.
A clean install of Windows 7 requires roughly 20 GB of space. Therefore, the respective evidence packages will contain roughly 30, 230, and 480 GB of evidence. The actual data comprising the evidence consist of typical data types that digital forensic investigations focus on, including images, documents, and videos. The images used are contained within the Dresden Image Database for forensics benchmarking (Gloe, 2010). The initial photos totaled 4,949 high-resolution images, which require roughly 15 GB of space. In order to quickly grow the amount of evidence required for the 250 and 500 GB packages, these photos were run through image software, IrfanView, which provides batch processing. By processing these images through various RGB filters, it is possible to quickly accumulate evidence files with unique hashes. Multiple video files were acquired through YouTube. To provide variation, some files are short (5-10 minutes), while others are roughly 80-90 minutes. Numerous document files of varying types were made using Microsoft Office and Notepad. These documents also contain textual evidence for the string search and regex search experiments. A second source of documents was obtained through the Enron e-mail dataset, which has been sanitized and released as public domain material (Enron Email Dataset, 2009). This research will examine whether or not FTK will find these evidence items in both the virtual and non-virtualized environments. During the 250 GB and 500 GB runs, some data will be deleted to determine if the virtualized environments will competently recover deleted files.
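The batch derivation described above was done with IrfanView; the sketch below shows the same idea using the Pillow imaging library as a stand-in, with hypothetical directory names. Swapping colour channels changes the file contents, so every derived image carries a hash distinct from its source.

```python
from pathlib import Path
from PIL import Image  # Pillow (third-party imaging library)

def derive_variants(src_dir: str, dst_dir: str) -> None:
    """Create colour-channel permutations of each source photo to grow the evidence set."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.jpg"):
        r, g, b = Image.open(path).convert("RGB").split()
        for tag, bands in {"bgr": (b, g, r), "grb": (g, r, b), "rbg": (r, b, g)}.items():
            Image.merge("RGB", bands).save(dst / f"{path.stem}_{tag}.jpg")

# derive_variants("dresden_originals", "evidence_package_250gb")
```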
2. Designing the Experiment

The National Institute of Standards and Technology (NIST) created a methodology for testing forensic tools called the CFTT Forensic Tool Testing Methodology. This experiment will be modeled after the guidelines provided in that document. Step one of the CFTT is to acquire the tool to be tested. The NCFC has licensed full versions of FTK 4, which is the focus of this research. Step two is to review relevant tool documentation. AccessData provides numerous support documents on their website explaining how to use FTK. Step three is to choose relevant test cases based on the features of the tool. FTK provides support for imaging, file carving, keyword searching, and regex searching. FTK is capable of recognizing and recovering most file types. This research will test the following data types, which are frequently sought after during cases: images, documents, e-mail, and video.

Step four is to create a testing strategy to evaluate FTK and its abilities. The testing workflow consists of first creating an evidence drive that will contain the evidence files to be tested. This will consist of a base Windows 7 operating system and the various document types. It is important to ensure that the evidence packages are prepared and validated before entering them onto the evidence drive. This is due to the fact that it is difficult to permanently remove data from the drive once it has been entered, typically requiring the full drive to be zeroed out. Once the evidence drive is ready for examination, FTK Imager will be used to create the forensic image that is readable by FTK. When the evidence drive is being imaged, the evidence image is piped to the image drive (drive three). Once imaging is complete, FTK processing can begin. When creating a new case, the case folder needs to be piped to drive two, the database drive. This is where the FTK database software has been installed, and it should be sufficiently large to contain the databases for all cases in the experiment. This workflow will be completed twice for all three environments. The processing will be completed twice for each environment to determine how much variation exists in each environment and to satisfy minimum requirements for thorough testing of forensic tools (Robust Correctness Testing for Digital Forensic Tools). Once processing is complete, FTK creates a result document, which lists the processing time, database optimization times, and the number of files found. Now it is possible to examine the evidence through the FTK framework.
At this stage, it is possible to test keyword and regex searching, and to gather pertinent experiment data such as the number of individual file types found and file hashes. These data will be pulled out into a spreadsheet so that they can be compared to subsequent cases. Once the processing and data gathering have been completed for the 50 GB image, the subsequent 250 GB and 500 GB scenarios can be completed following this procedure. Step five of the CFTT methodology is to analyze results and create reports. At this point, it is possible to analyze all of the data gathered, create a formalized report, and determine which environment performed the best.
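One possible shape for that spreadsheet is sketched below with Python's csv module; the column names are assumptions made for illustration, and the values in the usage example are placeholders rather than measured results.

```python
import csv
import os

FIELDS = ["environment", "scenario_gb", "run", "total_job_time", "processing_time",
          "indexing_time", "post_processing_time", "files_found"]

def append_run(csv_path: str, row: dict) -> None:
    """Append one run's figures (transcribed from FTK's result document) to the sheet."""
    new_file = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()  # write the header only once, for a brand-new file
        writer.writerow(row)

# Placeholder example only (zeros are not measured results):
append_run("ftk_results.csv",
           {"environment": "Windows host", "scenario_gb": 50, "run": 1,
            "total_job_time": "0:00:00", "processing_time": "0:00:00",
            "indexing_time": "0:00:00", "post_processing_time": "0:00:00",
            "files_found": 0})
```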
IV. Results

1. Present Results: Speed and Accuracy

The results of the research surpassed our expectations for the experiment. At the onset of the experiment, we predicted that the virtualized environments would be as accurate as the non-virtualized environment, but that the virtual environment would be less efficient in terms of speed. The reasoning behind this was that the virtual environment would have fewer resources allocated; with fewer resources, it would stand to reason that the virtual environment would not be as fast. However, the results were completely counterintuitive, and the initial research showed the virtual environment was able to process the cases faster. The Linux environment, which did perform faster than the non-virtualized environment, did not produce the same levels of accuracy. The Linux environment continually failed to discover the same number of files as the two Windows environments. The Linux environment was tested with the 50 GB and 250 GB evidence packages, after which we decided to discontinue testing. As mentioned previously, the Linux environment failed to discover an acceptable number of files and was deemed not accurate enough for further testing. The overall results in speed are presented in the following graphs.

Figure 1 represents the total overall time each scenario took to complete processing in FTK. The blue lines represent the basic Windows host environment with no virtualization. The red lines represent a Windows host running a Windows guest virtual machine. In every sample, the Windows virtual machine outperformed the Windows host in total job time.

Figure 1. Total Job Time Per Each Environment and Scenario (chart not reproduced; series: Windows host and Windows VM across the 50, 250, and 500 GB scenarios)
The next graph, Figure 2, shows the total processing time for each case. As mentioned previously, processing is the act of enumerating the digital data and deciding where a file begins, where a file ends, and what type of file it is. Similar to the total job time, the processing time was faster on the virtual Windows machine than on the basic Windows host.

Figure 2. Total Processing Time Per Each Environment and Scenario (chart not reproduced; series: Windows host and Windows VM processing times across the 50, 250, and 500 GB scenarios)
The next graph, Figure 3, displays the total indexing time for each scenario. Indexing is the process of taking all of the files that were processed, as described above, and creating a master file list of what each file is and where it sits in relation to other files. This portion of the processing allows for the quick search times present in FTK.

Figure 3. Total Indexing Time Per Each Environment and Scenario (chart not reproduced; series: Windows host and Windows VM indexing times across the 50, 250, and 500 GB scenarios)
The last graph, Figure 4, shows the post-processing time for each scenario. After processing has completed, FTK optimizes the indexing database. This is done by cleaning the database, which includes compacting the information and removing duplicate entries. The database that FTK uses is built on the Structured Query Language (SQL), and it is common for SQL databases to routinely run optimization routines to ensure the database remains efficient to use.

Figure 4. Total Post Processing Time Per Each Environment and Scenario (chart not reproduced; series: Windows host and Windows VM post-processing times across the 50, 250, and 500 GB scenarios)
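As a rough illustration of that kind of maintenance (FTK manages its own case database, so this is only an analogy using SQLite as a stand-in, with a hypothetical table), the routine below removes duplicate rows, reclaims the freed space, and refreshes planner statistics.

```python
import sqlite3

conn = sqlite3.connect("example_case.db")
conn.execute("CREATE TABLE IF NOT EXISTS files (path TEXT, sha256 TEXT)")

# Remove duplicate entries, keeping the first occurrence of each (path, hash) pair.
conn.execute("DELETE FROM files WHERE rowid NOT IN "
             "(SELECT MIN(rowid) FROM files GROUP BY path, sha256)")
conn.commit()

conn.execute("VACUUM")   # rebuild the database file to reclaim space left by deletions
conn.execute("ANALYZE")  # refresh the statistics the query planner relies on
conn.close()
```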
The accuracy of the virtualized environment compared to the non-virtualized environment was exactly the same. Not only did the virtualized Windows environment find the same number of files, this environment was also capable of correctly hashing all of the files of interest. Due to the large size of the files used to check accuracy, it is not feasible to include a visual representation of them in this document. However, the file used to check accuracy can be found online.

V. Conclusion

Contrary to our initial assumptions, running FTK in a virtual machine was faster than running the software in a non-virtualized environment. As stated at the beginning of this document, this research is only the beginning of optimizing virtual environments for use in digital forensics. At this time, we are unable to determine why the virtual environment was faster than the non-virtualized environment. It is possible that having a virtual environment on the FRED somehow slowed down the non-virtual environment, and that FTK would run faster on a FRED without any virtualization software present. From this preliminary trial, it does appear that the virtual environment was the fastest and just as accurate as the non-virtual environment. This certainly lays the groundwork for future research into this topic, as there are many variables that could be tweaked.

For further research, it would be prudent to determine what caused the discrepancy in the number of files found in the Linux environment. Considering that we only tested one Linux OS and that there are dozens of other Linux flavors, further research into the use of Linux would be prudent. Another area of research would be to continue optimizing the Windows environment. This would include tweaking the memory allocation, turning off unnecessary services, and testing scenarios with a wider variety of evidence files.
References

CFTT Methodology Overview. (2012, January 12). NIST Computer Forensic Tool Testing Program. Retrieved October 22, 2013, from http://www.cftt.nist.gov/Methodology_Overview.htm

Cohen, W. (2009, August 21). Enron Email Dataset. Retrieved October 22, 2013, from https://www.cs.cmu.edu/~enron/

Curtis, G. E. (2012). The law of cybercrimes and their investigations. Boca Raton: CRC Press.

Digital Forensic Services in the US Market Research. (n.d.). Market Research Reports. Retrieved October 22, 2013, from http://www.ibisworld.com/industry/digital-forensic-services.html

FBI. (n.d.). Brief History of the FBI. FBI.gov. Retrieved October 22, 2013, from http://www.fbi.gov/about-us/history/brief-history

Giordano, J. (2002). Cyber Forensics: A Military Operations Perspective. International Journal of Digital Evidence, 1(2). Retrieved October 22, 2013, from http://www.utica.edu/academic/institutes/ecii/publications/articles/A04843F3-99E5-632B-FF420389C0633B1B.pdf

Gloe, T., & Böhme, R. (2010). The ‘Dresden Image Database’ for benchmarking digital image forensics. In Proceedings of the 25th Symposium on Applied Computing (ACM SAC 2010) (Vol. 2, pp. 1585–1591).

Intel Hyper-Threading Technology. (n.d.). Retrieved November 14, 2013, from http://www.intel.com/content/www/us/en/architecture-and-technology/hyper-threading/hyper-threading-technology.html

Loth, S., Baumann, S., Lutz, C., & Heinrich, A. (2012). Bistability in Atomic-Scale Antiferromagnets. Science, 335(6065). Retrieved October 22, 2013, from http://www.sciencemag.org/content/335/6065/196.abstract

Merritt, R. (2013, August 27). Moore's Law Dead by 2022, Expert Says. EE Times. Retrieved October 22, 2013, from http://www.eetimes.com/document.asp?doc_id=1319330

Pan, L., & Batten, L. (n.d.). Robust Correctness Testing for Digital Forensics Tools.

Quantifying Hardware Selection in an FTK 4.0 Environment. (n.d.). DigitalIntelligence.com. Retrieved October 22, 2013, from www.digitalintelligence.com/files/FTK4_Recommendation.pdf

Roussev, V., & Richard, G. (n.d.). Breaking the Performance Wall: The Case for Distributed Digital Forensics. dfrws.org. Retrieved October 22, 2013, from dfrws.org/2004/day2/Golden-Perfromance.pdf

Sperling, E. (2010, October 4). The Limits of Virtualization. Forbes. Retrieved October 22, 2013, from http://www.forbes.com/2010/10/01/enterprise-computing-challenges-technology-cio-network-virtualization.html

The Economic Impact of Cybercrime and Cyber Espionage. (2013, July). McAfee. Retrieved October 22, 2013, from www.mcafee.com/us/resources/reports/rp-economic-impact-cybercrime.pdf

Trends & Data - Internet Retailer. (n.d.). Industry Strategies for Online Merchants – Internet Retailer. Retrieved October 22, 2013, from http://www.internetretailer.com/trends/sales/

Webb, K. (2003). A Rule Based Forecast of Hard Disk Drive Costs. Faculty Publications. Paper 9. Retrieved October 22, 2013, from http://scholarworks.sjsu.edu/cgi/viewcontent.cgi?article=1008&context=mis_pub

Worth, D. (2012, December 5). HP Discover: US Navy facing 110,000 cyber attacks every hour. v3.co.uk. Retrieved October 22, 2013, from www.v3.co.uk/v3-uk/news/2229651/hp-discover-us-navy-facing-110-000-cyber-attacks-every-hour