July-August 2016

The changing state of DNA analysis

Kevin Petroff

First Assistant Criminal District Attorney in Galveston County

Recent changes to how labs analyze DNA mixtures caught prosecutors by surprise last fall, but there’s really nothing to worry about. Here’s the latest on these analyses, the lab reports, and how to get the information in front of a jury.

I kept going back and forth between the two documents. In one hand, I was looking at a DPS Crime Lab DNA report from 2014. It said that the DNA found in the fingernail clippings from my murder victim contained a DNA mixture that included my victim, and that my defendant could not be excluded. In fact, the report stated that the probability of selecting an unrelated person at random who could have contributed to the mixture was 1 in 152.1 million, which sounded like pretty good odds that it was my defendant. This was great evidence and had been a basis for my indictment.
    But in my other hand was a 2015 DNA report from the same agency that said the DNA profile from the very same fingernail clippings was consistent with a mixture, but that “no interpretable DNA profile was obtained.” This was troubling.
    My panicked phone call to the DPS crime lab put me in touch with DNA Technical Leader Andrew McWhorter. As we spoke, he walked me through the problems with the method of statistical analysis that DPS and other labs around the country were using. The new lab results came from a re-analysis in response to these problems.
    But there was some good news, too. First, there hadn’t been any issues in the actual testing of the DNA. The process of extracting and developing DNA profiles has not changed. Second, a new method of statistical analysis for DNA was being put in place that would give me more accurate, consistent results. I asked McWhorter to help me understand what had caused this shift and how the new testing would result in better evidence. This article is intended to help other prosecutors understand what has changed and how that affects admitting DNA evidence in court.

The problem
The first indication that there was a problem came in late 2015, when DPS sent a letter notifying the criminal justice community about issues with the Combined Probability of Inclusion (CPI) method of calculating statistics for DNA mixtures. Now, I have presented DNA evidence to juries many times, and I don’t remember ever hearing the term “CPI” or having much understanding of how these numbers were generated. What I do remember was the language used in lab reports that we emphasized before the jury: “The probability of selecting an unrelated person at random who could be the source of this DNA profile is approximately 1 in 357.3 quintillion. …”
    Upon introducing this language to the jury, I had learned to have the analyst write that number on a large white pad in front of the jury so we could all count the zeros. It was dramatic and convincing testimony. Often, the report would conclude with an even better statement: “To a reasonable degree of scientific certainty, [the defendant] is the source of the profile (excluding identical twins).”
    And that was all I knew about DNA statistics. But in speaking with scientists at DPS and after attending a few forensic science seminars, I began to have a layman’s understanding of the problems in this analytical method.
    The CPI analytical method initially worked simply by looking at the DNA data at a specific location, then using that data in its comparison with another profile. This method was used by most labs across the country. But in an effort to demonstrate consistency in forensic DNA testing, the National Institute of Standards and Technology (NIST) ran a study in 2005 involving 69 different DNA labs. The labs were asked to analyze four two-person DNA mixtures. NIST found wide variation in results between labs, and even within labs. This variation created concern for some in the scientific community.
    Five years later, the Scientific Working Group on DNA Analysis Methods (SWGDAM) released guidelines for using the CPI method. One of the recommendations was that labs use a threshold (known as the Stochastic Threshold) in their DNA analysis. Using this threshold would mean that when looking at the DNA data at a specific location, the analyst would use only data that rose above the new threshold in comparing it with another profile. Data below the threshold would be ignored.
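    For readers who want to see the arithmetic, the following is a minimal sketch in Python of a CPI-style calculation with a threshold. It is an illustration only, not DPS’s or any lab’s actual procedure. It assumes the textbook form of the statistic (at each location, add up the population frequencies of the detected alleles and square the sum, then multiply the results across locations), and it assumes one common way of applying a threshold: leaving a location out of the statistic whenever any of its peaks falls below the threshold. The allele frequencies, peak heights, threshold values, and the names `mixture` and `cpi` are all invented for the example.

```python
from math import prod

# Hypothetical mixture: for each location (locus), the detected alleles as
# (population frequency, peak height). Every number here is invented.
mixture = {
    "locus_A": [(0.10, 900), (0.20, 750), (0.15, 400)],
    "locus_B": [(0.25, 820), (0.05, 310)],
    "locus_C": [(0.30, 180), (0.12, 640), (0.08, 95)],
}


def cpi(mixture, threshold):
    """Combined probability of inclusion over the loci that clear the threshold.

    Assumption: a locus is used only if every detected peak is at or above
    the threshold; otherwise the whole locus is left out of the statistic.
    Returns None when no locus qualifies.
    """
    per_locus = []
    for alleles in mixture.values():
        if all(height >= threshold for _, height in alleles):
            # Probability of inclusion at this locus: (sum of frequencies)^2
            per_locus.append(sum(freq for freq, _ in alleles) ** 2)
    if not per_locus:
        return None
    return prod(per_locus)


for threshold in (0, 300, 500):
    result = cpi(mixture, threshold)
    if result is None:
        print(f"threshold {threshold}: no usable loci, no statistic")
    else:
        print(f"threshold {threshold}: about 1 in {1 / result:,.0f}")
```

    With these made-up numbers, a threshold of 300 leaves only two usable locations and a weaker statistic than using all three, and a threshold of 500 leaves none at all. That is the general mechanism by which a higher threshold can shrink a reported statistic or eliminate it.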
    But the guidelines from SWGDAM were only that: guidelines. Not every lab incorporated a threshold, and those that did placed the threshold at different levels. Consistency became a serious problem. In 2013, NIST ran another study with far more complex mixtures sent to 106 labs from 45 states and Canada. The results weren’t any better, with findings that were “all over the place.” This included significant variations in statistics and some erroneous inclusions and exclusions in the more complex cases. Then, in 2015, DNA testing at the new Washington, D.C., crime lab was suspended after an audit by the National Accreditation Board found that the lab’s interpretations on DNA mixture cases were not in compliance with FBI standards. At that point, the media began taking notice.
    Naturally, I had several concerns after hearing all of this. First, inconsistency between labs was a huge problem. I had been under the impression that statistical analysis in DNA was far less subjective than it apparently was. Secondly, I didn’t want to be using data that was suspect or that analysts weren’t confident in. But on the other hand, I also didn’t like the sound of simply ignoring data in a DNA mixture just because it was below a certain threshold. What if the evidence in that mixture exonerated my suspect or defendant? That was something I needed to know.

The response
Now, to be clear, this problem wasn’t the result of scientists cutting corners or trying to be misleading. As the Texas Forensic Science Commission wrote:
This finding does not mean laboratories or individual analysts did anything wrong intentionally or even knew the approaches fell outside the bounds of scientific acceptability, but rather the community has progressed over time in its ability to understand and implement this complex area of DNA interpretation appropriately.
In fact, labs in Texas began to work closely with the Texas Forensic Science Commission to address these problems. In September 2015, the Texas Department of Public Safety issued the notification that there was an issue with CPI statistics and gave each jurisdiction a list of DNA cases potentially impacted by the issue. In November 2015, TDCAA issued a letter to prosecutors on how to notify defendants and defense attorneys with cases potentially affected by this issue. Many DA offices across the state have been providing that notice since then. In addition, attorneys at the Harris County Public Defender’s Office are performing an initial screening for those defendants asking for re-testing.
    Which all leads back to that second lab report for my fingernail scrapings. The reason that my results went from “1 in 152.1 million” to “no interpretable profile” is that DPS, like many labs, re-analyzed cases using a much higher threshold. In fact, some critics argued that it was moved too high. In February, after prosecutors expressed concerns that the threshold was so high that results in a significant number of cases lost any evidentiary value at all, DPS lowered the threshold to a level that was still in accordance with SWGDAM guidelines. In my case, the second set of lab results had come before the threshold was adjusted in February, so I needed to decide whether I wanted a third set of CPI results in my case with the new threshold.

A proposed solution
The good news is that most Texas labs are moving toward a new type of statistical analysis called Probabilistic Genotyping. This is a major shift from the statistics focusing on the “probability of inclusion” to a “likelihood ratio” calculation. It helped me to understand this difference by looking at the new language that would be used in lab reports. Instead of the “probability of selecting an unrelated person at random” language of CPI, the likelihood ratio in a DNA mixture would read as:
The DNA profile is interpreted as originating from two individuals, and KMP [the victim] is an assumed contributor. Obtaining this profile is 26.1 quintillion times more likely if the DNA came from KMP [the victim] and Edgar Q [the suspect] than if it came from KMP [the victim] and one unrelated unknown individual.
    So in a nutshell, instead of comparing the odds of finding an unrelated person at random who matches a suspect, we are looking at how much more likely it is that the suspect is present than not present. While this distinction in the type of statistics might not seem like a big deal for the jury, how those numbers are obtained is important. Unlike with CPI, an analyst will no longer have to ignore data from a DNA mixture if it falls below a set threshold. Instead, all the data available will be considered and given a weight according to the levels of DNA present. Those weights are then used to calculate statistics in a likelihood ratio. The biggest advantage of using likelihood ratios is that analysts are no longer ignoring data; rather, they’re using everything they find in their analysis.
    The way that a likelihood ratio works is that opposing scenarios are compared. For instance, the comparison on a DNA mixture analysis from a rape kit might look like this:

Scenario 1: known victim + suspect

(as opposed to)

Scenario 2: known victim + unknown individual

The numbers in the lab report state how much more likely the evidence is under the first scenario (suspect is included) than under the second scenario (suspect is not included) in this particular mixture. In all calculations, the software concedes all doubt and uncertainty to the suspect. DPS also dropped the “reasonable degree of scientific certainty” language that appeared at the end of the CPI reports. There are some limitations, however. Currently DPS cannot obtain a likelihood ratio in a DNA mixture with more than four people in it.
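The comparison of the two scenarios can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how STRmix or TrueAllele works: it treats each allele as simply present or absent, assumes no missing or extra alleles, and uses standard Hardy-Weinberg genotype probabilities, whereas probabilistic genotyping software weighs the data by how much DNA is present. The allele labels, frequencies, genotypes, and the function names `genotype_prob`, `locus_lr`, and `combined_lr` are all hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod


def genotype_prob(a, b, freqs):
    """Hardy-Weinberg probability of a genotype with alleles a and b."""
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def locus_lr(evidence, victim, suspect, freqs):
    """Likelihood ratio at one locus for a two-person mixture.

    Scenario 1: known victim + suspect.
    Scenario 2: known victim + one unrelated unknown individual.
    Simplifying assumption: the two contributors must account for exactly
    the alleles seen in the evidence (no dropout, no extra alleles).
    """
    evidence = set(evidence)
    # Scenario 1 either explains the evidence exactly or it does not.
    p_scenario1 = 1.0 if set(victim) | set(suspect) == evidence else 0.0
    # Scenario 2: add up every genotype an unknown person could have that,
    # together with the victim, would produce exactly the evidence alleles.
    p_scenario2 = sum(
        genotype_prob(a, b, freqs)
        for a, b in combinations_with_replacement(sorted(evidence), 2)
        if set(victim) | {a, b} == evidence
    )
    if p_scenario2 == 0:
        raise ValueError("victim's alleles are not consistent with the evidence")
    return p_scenario1 / p_scenario2


def combined_lr(loci):
    """Multiply the per-locus ratios, treating loci as independent."""
    return prod(locus_lr(**locus) for locus in loci)


# Invented example data for two loci.
loci = [
    {"evidence": {12, 13, 14}, "victim": (12, 13), "suspect": (14, 14),
     "freqs": {12: 0.20, 13: 0.30, 14: 0.10}},
    {"evidence": {8, 9}, "victim": (8, 8), "suspect": (8, 9),
     "freqs": {8: 0.40, 9: 0.25}},
]

print(f"Evidence is about {combined_lr(loci):.1f} times more likely "
      "under Scenario 1 than under Scenario 2")
```

In this toy version a suspect whose alleles do not fit the mixture simply produces a ratio of zero. Real software instead assigns weights to the many possible explanations of the data, which is why it can use information that a fixed threshold would discard.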
    That’s about as much of the science as I understand. Practically speaking, this analysis is run with software and computers. The software uses algorithms to compare every likelihood of these different scenarios. Currently there are at least two competing brands of software that can provide a likelihood ratio: STRmix (pronounced Star Mix) and TrueAllele. While DPS and the FBI have chosen to use STRmix, some counties have used TrueAllele on a case-by-case basis. While I am sure there are some differences between these two products, the biggest issue that has arisen is that the STRmix creator is willing to share the “source code,” or the ingredients of the program, with the State or the defense if requested in a case. At this time, TrueAllele is refusing to provide that information. I know that some in the defense bar have made an issue of this, so keep that in mind.
    Currently, all DPS labs have implemented STRmix and most have completed validation and training. Other labs in larger cities are also following this approach. The Southwestern Institute of Forensic Sciences in Dallas County is currently in the validation stage of using likelihood ratios, and the Harris County Institute of Forensic Sciences is in the contract negotiation process for this software. While the Bexar County Criminal Investigation Lab has not yet begun that process, I was told that they are moving in that direction in the next couple of years. The Texas Forensic Science Commission is also working to move all labs in this direction.
    If you are using a lab that isn’t yet using likelihood ratios, you needn’t fear. None of this means that CPI analysis is no longer valid science or evidence. But expect to prepare your analyst on whether or not the lab is following the SWGDAM guidelines regarding thresholds, and anticipate some cross-examination on the issues that labs across the country have had with CPI. The Texas Forensic Science Commission is an incredible resource in obtaining some of this information.

The courtroom
Of course, all of this change is useless if we can’t present these results in the courtroom. But there is some good news here as well. Likelihood ratios from DPS labs using STRmix have been admitted into evidence already in Smith and Bexar Counties. Additionally, prosecutors in Harris County have successfully admitted likelihood ratios from TrueAllele. Courts in several other states have also found this evidence to be admissible, including New York, Michigan, and California. The thing to remember is that nothing has changed in the actual DNA testing, so predicates that prosecutors have been using for years will change only in terms of the statistical analysis.
    I spoke to Brazoria County Assistant Criminal District Attorney Brian Hrach, who successfully admitted STRmix analysis in trial after a Daubert/Kelly hearing in April. Because he had both CPI results and STRmix results in his case, he had Houston DPS Crime Lab Analyst and DNA Technical Leader Andrew McWhorter testify about the limitations of CPI and the shift towards using likelihood ratios. McWhorter drew a diagram for the jury explaining thresholds where data falling below would be ignored and compared it to the new method of weighing all the data present. At that point, Hrach focused on the following predicate issues:
•    validation studies the lab had done on STRmix;
•    other states and countries that have used STRmix where it has been accepted in court;
•    training by the analyst and within the lab on STRmix;
•    changes in reporting the statistics;
•    peer-reviewed journals regarding STRmix; and
•    source code availability with STRmix.
Over time, as this evidence is admitted in more Texas jurisdictions and prosecutors are trying cases where there are not multiple lab reports with different statistics on each, the process should be even more streamlined.
    The defense in Brian Hrach’s Daubert/Kelly hearing, however, focused on a couple of interesting issues in his cross of McWhorter. First, the defense attorney referenced a letter from TrueAllele’s parent company, Cybergenetics, to the FBI in response to the FBI’s notice of intent to purchase STRmix. In that letter, the author makes several claims against STRmix and sets forth why TrueAllele is a better product. While such a letter may be business as usual for competing scientific companies, make sure to have a copy of the letter and discuss it with your analyst.
    The second issue that the defense attorney raised concerned the software and hardware requirements of STRmix and problems in using it. These issues were raised in both the hearing and before the jury, and it seemed to be a strategic decision to reduce this complex analysis to the simple product of a government computer in order to alienate those judges or jurors with a distrust of the government or technology. Fortunately, DNA analyst Andrew McWhorter and prosecutor Brian Hrach were able to explain that the computer was simply using complex algorithms to quickly compare multiple scenarios to generate a likelihood ratio, which seemed to put the jury and judge at ease. The likelihood ratio statistics were admitted over objection, and the defendant was found guilty of aggravated robbery and sentenced to 55 years in prison.

In conclusion
In the end, my concern over my shrinking DNA results is alleviated by the knowledge that my fingernail scraping evidence is being re-analyzed using all the potential DNA data, and that I’ll receive that report with likelihood ratio statistics in a few weeks. I’m also encouraged by the growing number of counties that are preparing to present this evidence in courts soon. Our goal as prosecutors is to always seek out the most accurate evidence possible, regardless of how it affects the case.
    It’s also important to understand that Texas has led the way for the rest of the country in addressing these issues in crime labs, and Texas prosecutors have set the standard for providing notice to defendants and requesting re-analysis in these potential problem cases thanks to the assistance of TDCAA, the Texas Department of Public Safety, the Harris County Public Defender’s Office, and the Texas Forensic Science Commission. We really have nothing to fear with these new methods of statistical analysis.


[1] September 10, 2015, letter from Brady Mills, Deputy Assistant Director for the Texas Department of Public Safety, Crime Laboratory Service.

[2] Understanding and Addressing DNA Mixture Issues in Texas, Lynn Garcia, Center for American & International Law, February 19, 2016.

[3] Id.

[4] “Unintended Catalyst: The Effects of 1999 and 2001 FBI STR Population Data Corrections on an Evaluation of DNA Mixture Interpretation in Texas,” Texas Forensic Science Commission, August 21, 2015, http://www.fsc.texas.gov/texas-dna-mixture-interpretation-case-review.

[5] “Understanding and Addressing DNA Mixture Issues in Texas,” Lynn Garcia, Center for American & International Law, February 19, 2016.

[6] Id.

[7] “National Accreditation Board suspends all DNA testing at D.C. crime lab,” The Washington Post, April 27, 2015.

[8] “Unintended Catalyst: The Effects of 1999 and 2001 FBI STR Population Data Corrections on an Evaluation of DNA Mixture Interpretation in Texas,” Texas Forensic Science Commission, August 21, 2015, http://www.fsc.texas.gov/texas-dna-mixture-interpretation-case-review.

[9] September 10, 2015, letter from Brady Mills, Deputy Assistant Director for the Texas Department of Public Safety, Crime Laboratory Service.

[10] DNA Mixture Notification Update, November 5, 2015, http://www.tdcaa.com/announcements/dna-mixture-notification-update.

[11] February 25, 2016, letter to Robert Kepple from Bob Wicoff, Chief, Appellate Division, Harris County Public Defender’s Office, http://www.tdcaa.com/content/texas-dna-mixture-review-project.

[12] Interview with Andrew McWhorter, DNA Technical Leader, DPS Crime Lab in Houston.

[13] Texas DPS Mixture Interpretation Update 2016, Andrew McWhorter, Technical Leader, DPS Crime Lab in Houston.

[14] Id.

[15] Id.

[16] “Access Denied: Source Code for DNA Software Remains Protected in Pa. Murder Trial,” Seth Augenstein, Forensic Magazine, February 5, 2016.

[17] Letter to Jerry D. Varnell, Contract Specialist, Federal Bureau of Investigation dated April 1, 2015, from Mark Perlin, Chief Scientific and Executive Officer, Cybergenetics, https://www.cybgen.com/information/newsroom/2015/may/Letter_to_FBI.pdf.