A Fish in a Pond or a Needle in a Haystack? DNA Tool Raises Promise, Privacy Concerns

For the first time, researchers connected two different types of DNA snippets to identify individuals. This could help researchers across many fields — but isn’t without risk.

Author | Kara Gavin

TV crime dramas often feature law enforcement characters saying, "We got a hit in CODIS" — a potential match of DNA found at a crime scene to a central database of DNA samples gathered from people suspected of or convicted of past crimes.

A few tiny snippets of genetic material might be enough to send a perpetrator to jail by the episode's end — or free a wrongfully convicted person.

Meanwhile, medical dramas feature DNA-based diagnosis and care, made possible by rapid DNA testing and research in collections of samples from thousands of people.

And nature shows might feature wildlife specialists using DNA from fur or droppings to query a database of animal DNA and guide their search for a member of an endangered species.

All three use different databases to achieve the same goal: finding a genetic needle in a haystack or a single fish in a pond.

These databases use different types of genetic fragments to identify people or animals, which keeps those databases from "talking" to one another. The haystacks and ponds stand apart.

Now, a team of researchers has published findings that could help break down that barrier. In the Proceedings of the National Academy of Sciences, the team reports a way to identify the same individuals across two genetic databases.

The team worked with two kinds of fragments: the type law enforcement employs, called STRs, or short tandem repeats, and the kind used more in medical research, called SNPs, or single nucleotide polymorphisms. In a group of 872 individuals, the researchers matched the correct person more than 90 percent of the time.

The team used a technique that relies on linkage disequilibrium, the tendency of DNA variants that sit near one another to be passed down together, so that knowing one can help you guess the others. This way, a record in one database can be used to search for its match in another database.
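To make the idea concrete, here is a minimal toy sketch in Python of how knowing a SNP could help guess a nearby STR, and how that guess could rank candidate records. It is not the researchers' actual method. The locus names, record names and probability table are all invented for illustration, and each record is simplified to one value per locus.

```python
import math

# Hypothetical table, as if learned from a reference panel: for each STR locus,
# P(STR allele | genotype at a nearby SNP). All numbers are invented.
COND_PROB = {
    "locus_A": {0: {"10": 0.7, "11": 0.2, "12": 0.1},
                1: {"10": 0.2, "11": 0.6, "12": 0.2},
                2: {"10": 0.1, "11": 0.2, "12": 0.7}},
    "locus_B": {0: {"7": 0.8, "8": 0.2},
                1: {"7": 0.5, "8": 0.5},
                2: {"7": 0.2, "8": 0.8}},
}

# Toy SNP database: the SNP genotype (0, 1 or 2) near each STR locus.
snp_db = {
    "snp_rec_1": {"locus_A": 0, "locus_B": 0},
    "snp_rec_2": {"locus_A": 1, "locus_B": 2},
    "snp_rec_3": {"locus_A": 2, "locus_B": 2},
}

# Toy STR database: the observed STR allele at each locus.
str_db = {
    "str_rec_x": {"locus_A": "12", "locus_B": "8"},
    "str_rec_y": {"locus_A": "10", "locus_B": "7"},
    "str_rec_z": {"locus_A": "11", "locus_B": "8"},
}


def match_score(snp_record, str_record):
    """Sum of log-probabilities of the STR alleles given the SNP genotypes."""
    score = 0.0
    for locus, str_allele in str_record.items():
        snp_genotype = snp_record[locus]
        score += math.log(COND_PROB[locus][snp_genotype][str_allele])
    return score


def best_match(str_record, snp_records):
    """Return the ID of the SNP record that best explains the STR record."""
    return max(snp_records, key=lambda rid: match_score(snp_records[rid], str_record))


matches = {sid: best_match(rec, snp_db) for sid, rec in str_db.items()}
print(matches)
```

In this toy data, each STR record ends up paired with the SNP record whose genotypes make its STR alleles most likely. A real analysis would involve many more loci, two alleles per person at each locus, and statistical checks on how strong each match is.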

Their achievement, if proven to work in larger groups, could help researchers combine massive genetic databases — and avoid counting the same person twice in their analyses. That could help medical research move faster and further on many diseases.

The approach might also help law enforcement zero in on suspects who aren't in the national CODIS database but whose DNA is on file somewhere else that they can access with a warrant.

Privacy concerns

Because of this potential, researchers note that the technique could open up privacy risks by linking records in the criminal justice system with those in medical or ancestry research, unless protections are put in place.

For instance, it could inadvertently give law enforcement information about the health-related genetic traits someone carries.

Or it could allow the DNA a person voluntarily contributed to a research project, or sent to a company that offers ancestry or health-related genetic testing, to be used for purposes they didn't permit.

Jun Z. Li, Ph.D., the University of Michigan genetics and bioinformatics researcher who participated in the effort along with Stanford University and University of Manitoba colleagues, notes that these potential uses and misuses are still just concepts.

After all, the study used DNA from only a limited set of people, with a reasonably broad but not fully representative mix of ethnic backgrounds.

The DNA samples were initially collected to study human diversity among different populations and were anonymous. So the "matching" was between records in two databases and didn't find someone by name.

Still, the record matching the researchers achieved, and the data aggregation it could unlock, have made Li and his Stanford colleague Noah Rosenberg, Ph.D., interested in studying the approach further, in larger sets of DNA. Rosenberg and Li once worked together at U-M and have continued collaborating now that Rosenberg is at Stanford.

"We had a pond of less than 1,000 fish, and we put 'bait' in to see which fish would bite," Li says. "But in practice, if you want to search population-level data, you'd be doing the equivalent of looking for a needle in a haystack and determining if you have a match among millions of records."

Using the same technique in larger databases could lead to a number of moderately strong matches, rather than a surefire pinpointing of an individual.

But that could be enough to dramatically narrow a search for a suspect. As the CODIS database extends its sampling from 13 to 20 STRs, the potential precision of matches will increase.
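A rough back-of-the-envelope sketch in Python shows why database size and the number of STRs both matter. It assumes, purely for illustration, that an unrelated record has the same fixed chance of looking like a match at each STR and that the STRs behave independently. The 0.5 per-STR chance and the one-million-record database are hypothetical numbers, not figures from the study or from CODIS.

```python
def expected_coincidental_matches(db_size, per_locus_prob, n_loci):
    """Expected number of unrelated records that look like a match by chance,
    assuming independent loci with the same per-locus chance (a simplification)."""
    return db_size * per_locus_prob ** n_loci


DB_SIZE = 1_000_000     # hypothetical database size
PER_LOCUS_PROB = 0.5    # hypothetical chance of a coincidental fit at one STR

with_13_strs = expected_coincidental_matches(DB_SIZE, PER_LOCUS_PROB, 13)
with_20_strs = expected_coincidental_matches(DB_SIZE, PER_LOCUS_PROB, 20)

print(f"13 STRs: about {with_13_strs:.1f} coincidental matches expected")
print(f"20 STRs: about {with_20_strs:.2f} coincidental matches expected")
```

Under these made-up numbers, a search with 13 STRs would be expected to turn up on the order of a hundred chance look-alikes, while 20 STRs would bring that below one. Real loci are not equally informative or fully independent, so actual figures would differ.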

Preventing unauthorized uses of DNA data and cross-matching of databases, Li says, could require research oversight boards, clinical providers and commercial DNA-testing companies to rethink their consent forms and security protections.

"In the end, record matching is a classification task, like matching a patient to a diagnosis," says Li. "It has to do with the granularity of the classification. But record matching across different databases is a medical care issue as well as a privacy issue. And the chance of being identified beyond your consented use of your DNA is a real possibility. We need to figure out where to draw the line."

Further applications

As the movement toward precision medicine gains steam, combining databases of DNA and the clinical data from the same patients could allow health researchers to identify genetic factors that make a person more likely to respond to certain treatments or have certain side effects.

Li is also studying variation in cancer cells by looking at the massive amounts of data generated when researchers track which genes are being expressed, or which mutations have occurred, at different stages in a tumor's lifespan.

If they can connect the gene expression or mutation patterns with the clinical stage of the cancer, they might be able to understand better which vulnerabilities a tumor has, and therefore how to detect or treat it better.

Now that connecting haystacks and ponds has been shown to work, that potential grows ever stronger.

