N.I.H. Announces World’s Largest Integrated Health Database

A research program at the National Institutes of Health has released the world’s largest database of human genomes paired with clinical data, officials announced Tuesday, paving the way for a new era of study in personalized medicine.
The All of Us program, which started in 2018, recruits participants from diverse backgrounds and combines their genetic information with real-world data from health records, wearable technology like Fitbits and other sources to help scientists investigate potential causes of and treatments for disease.
As of Tuesday, more than 747,000 people across the United States had contributed data, including 535,000 whole genomes linked to 482,000 electronic health records comprising doctors’ notes, diagnoses and testing results. The database also bundles the genetic information with health surveys about socioeconomic factors and location-based exposure data, such as air quality.
By comparison, the UK Biobank — which started in the early 2000s and is widely considered the leading genomic repository — contains genomes and electronic health records for about 500,000 participants, but it is almost entirely composed of people with white European ancestry, limiting the clinical implications for other groups.
“One of the most exciting components is its sheer diversity,” Alicia Martin, a statistical geneticist at the Broad Institute who already uses data from All of Us to build and test risk prediction tools, said of the U.S. records. That database, she said, “offers forward-thinking opportunities to try to understand not just who is at risk of disease, but also who is more likely to progress or have some exacerbated health condition, and who is going to respond to specialized treatments.”
The milestone for All of Us comes as one of its major funding streams, the 21st Century Cures Act, is set to expire at the end of this fiscal year. The program’s budget has already been reduced by 72 percent since 2023. A group of more than 50 medical organizations sent a letter to members of Congress this month warning that, without a new funding mechanism, a significant amount of what had been built could be lost.
For decades, genetic research was largely conducted in a vacuum, separate from research into other environmental and health factors. But as modern health care aims to tailor treatment to an individual’s background and lifestyle, health officials have wanted a database that layers biological data, allowing scientists to study how disease manifests in a more comprehensive way.
The ultimate goal of the program is to recruit at least one million volunteers to provide data over a span of at least a decade, revealing how genes interact with everything from sleep patterns to geographic location. To date, the trove includes over 1.3 billion genetic variants and has been used to help build multiple genetic tests, like one that predicts inherited risk of various cardiovascular conditions and another, currently in clinical trials, that could improve early detection of prostate cancer.
Because the program is relatively new, and operates in a country with a fragmented health care system, the depth and history reflected in patients’ electronic health records may not be as robust as those in older, nationalized databases like the UK Biobank, Dr. Martin said.
Still, its strength is in its comprehensiveness. More than 86 percent of participants in the database come from groups that have been historically overlooked in biomedical research, according to officials, including racial and ethnic minorities, rural populations and those with disabilities. The data has already helped reveal gene variants that reduce the risk of kidney disease in people of African ancestry, for example.
Scientists have long sought to investigate environmental hazards and diseases that most often affect marginalized groups. But because of the disjointed nature of America’s medical infrastructure, it has been nearly impossible to gather data sets this large.
“You might have work at Vanderbilt, you might have it in New York at Mount Sinai,” Dr. Martin said, “but bringing it all together is the unique value here.”