The three billion letters in each of our genomes serve as our biological instruction book throughout our lives. They are also a record of previous generations: the meetings, matings, and movements of our ancestors. The rich diversity of the global human population is the result of many factors, including DNA recombination, natural selection, and migration. We are all descendants of a common African ancestor, and across racial and ethnic backgrounds we have vastly more genetic similarities than differences. However, the genetic differences that do exist can have a significant effect on a specific population’s risk of developing various diseases, and can influence the age at which diseases manifest and how they progress. Importantly, associations found in one population for a disease or complex trait may not be valid for another population. Thus, to ensure that the benefits of genetic studies are shared by all populations, a deliberate effort is required to genotype and whole-genome sequence large, diverse cohorts of individuals of non-European ancestry.

The scope

A 2009 study published in Trends in Genetics reported that 96% of the 1.7 million participants in genome-wide association studies (GWAS) were of European descent.1 GWAS are widely used; their goal is to examine single nucleotide polymorphisms (SNPs, single-letter misspellings in our DNA) across the entire genomes of thousands of people to determine whether certain DNA misspellings are associated with a disease or condition. These associations help the research community untangle the genetic architecture of diseases and uncover the biological mechanisms underlying common conditions such as diabetes, schizophrenia, Alzheimer’s disease, and many others. The 2009 findings spurred a cautionary message: Genomic medicine must be diversified or it will only benefit the privileged few.

The analysis was repeated and published in 2016, again indicating a disproportionate representation of Europeans in GWAS (81% of all participants) despite Europeans making up only 16% of the global population.2 The authors note that the decrease in European representation from 96% in 2009 to 81% in 2016 is due to the increased number of studies being performed in populations of Asian ancestry. Yet the extent to which African, Hispanic and Latin American, Arab and Middle Eastern, and indigenous individuals are represented in GWAS has barely shifted.2 For example, from 2009 to 2016, the proportion of participants of Asian ancestry in GWAS increased from 3% to 14%, while: 1) African representation increased from 0.57% to 3%; 2) Hispanic and Latin American representation increased slightly from 0.06% to 0.54%; 3) Arab and Middle Eastern representation started at 0% and increased to a dismal 0.08%; and 4) representation of Native peoples decreased slightly from 0.06% to 0.05%.2 Of the 35 million samples analyzed, individuals of African, Hispanic and Latin American, and indigenous ancestry together represented less than 4%. In 2019, a similar study published in Cell showed that the individuals included in GWAS were 78% European, 10% Asian, 2% African, and 1% Hispanic, with all other ethnicities representing <1%.3 This gross lack of growth in the representation of the world’s most vulnerable and traditionally underserved populations is profoundly disheartening and problematic, and it highlights the perpetuation of historical and systemic biases in science and medicine.

The impact

The underrepresentation of diverse populations in large-scale genetic studies thwarts our ability to fully understand the genetics of human disease and exacerbates health disparities. Further, the lack of diversity in these initiatives means that our ability to translate their findings into clinical practice and public health measures is critically incomplete or, worse, inaccurate.

A 2019 article in The Scientist exemplifies the devastating and very real consequences of European-centric genetics at the individual level. The article covered a couple seeking help from a geneticist in the hope of identifying why their child is suffering from an undiagnosed disease. Because the child is of non-European ancestry, relevant genetic studies of rare mutations (genetic alterations) in the child’s DNA are limited, since most reference datasets are built from data derived from individuals of European descent.4 This not only makes it harder to identify the child’s underlying genetic defect but may also reduce the child’s chances of getting the appropriate therapeutic interventions in time.

In common disorders, individual SNPs are often not informative on their own, so a polygenic risk score (PRS) is calculated instead. A PRS is a weighted sum of the risk SNPs an individual carries, and it can indicate whether someone has high or low genetic risk for a disease or condition. Note that PRS faces the same issues as GWAS, since a PRS is calculated from GWAS results. Recently, PRS has become of interest to the clinical community because of its ability to predict how likely someone is to get a certain disease based on their genetics. For example, PRS alone has been shown to predict the risk of breast cancer and prostate cancer in individuals of European descent more accurately than current clinical models.5,6 Because of Eurocentric biases, clinical uses of PRS will afford greater improvements in diagnosis, prognosis, and likely the development and availability of treatments for folks of European ancestry, further exacerbating health disparities in underrepresented communities.7 This major ethical and scientific challenge must be addressed if the hope of using PRS in the clinic to counsel patients is to be realized to its fullest potential.
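To make the "weighted sum" idea concrete, here is a minimal sketch in Python. It assumes each SNP's weight is an effect size taken from a GWAS (for example, a log odds ratio) and that a genotype is recorded as the number of risk alleles carried (0, 1, or 2). The SNP names, weights, and the function name are all hypothetical and chosen only for illustration; real scoring pipelines involve many more steps.

```python
# Hypothetical effect sizes for three made-up SNPs. These are illustrative
# numbers, not values from any real GWAS.
EFFECT_SIZES = {"snp_a": 0.5, "snp_b": 0.25, "snp_c": -0.125}


def polygenic_risk_score(genotype, effect_sizes):
    """Return the weighted sum of risk-allele counts.

    genotype maps SNP name -> number of risk alleles carried (0, 1, or 2).
    effect_sizes maps SNP name -> weight estimated in a GWAS.
    """
    score = 0.0
    for snp, weight in effect_sizes.items():
        # Assumption for this sketch: a SNP missing from the genotype
        # contributes nothing to the score.
        dosage = genotype.get(snp, 0)
        if dosage not in (0, 1, 2):
            raise ValueError(f"unexpected allele count for {snp}: {dosage}")
        score += weight * dosage
    return score


# A made-up individual: two risk alleles at snp_a, one at snp_b, none at snp_c.
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
print(polygenic_risk_score(person, EFFECT_SIZES))  # 1.25
```

Every weight in the sum comes from a GWAS, so a score can only be as reliable for a given person as the underlying studies are for that person's ancestry. Weights estimated mostly in European cohorts may not carry over to other populations.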

Moving forward

Since the overarching goal of the scientific community is to comprehensively understand disease, it is crucial that we move beyond a Eurocentric genetic model. In recent years, the research community has begun to recognize the lack of diversity in cohorts recruited for genetic studies and its consequences. This has led to the development of several initiatives aimed at addressing the imbalance in representation. For example, in July 2011, the commercial genetic testing company 23andMe launched the “Our Roots into the Future® Project”, which is aimed at accelerating research on genetics and disease in African Americans. After genotyping approximately 11,000 samples, they were able to replicate 44 of 258 associations with BMI, height, lupus, osteoporosis, type 2 diabetes, and migraines that were previously identified in a broad range of populations (a significant portion of which were European). 23andMe has also been actively recruiting research participants from 20 understudied countries, including Mali and Tajikistan. Further, in 2018, the National Institutes of Health launched a program called All of Us with the goal of creating a database of one million diverse participants’ health records, including genetic data that can be used to study individuals from diverse backgrounds.

As the cost of genetic studies is decreasing due to advancements in high-throughput sequencing technologies, we must continue to hold ourselves accountable for expanding genetic datasets to include a larger proportion of racially and ethnically diverse populations. While it is tempting to focus on populations that are “easy to study” because they are motivated and medically compliant, we must not shy away from developing methods and resources to connect with underserved populations. Creating an equitable medical system for all means that we must do all that we can to ensure the people who are the most in need are not the last to get quality healthcare.



  1. Need, A. C. & Goldstein, D. B. Next generation disparities in human genomics: concerns and remedies. Trends Genet. 25, 489–494 (2009).
  2. Popejoy, A. B. & Fullerton, S. M. Genomics is failing on diversity. Nature 538, 161–164 (2016).
  3. Sirugo, G., Williams, S. M. & Tishkoff, S. A. The Missing Diversity in Human Genetic Studies. Cell 177, 1080 (2019).
  4. Yeager, A. Lack of Diversity in Genetic Datasets is Risky for Treating Disease. The Scientist (21 March 2019).
  5. Maas, P. et al. Breast Cancer Risk From Modifiable and Nonmodifiable Risk Factors Among White Women in the United States. JAMA Oncol 2, 1295–1302 (2016).
  6. Schumacher, F. R. et al. Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci. Nat. Genet. 50, 928–936 (2018).
  7. Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).