Computer scientists at the Massachusetts Institute of Technology (MIT) have developed iPGS, a model that calculates accurate polygenic scores for individuals from diverse ancestral backgrounds. Much of the diversity among human beings can be attributed to the roughly 0.1% of DNA that varies between individuals. This variation influences which phenotypes are expressed as well as the risk of developing certain diseases. Statistical models can assess the overall effect of genetic variation by generating ‘polygenic scores’. However, one major problem with such models is that the datasets used to train them are largely based on people of European descent. The problem is especially acute for ‘admixed’ individuals, a term for people whose chromosomes are inherited from a mixture of ancestries and populations that were previously isolated. To assess disease risk inclusively and to ensure equal health opportunities for all regardless of ethnic background, it is imperative either to upgrade the datasets used by existing models or to develop new models entirely, which is what the MIT scientists set out to do.

The MIT scientists developed a model that uses genetic information from people of varying ancestries all around the world. The iPGS model can accurately predict the risk of diseases linked to various genetic traits, even for underrepresented communities. According to Manolis Kellis, a professor of computer science at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), the model was 60% and 18% more accurate than previous models when making predictions for people of African and admixed ancestries, respectively. In addition to improving public health policies and outcomes for those at risk, the researchers believe that widespread adoption of this approach will make genome sequencing more widely available and accepted worldwide. The data collected for this study are accessible to the public and to members of the scientific community.

Contribution to the Human Genome Project 

This work extends the aims of the Human Genome Project (HGP), whose main objective was to map every gene in the human genome in order to better understand genetic variation and the risks associated with developing diseases in humans. The HGP began in 1990 and was completed in 2003.

Individual gene expression vs. collective gene expression

The studies carried out by MIT scientists using iPGS showed that an individual genetic variant has very little effect on its own compared with the combined effect of many variants. Summing the effects of all the variants a person carries gives an estimate of their risk of developing various diseases and disorders, such as schizophrenia, stroke, heart disease, and diabetes, to name a few. Polygenic models are what make these estimates possible.
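The summation described above can be sketched in a few lines of Python. This is a minimal illustration of a generic additive polygenic score, not the iPGS implementation: the function name, the effect sizes, and the genotypes are all hypothetical, and each genotype is assumed to be coded as the number of copies (0, 1, or 2) of the effect allele a person carries.

```python
def polygenic_score(dosages, effect_sizes, intercept=0.0):
    """Generic additive score: intercept + sum of (effect size * allele count).

    dosages      -- allele counts (0, 1, or 2) for each variant, one person
    effect_sizes -- per-variant weights, in the same order as dosages
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return intercept + sum(b * g for b, g in zip(effect_sizes, dosages))


# Hypothetical per-variant effect sizes; each one is small on its own.
effects = [0.02, -0.01, 0.03, 0.005]

# Hypothetical genotypes for two people at the same four variants.
person_a = [0, 1, 2, 1]
person_b = [2, 0, 0, 1]

score_a = polygenic_score(person_a, effects)
score_b = polygenic_score(person_b, effects)
print(f"person A: {score_a:.3f}, person B: {score_b:.3f}")
```

Real scores sum over thousands to millions of variants, which is why the quality of the estimated effect sizes, and the populations they were estimated in, matters so much.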

The variation of variants across different regions 

Previous genome-wide studies have included relatively few people of non-European ancestry. Genetic variation depends on factors such as population history, random genetic drift, and environmental conditions. For example, people of African descent have been observed to carry genetic variants that give them greater protection against malaria than other populations. Genetic variation also affects other parameters, such as the neutrophil and total leukocyte counts of the immune system, which differ across populations.

There have been attempts to develop separate models for different communities, for example by splitting datasets into people of Asian descent and people of African descent. These approaches usually sort individuals into groups based on their ancestry and then make predictions from association summaries generated for each group. While they take more ethnicities into account, they still pose a problem for risk assessment in people from admixed backgrounds, who do not fit neatly into any single group.
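A small sketch shows why grouping by ancestry leaves admixed individuals out. Everything here is hypothetical and for illustration only: the threshold rule, the 0.9 cutoff, the ancestry proportions, and the function name are assumptions, not the procedure of any particular study.

```python
def assign_ancestry_group(proportions, threshold=0.9):
    """Return a single group label if one ancestry dominates, else None.

    proportions -- dict mapping an ancestry label to its estimated share
    threshold   -- hypothetical minimum share needed to join a group
    """
    label, share = max(proportions.items(), key=lambda kv: kv[1])
    return label if share >= threshold else None


# Hypothetical ancestry proportions for three people.
cohort = {
    "person_1": {"African": 0.97, "European": 0.03},
    "person_2": {"European": 0.55, "African": 0.45},  # admixed
    "person_3": {"South Asian": 0.92, "European": 0.08},
}

groups = {pid: assign_ancestry_group(p) for pid, p in cohort.items()}
unassigned = [pid for pid, group in groups.items() if group is None]
print("groups:", groups)
print("left without a group-specific model:", unassigned)
```

Under a rule like this, the admixed person falls below the cutoff for every group, so no group-specific model applies to them.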

Introducing inclusive PolyGenic Score (iPGS)

Instead of grouping individuals by ancestry, the authors used statistical and computational methods to analyze the genetic profile of each person individually. According to data from the UK Biobank dataset, people from admixed backgrounds make up nearly 10% of the country’s population. In addition, about one in seven children born in the United States has admixed ancestry.

The UK Biobank is a large-scale research database containing biomedical information on the lifestyle, health, and genetics of people living in the United Kingdom, including genetic data on roughly 280,000 people. In this study, 60 different traits were evaluated. The physical traits included body size and shape, height, and body mass index (BMI), while the internal physiological traits included red blood cell (RBC) and white blood cell (WBC) counts. Even though people of African ancestry made up only 1.5% of the dataset, iPGS predictions for this group showed the greatest improvement in accuracy over previous models. Predictions improved by 5% and 11% for people of white British and South Asian ancestry, respectively, and by 18% for people of admixed ancestry.
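Improvement figures like these are relative changes in a measure of predictive accuracy. The sketch below shows the arithmetic under stated assumptions: the accuracy values are hypothetical placeholders chosen only to reproduce a 60% gain, and the article does not specify which accuracy metric or baseline values were used.

```python
def relative_improvement(baseline, improved):
    """Percentage change of `improved` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical accuracy values (for instance, variance explained by a score).
# These numbers are NOT taken from the study.
baseline_accuracy = 0.10
new_accuracy = 0.16

gain = relative_improvement(baseline_accuracy, new_accuracy)
print(f"relative improvement: {gain:.0f}%")
```

A relative gain says nothing about the absolute level of accuracy: a large percentage improvement is easier to achieve when the baseline for a group is low, as it tends to be for underrepresented populations.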

The researchers hope that iPGS results will help physicians diagnose diseases with observable symptoms more accurately. They also hope that combining iPGS results with traditional disease risk factors will help manage diseases that people are likely to develop, or prevent those diseases from developing at all.

Future Improvements

Despite the promise that iPGS holds for disease risk assessment, the authors suggest it can be fine-tuned further. They propose three improvements: integrating local-ancestry inference to represent genealogical relationships (tracing lines of family descent); incorporating single-cell and functional genomics data to identify causal variants more accurately and to make predictions more transferable; and combining multiple cohorts of data from populations across the world.


Joint PGS models that analyze genetic traits across shared ancestries promote inclusivity and should bring long-term, non-discriminatory benefits for public health. Using an iPGS+refit model makes the transferability of information more feasible. The coefficients of iPGS are publicly available here.

Article source: Reference Paper | Reference Article

Swasti is a scientific writing intern at CBIRT with a passion for research and development. She is pursuing a BTech in Biotechnology at Vellore Institute of Technology, Vellore. Her interests lie in the rapidly growing and integrated fields of bioinformatics, cancer informatics, and computational biology, with a special emphasis on cancer biology and immunological studies. She aims to introduce her readers to the exciting developments that bioinformatics has to offer biological research today.

