Heart failure is a leading cause of mortality worldwide. Despite great strides in medical technology over the past few decades, it remains widespread, affecting an estimated 64 million people globally. Although early symptoms do appear, heart failure often goes undetected until it reaches an acute stage. Identifying it earlier would help reduce mortality through early intervention and preventative care. A recent open challenge asked participants to build predictive models for heart failure, providing new insights into the risk factors behind this deadly disease.

Multiple factors have been implicated as potential causes of heart failure. In recent years, human gut flora has attracted growing interest as studies increasingly demonstrate its ability to influence bodily processes and the progression of disease. Gut microbiota have been shown to influence oxidative stress, endothelial dysfunction, and inflammation, all of which affect the development of heart failure. Although these associations have been reported before, few population-level studies have so far examined the gut microbiome and the incidence of heart failure. A better understanding of these complex processes in the disease's pathophysiology could lead to earlier identification and better therapeutic methods.

To motivate researchers to explore these potential links, an open challenge was proposed, inviting participants to submit models that identify the impact of microbiota on heart failure risk. Training, testing, and scoring datasets were obtained from FINRISK 2002, which provided population data from over 7000 Finnish individuals. The submitted models were then evaluated on the scoring dataset, and the two best-performing models achieved scores similar to those of the baseline models. These models identified Crenarchaeota and Chrysiogenetes as highly associated with the incidence of heart failure. Both taxa had comparatively low read counts in the samples used, so further validation and testing are needed for both models.

Certain networks of microbes, such as one containing species from Clostridia, have previously been associated with inflammatory signals and type 2 diabetes. Members of Clostridia are known to contribute to the production of trimethylamine N-oxide (TMAO), which has been linked to the consumption of dairy, eggs, and red meat. This network also includes several opportunistic pathogens, such as Eggerthella lenta. Another network, Network 1, contained ten species from Coriobacteriia; of these, members of the genus Collinsella have previously been associated with type 2 diabetes and observed in women who are overweight or obese.

When run without phyla information (which can introduce redundancy), the best-performing model showed marginally better fit and performance. All the models consistently pointed to factors such as age, BMI, sex, and blood pressure.

A later analysis examined whether predictions from the various models could be combined to improve accuracy and calibration. Ensemble models, constructed by averaging the scores obtained from the individual models, showed better performance and calibration scores.
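
The averaging step can be illustrated with a minimal Python sketch. It assumes that every model scores the same individuals in the same order and that the scores are on a comparable scale; the function name, model names, and numbers are hypothetical and are not taken from the challenge.

```python
from statistics import fmean


def ensemble_mean(model_scores):
    """Average per-individual risk scores across models.

    model_scores maps a (hypothetical) model name to a list of risk
    scores, one per individual, in the same order for every model.
    """
    score_lists = list(model_scores.values())
    if not score_lists:
        raise ValueError("need at least one model")
    n = len(score_lists[0])
    if any(len(scores) != n for scores in score_lists):
        raise ValueError("all models must score the same individuals")
    # zip(*...) groups the scores that different models gave to one person.
    return [fmean(per_person) for per_person in zip(*score_lists)]


# Illustrative values only (assumption, not challenge data).
scores = {
    "model_a": [0.10, 0.40, 0.80],
    "model_b": [0.20, 0.60, 0.60],
}
ensemble = ensemble_mean(scores)
```

A plain mean gives every model equal weight. Whether the challenge ensembles rescaled or weighted the scores first is not stated here, so this sketch leaves them untouched.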

It should be noted that this challenge used a broad definition of heart failure; heart failure incidences based on stricter criteria were also available. When the models were tested against these stricter criteria, both best-performing models produced better predictions, though only one was properly calibrated. The baseline and refined models were also evaluated using follow-up times of 10 and 5 years, in addition to the 15-year follow-up that was originally available. Risk prediction models at 10 years showed improved performance and calibration, whereas both worsened with the 5-year follow-up.
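
One common way to derive a shorter follow-up from a longer one is to censor every record at the new horizon. The Python sketch below illustrates that idea only; how the 10-year and 5-year datasets were actually built is not described here, so the approach, the function name, and the example values are all assumptions.

```python
def truncate_follow_up(times, events, horizon_years):
    """Censor follow-up at a fixed horizon.

    times:  years from baseline to heart failure or end of follow-up.
    events: 1 if heart failure was observed at that time, else 0.
    Returns (times, events) as they would look if follow-up had
    stopped at horizon_years.
    """
    new_times, new_events = [], []
    for t, e in zip(times, events):
        if t > horizon_years:
            # Anything after the horizon is unobserved: censor there.
            new_times.append(horizon_years)
            new_events.append(0)
        else:
            new_times.append(t)
            new_events.append(e)
    return new_times, new_events


# Illustrative 15-year records (assumption, not study data).
times = [3.0, 8.0, 12.0, 15.0]
events = [1, 0, 1, 0]
t10, e10 = truncate_follow_up(times, events, 10)
t5, e5 = truncate_follow_up(times, events, 5)
```

Shortening the horizon turns late events into censored records, so a 5-year dataset contains fewer observed cases than a 15-year one. That is one possible reason performance could differ between horizons, though it is not confirmed as the cause here.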

While including taxonomic profiles did not significantly improve model performance, this data made the results more interpretable by providing crucial background information. The use of co-abundance networks at the species level also significantly improved model performance compared to the baseline. The competitive challenge format was chosen to gain insights from diverse perspectives and fields. Such an approach allowed for the creation of highly optimized prediction models that could account for microbiome data to give greater insight into the risk prediction for heart failure.


Associations were observed between the occurrence of heart failure and a mix of microbiome characteristics, including intra-individual diversity, taxonomic groups, and co-abundance networks. Intra-individual diversity was inversely correlated with heart failure risk, which supports the theory that greater gut flora diversity is linked to better health and, specifically, a lower risk of heart failure. Predictive co-abundance networks were positively correlated with inflammatory signals, suggesting that gut microbiota play a role in the pathophysiology of heart failure. In addition, the networks that were positively associated with heart failure had also been linked to type 2 diabetes and obesity, both of which are risk factors for heart failure. Machine learning models often face the problem of having too many features to account for. In this case, compressing the data with co-abundance networks to restrict the number of features proved successful.
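
Two of the ideas above, intra-individual diversity and feature compression through co-abundance networks, can be sketched in Python. The article does not say which diversity index the models used or how a network was summarized into a single feature, so the Shannon index and the summed abundance below are assumptions, and the taxon and network names are hypothetical.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def network_features(abundances, networks):
    """Collapse per-taxon abundances into one summed feature per network.

    abundances: dict of taxon name -> relative abundance for one sample.
    networks:   dict of network name -> list of member taxon names.
    Taxa missing from the sample contribute 0.
    """
    return {
        name: sum(abundances.get(taxon, 0.0) for taxon in members)
        for name, members in networks.items()
    }


# Illustrative sample with hypothetical taxa and networks.
sample = {"taxon_a": 0.2, "taxon_b": 0.3, "taxon_c": 0.5}
networks = {
    "network_1": ["taxon_a", "taxon_b"],
    "network_2": ["taxon_c"],
}
features = network_features(sample, networks)
diversity = shannon_diversity([20, 30, 50])
```

Here three taxon-level features become two network-level features. With hundreds of species per sample, the same grouping reduces the feature count far more sharply.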

Further research on feature selection strategies and optimization may improve the models' outcomes. Further investigation is also needed to understand why models with certain follow-up periods outperformed others, and machine learning models could be applied more extensively to explore these questions. Because the models were optimized using only synthetic data, the link between microbiome data and clinical variables may be obscured. The lack of external datasets may also limit the generalisability of the models. Despite this, the use of machine learning and computational methods to investigate the links between the gut microbiome and heart failure risk provided useful insights into various risk factors and feature engineering. Moving forward, these insights can support early diagnosis and therapeutic intervention for heart failure.

Article source: Reference Paper

Important Note: medRxiv releases preprints that have not yet undergone peer review. These papers should therefore not be considered conclusive evidence, nor should they be used to direct clinical practice or influence health-related behavior. The information they present is not yet considered established or confirmed.

Sonal Keni is a consulting scientific writing intern at CBIRT. She is pursuing a BTech in Biotechnology from the Manipal Institute of Technology. Her academic journey has been driven by a profound fascination for the intricate world of biology, and she is particularly drawn to computational biology and oncology. She also enjoys reading and painting in her free time.

