Researchers at the University of Oxford, UK, have developed a prognostic model that estimates a female individual's combined 10-year risk of developing breast cancer and then dying from it. The model is built on a national, population-representative, linked electronic healthcare record data set covering more than 11.6 million female individuals aged 20–90 years who had no breast cancer at baseline. This regression and machine learning modelling study aims to inform risk-stratified screening, chemoprevention strategies, and other clinical interventions for those at high risk of a breast cancer that proves fatal within 10 years, rather than basing clinical decisions solely on the risk of a breast cancer diagnosis.

The Dilemma in Breast Cancer Risk Prediction 

Breast cancer remains life-threatening, but greater awareness, improved screening strategies, and better treatment options over recent decades have improved patient survival in many countries. Identifying each individual's level of risk could further help reduce breast cancer mortality.

Crucially, the risk of being diagnosed with breast cancer does not always correspond to the risk of dying from it. Prevention strategies built solely on predicted incidence can therefore lead to ineffective chemoprevention and overdiagnosis, causing unnecessary harm to the individual's health and an accompanying financial burden.

A Glimpse Into the Prognostic Modelling Approach: Predicting the Individual Combined Risk of Developing Breast Cancer and Its Lethality

The recent study, published in Lancet Digit Health, is a collaboration between researchers from the Cancer Research UK Oxford Centre; the Nuffield Department of Primary Care Health Sciences; the Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences; the Department of Oncology; and the Nuffield Department of Population Health, all at the University of Oxford. The authors propose that identifying female individuals at the greatest risk of developing a breast cancer that progresses to death, rather than considering incidence risk alone, would be a more effective early detection and prevention strategy for reducing breast cancer mortality.

Accurate tools that identify female individuals at increased risk of developing life-threatening breast cancers could inform efficient targeting of those most likely to benefit from chemoprevention, novel screening approaches, or recruitment into trials. To build a prognostic model that reliably predicts the 10-year risk of breast cancer death in female individuals without breast cancer at baseline, the study compared four modelling approaches: two regression approaches (Cox proportional hazards and competing risks regression) and two machine learning approaches (XGBoost and a feed-forward neural network).
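To illustrate why a competing risks approach can matter for an outcome such as breast cancer death, the sketch below contrasts two simple nonparametric estimates on made-up data: the Aalen-Johansen cumulative incidence, which treats death from other causes as an event that rules out a later breast cancer death, and a naive 1 minus Kaplan-Meier estimate, which treats those deaths as censoring. This is not the regression model fitted in the study; the function names, event coding, and toy follow-up data are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical event coding: 0 = censored, 1 = event of interest, 2 = competing event.

def cumulative_incidence(times, events, horizon):
    """Aalen-Johansen estimate of the probability of event 1 by `horizon`."""
    at_risk = len(times)
    event_free = 1.0  # probability of having had no event of any type so far
    cif = 0.0
    for t in sorted(set(times)):
        if t > horizon:
            break
        counts = Counter(e for ti, e in zip(times, events) if ti == t)
        d1, d2, censored = counts[1], counts[2], counts[0]
        if at_risk > 0:
            cif += event_free * d1 / at_risk
            event_free *= 1 - (d1 + d2) / at_risk
        at_risk -= d1 + d2 + censored
    return cif

def one_minus_km(times, events, horizon):
    """Naive estimate: 1 - Kaplan-Meier, treating competing events as censoring."""
    at_risk = len(times)
    surv = 1.0
    for t in sorted(set(times)):
        if t > horizon:
            break
        leaving = sum(1 for ti in times if ti == t)
        d1 = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        if at_risk > 0:
            surv *= 1 - d1 / at_risk
        at_risk -= leaving
    return 1 - surv

# Toy follow-up data (years), invented for this example only.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [2, 1, 2, 0, 1, 2, 0, 1, 0, 0]

cif = cumulative_incidence(times, events, horizon=10)
naive = one_minus_km(times, events, horizon=10)
print(f"competing-risks estimate: {cif:.3f}, naive estimate: {naive:.3f}")
```

On this toy data the naive estimate is larger than the competing risks estimate, because it implicitly assumes that people who died of other causes could still go on to die of the event of interest.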

The research draws on the QResearch database, a large consolidated database derived from the anonymized health records of more than 35 million patients in the UK, including both currently registered patients and historical patients who have died or left. It is also linked to secondary care data and to national cancer and mortality registers in England, UK.

The study extracted data on 11.6 million female individuals aged 20–90 years who entered the cohort between Jan 1, 2000, and Dec 31, 2020. Participants with a recorded existing or previous diagnosis of invasive breast carcinoma or ductal carcinoma in situ in general practice, Hospital Episode Statistics (HES), or cancer registry records were excluded. The primary outcome was breast cancer-related death, defined as breast cancer being recorded as a primary or contributory cause of death on Office for National Statistics (ONS) death certificates.

Candidate predictors included age, BMI, smoking status, type of diabetes, psychotic conditions, benign breast disease, previous lung cancer, previous hematological cancer, chronic kidney disease, chronic liver disease, ischaemic heart disease, vasculitis, family history of breast cancer, and use of antipsychotic medication, estrogen-only hormone replacement therapy, or combined hormone replacement therapy, among others.

The models were evaluated using an internal–external validation framework, which the same research team had previously applied to develop and compare models predicting the 10-year risk of breast cancer mortality in women with invasive breast cancer. Internal–external validation estimates how well a model might generalize to temporally or geographically different settings by simulating that situation: a model is developed in one sample and then applied to a later, distinct sample.
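The general idea of internal–external validation can be sketched as a loop that holds out one non-random group at a time, fits on the remaining groups, and evaluates on the held-out group. The example below is a minimal sketch under several assumptions: the grouping by a `region` field, the toy records, and the stand-in "model" (an event rate per age band) are all hypothetical and are not taken from the study, which fitted far richer models. The held-out performance measure used here is the observed/expected event ratio, a simple calibration summary.

```python
from collections import defaultdict

def fit_rate_model(rows):
    """Toy stand-in for a real model: the observed event rate per age band."""
    events, totals = defaultdict(int), defaultdict(int)
    for r in rows:
        events[r["age_band"]] += r["event"]
        totals[r["age_band"]] += 1
    overall = sum(events.values()) / sum(totals.values())
    rates = {band: events[band] / totals[band] for band in totals}
    # Fall back to the overall rate for age bands unseen during fitting.
    return lambda row: rates.get(row["age_band"], overall)

def internal_external_cv(rows, group_key):
    """Hold out each group in turn; return observed/expected ratio per group."""
    results = {}
    for held_out in sorted({r[group_key] for r in rows}):
        train = [r for r in rows if r[group_key] != held_out]
        test = [r for r in rows if r[group_key] == held_out]
        predict = fit_rate_model(train)
        expected = sum(predict(r) for r in test)
        observed = sum(r["event"] for r in test)
        results[held_out] = observed / expected if expected > 0 else float("nan")
    return results

# Invented data: 3 regions, 10 "old" and 10 "young" records each.
rows = []
for region, n_events_old in [("A", 2), ("B", 3), ("C", 1)]:
    for i in range(10):
        rows.append({"region": region, "age_band": "old",
                     "event": 1 if i < n_events_old else 0})
        rows.append({"region": region, "age_band": "young", "event": 0})

oe_by_region = internal_external_cv(rows, "region")
for region, ratio in oe_by_region.items():
    print(f"held-out region {region}: observed/expected = {ratio:.2f}")
```

A ratio near 1 in a held-out group suggests the model's predicted risks transfer well to that group; ratios far from 1 flag settings where it over- or under-predicts. The same loop works with a time period as the grouping key.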

Internal–external validation is more informative than assessing generalization on a single randomly partitioned subset of data with similar characteristics. Of the four approaches, competing risks regression yielded the model deemed most clinically useful: it had the highest discrimination, showed no miscalibration in any age group, and was associated with favorable net benefit across all age groups examined.
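Discrimination and net benefit can be made concrete with a small worked example. The sketch below uses the standard decision-curve definition of net benefit at a risk threshold p_t, namely TP/n − (FP/n) × p_t/(1 − p_t), and a simple concordance statistic for a binary outcome. It is a simplification: the study's outcome is time-to-event with censoring and competing risks, which this binary version ignores, and the risks, outcomes, and threshold below are invented for illustration.

```python
def net_benefit(risks, outcomes, threshold):
    """Net benefit of intervening on everyone with predicted risk >= threshold."""
    n = len(risks)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the chosen threshold.
    return tp / n - fp / n * threshold / (1 - threshold)

def concordance(risks, outcomes):
    """Share of (event, non-event) pairs in which the event has the higher risk."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    score = sum(1.0 if e > n else 0.5 if e == n else 0.0
                for e in events for n in non_events)
    return score / (len(events) * len(non_events))

# Invented predicted risks and observed outcomes (1 = event occurred).
risks = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80]
outcomes = [0, 0, 0, 1, 0, 1]
threshold = 0.30

nb_model = net_benefit(risks, outcomes, threshold)
# "Treat all" strategy: everyone is flagged regardless of predicted risk.
nb_treat_all = net_benefit([1.0] * len(outcomes), outcomes, threshold)

print(f"concordance: {concordance(risks, outcomes):.3f}")
print(f"net benefit (model): {nb_model:.3f}")
print(f"net benefit (treat all): {nb_treat_all:.3f}")
```

A model is clinically useful at a given threshold when its net benefit exceeds that of both treating everyone and treating no one (net benefit 0), as it does for the toy model here.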


This is reported to be the first study to estimate the risk of breast cancer mortality in the general female population, including individuals outside the current age-based screening eligibility criteria, and it draws on extensive data from a very large sample. Most importantly, the study could help personalize and tailor more effective clinical strategies to reduce deaths from breast cancer.

The authors note several limitations: genetic risk estimates and mammographic density were not available in the source data sets and so were not included; information on breast cancer deaths was sparse in some ethnic subgroups; predictor variables and measurements relied on coding by individual healthcare practitioners; and the data came exclusively from the UK population, so the findings may not generalize to other countries. These limitations are expected to prompt further work to strengthen risk-assessment approaches.

Article Source: Reference Paper


Aditi is a consulting scientific writing intern at CBIRT, specializing in explaining interdisciplinary and intricate topics. As a student pursuing an Integrated PG in Biotechnology, she is driven by a deep passion for experiencing multidisciplinary research fields. Aditi is particularly fond of the dynamism, potential, and integrative facets of her major. Through her articles, she aspires to decipher and articulate current studies and innovations in the Bioinformatics domain, aiming to captivate the minds and hearts of readers with her insightful perspectives.

