Pancreatic ductal adenocarcinoma (PDAC) is an extremely aggressive cancer with a poor prognosis, owing mostly to late diagnosis. Expanding screening beyond the roughly 10% now eligible because of genetic susceptibility is critical for early detection and improved survival. Scientists at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), together with Limor Appelbaum, a staff scientist in the Department of Radiation Oncology at Beth Israel Deaconess Medical Center (BIDMC), created and tested two models (PrismNN and PrismLR) for predicting PDAC risk in the general population using EHR data from a federated network of 55 US healthcare organizations. Both models predicted PDAC risk 6-18 months before diagnosis and outperformed current screening criteria. The study suggests that Prism models could transform PDAC detection and improve patient outcomes.
PDAC is the fourth leading cause of cancer-related death in the United States, with a five-year relative survival rate of just 11%. This bleak prognosis is mostly attributable to delayed diagnosis, with more than 80% of cases presenting at advanced stages. Current screening recommendations focus on those with a high genetic risk (family history or germline mutations), which covers only about 10% of PDAC cases. Expanding screening to the general population is crucial for early detection and potentially curative treatment. Existing imaging modalities, such as endoscopic ultrasonography (EUS) and MRI/MRCP, are costly and resource-intensive, which limits their viability for widespread use. Furthermore, the lack of reliable, non-invasive biomarkers makes it harder to identify high-risk individuals for targeted screening.
The first known case of pancreatic cancer dates to the 18th century. Since then, researchers have embarked on a long and difficult journey to better understand this mysterious and lethal disease. To date, early intervention remains the most effective approach to treating cancer. Unfortunately, because the pancreas lies deep within the abdomen, tumors there are particularly difficult to detect early on. To address these problems, the researchers aimed to identify high-risk individuals for targeted screening or lower-cost testing by integrating large-scale clinical data from a federated network of 55 US healthcare organizations, with the goal of earlier diagnosis and better survival.
The two models, the PrismNN neural network and the PrismLR logistic regression model (a statistical method for estimating probabilities), surpassed existing methods. The team’s research revealed that while current screening criteria capture roughly 10% of PDAC cases at a five-fold relative risk threshold, Prism detects 35% of PDAC cases at the same threshold.
- Models were developed using electronic health record (EHR) data from 55 US healthcare organizations, which included demographics, diagnoses, prescriptions, and lab findings.
- PrismNN and PrismLR were created to allow a thorough comparison of different modeling techniques for PDAC risk prediction on the same EHR data.
- Interpretability remains an important factor in gaining physician trust, and the authors acknowledge progress in making deep neural networks such as PrismNN more transparent.
Methods Redefining the Landscape of Pancreatic Cancer Risk Prediction
Data Sources and Processing
The researchers used EHR data from the TriNetX network, which included de-identified electronic medical records from over 89 million individuals across a variety of healthcare settings. The resulting dataset comprised 38,928 PDAC patients and 1,500,081 controls matched on age, sex, and calendar year. Features were extracted from demographic information, diagnosis codes, medication prescriptions, and laboratory test data. Missing values were imputed in a multi-step procedure to reduce bias.
Model Development and Validation
The scientists created two models: PrismNN, a deep neural network that can capture complex relationships in the data, and PrismLR, a simpler logistic regression model chosen for interpretability. Both models were trained on 70% of the data and internally validated on the remaining 30%. The team also performed three types of external validation: location-based (different institutions), race-based (different racial groups), and temporal (different time periods).
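The split and the location-based validation described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the helper names `stratified_split` and `leave_one_site_out`, the toy cohort, and the site labels are all hypothetical, and stratifying the split by case/control status is an assumption the article does not confirm.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, test_idx), keeping the case/control ratio in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                      # random assignment within each class
        cut = round(train_frac * len(idx))    # 70% of this class goes to training
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)

def leave_one_site_out(sites):
    """Yield (held_out_site, train_idx, test_idx): train on all sites but one."""
    for held_out in sorted(set(sites)):
        train = [i for i, s in enumerate(sites) if s != held_out]
        test = [i for i, s in enumerate(sites) if s == held_out]
        yield held_out, train, test

# Toy cohort (hypothetical): 10 cases (1), 90 controls (0), three sites A/B/C.
labels = [1] * 10 + [0] * 90
sites = ["ABC"[i % 3] for i in range(100)]

train_idx, test_idx = stratified_split(labels)
folds = list(leave_one_site_out(sites))
print(len(train_idx), len(test_idx), [name for name, _, _ in folds])
```

Holding out an entire site, rather than a random sample of patients, checks whether a model trained elsewhere still works at an institution it has never seen. The race-based and temporal checks follow the same pattern with a different grouping variable.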
Model performance was evaluated using the Area Under the Receiver Operating Characteristic Curve (AUC), calibration plots, and sensitivity/specificity at various risk thresholds. Simulated deployment was used to assess how the models would perform in a real-world clinical scenario: the researchers tracked each individual's predicted PDAC risk over time and measured sensitivity and positive predictive value (PPV) at various risk thresholds.
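For readers unfamiliar with these metrics, the sketch below computes them on a toy example in plain Python. The scores, labels, and the 0.5 threshold are made-up illustration values, and the function names are hypothetical; nothing here reproduces the study's data or code.

```python
def auc(scores, labels):
    """AUC: probability that a random case scores higher than a random control.

    Ties count as half a win. This pairwise form is O(n^2) and only meant
    for small illustrative inputs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def threshold_metrics(scores, labels, threshold):
    """Sensitivity, specificity and PPV when flagging everyone with score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    return {
        "sensitivity": tp / (tp + fn),   # share of true cases that were flagged
        "specificity": tn / (tn + fp),   # share of controls correctly not flagged
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),  # flagged who are cases
    }

# Toy predictions (hypothetical): 3 cases and 5 controls.
scores = [0.9, 0.8, 0.35, 0.7, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

model_auc = auc(scores, labels)
metrics = threshold_metrics(scores, labels, threshold=0.5)
print(round(model_auc, 3), metrics)
```

Raising the threshold flags fewer people, which typically raises PPV and specificity at the cost of sensitivity. That trade-off is why the study reports metrics at several risk thresholds rather than a single operating point.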
- Prism models (PrismNN and PrismLR) performed well in predicting pancreatic ductal adenocarcinoma (PDAC) risk 6-18 months before diagnosis.
- PrismNN (neural network) produced an area under the curve (AUC) of 0.826 (95% CI: 0.824-0.828), whereas PrismLR (logistic regression) achieved an AUC of 0.800 (95% CI: 0.798-0.802).
- Prism functioned well across varied populations, with average internal-external validation AUCs of 0.740 for locations, 0.828 for races, and 0.789 (95% confidence interval: 0.762-0.816) for time.
- Simulated deployment demonstrated that PrismNN could identify 3.5 times more PDAC cases than existing screening criteria at comparable risk levels.
- Prism models used a small number of features for prediction, making it easier to understand the models’ reasoning.
PrismNN and PrismLR both achieved good AUCs for PDAC risk prediction, with PrismNN modestly outperforming PrismLR (0.826 vs. 0.800). In external validation, both models maintained acceptable accuracy across varied groups, suggesting that they generalize. Prism models exceeded current clinical recommendations by identifying more than 35% of PDAC cases as high-risk at a standardized incidence ratio (SIR) of 5.10, compared to only about 10% under existing criteria. Simulated deployment suggests that PrismNN could identify high-risk individuals with high sensitivity while maintaining adequate specificity.
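To make the SIR comparison concrete, here is a small worked sketch. It assumes SIR stands for standardized incidence ratio, computed as observed cases in the flagged group divided by the cases expected at the baseline population incidence; the article does not spell out the exact definition used in the study. The baseline incidence and group counts below are hypothetical round numbers chosen for illustration, while the 35% and 10% sensitivities are the figures quoted in the text.

```python
def standardized_incidence_ratio(observed_cases, group_size, baseline_incidence):
    """SIR = observed cases / cases expected at the baseline incidence."""
    expected_cases = group_size * baseline_incidence
    return observed_cases / expected_cases

# Hypothetical inputs: baseline of 13 cases per 100,000 people, and a flagged
# high-risk group of 100,000 people in which 65 develop PDAC.
baseline = 13 / 100_000
sir = standardized_incidence_ratio(observed_cases=65,
                                   group_size=100_000,
                                   baseline_incidence=baseline)

# Sensitivities quoted in the article at a comparable risk threshold.
prism_sensitivity = 0.35      # share of PDAC cases flagged by Prism
current_sensitivity = 0.10    # share covered by existing screening criteria
fold_improvement = prism_sensitivity / current_sensitivity

print(f"SIR ~ {sir:.2f}, fold improvement ~ {fold_improvement:.1f}")
```

Under these assumptions, a flagged group with an SIR near 5 carries about five times the baseline risk, and the ratio of the two sensitivities is where the "3.5 times more cases" figure comes from.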
Strengths and limitations
Strengths:
- Extensive federated network data covering varied patient groups.
- Three forms of internal-external validation, plus simulated deployment.
- Potential for seamless clinical integration via the federated network.
Limitations:
- The study is retrospective; prospective validation is needed before clinical use.
- Potential data bias due to underrepresented racial groups.
- Limited geographic diversity (United States only).
- Model interpretability needs further improvement.
Researchers gathered data from millions of patients to create Prism, a model that analyzes medical records such as blood tests and diagnoses to determine who is most likely to develop pancreatic cancer. Here’s some good news:
- Early Detection Power: In simulated deployment, Prism flagged up to 3.5 times more PDAC cases as high-risk than existing screening criteria, 6-18 months before diagnosis. Earlier detection could improve the chances of survival.
- Broad Reach: The models held up across a wide range of populations, including different races, locations, and time periods. This could mean more equitable and effective screening for everybody.
- Simple and Practical: Prism is based on widely available information from electronic health records, which should make integration into existing healthcare systems straightforward.
Prism marks a significant advance in the fight against pancreatic cancer. Its capacity to forecast risk in the general population, together with its potential to simplify screening and early detection, points to a more promising future for many patients. As research advances and Prism is further validated, we move closer to a scenario in which pancreatic cancer loses its stealthy advantage, paving the way for higher survival rates and, eventually, saved lives. It demonstrates the power of AI in healthcare, serves as a beacon of hope in the fight against pancreatic cancer, and represents a significant step toward a future in which early diagnosis genuinely makes a difference.
While more research is required before Prism becomes standard, this breakthrough provides a ray of hope for early pancreatic cancer identification. Consider a world in which a simple checkup may indicate your risk and enable preventative actions, saving countless lives. Stay tuned for the story of Prism, as it is just the beginning.
Anchal is a consulting scientific writing intern at CBIRT with a passion for bioinformatics and its miracles. She is pursuing an MTech in Bioinformatics from Delhi Technological University, Delhi. Through engaging prose, she invites readers to explore the captivating world of bioinformatics, showcasing its groundbreaking contributions to understanding the mysteries of life. Besides science, she enjoys reading and painting.