Researchers from New York University, Carnegie Mellon University, and the New York City Health Department have developed an automated machine-learning approach that detects new and emerging health threats in the free-text chief complaint data that local health departments collect from hospital emergency departments.

New diseases can emerge from many different situations, and each one poses a serious threat to public health. One of the earliest coronavirus disease 2019 (COVID-19) outbreaks in the United States occurred in February 2020 and affected more than 50 residents of a nursing home in Kirkland, Washington. Effective early-detection programs could help prevent the rapid spread of such illnesses.

The Centers for Disease Control and Prevention’s National Syndromic Surveillance Program implements strategies for the early detection of health threats. Syndromic surveillance groups patients into predefined syndromes based on their symptoms. The system tracks chief complaint data, short free-text strings describing why each patient sought care, and maps them onto syndromes that commonly occur in a given area. A novel event is added to the system only after a significant increase in cases has been recognized, which makes timely detection and prevention of new biothreats difficult.

When a novel health threat presents with previously unseen symptoms, earlier algorithms map these cases into existing syndromes instead of recognizing that something new and different is happening, and so they miss the signal entirely. The International Society for Disease Surveillance has therefore called for urgent innovation in detecting unique, emergent biothreats that do not correspond to currently established and monitored syndrome categories. This approach has been named “presyndromic” surveillance.
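To make this failure mode concrete, here is a minimal Python sketch of a keyword-based syndrome classifier. The syndrome names and keyword lists are hypothetical and far simpler than any real syndrome definition. The point is only that a complaint mentioning a new symptom is counted under an existing syndrome, so the new symptom leaves no trace in the syndrome counts.

```python
# Hypothetical, highly simplified syndrome definitions (illustration only).
SYNDROME_KEYWORDS: dict[str, set[str]] = {
    "influenza-like illness": {"fever", "cough", "sore throat"},
    "gastrointestinal": {"vomiting", "diarrhea", "nausea"},
    "rash": {"rash", "hives"},
}


def classify(complaint: str) -> list[str]:
    """Map a free-text chief complaint to every syndrome whose keywords it mentions."""
    text = complaint.lower()
    hits = [name for name, keywords in SYNDROME_KEYWORDS.items()
            if any(keyword in text for keyword in keywords)]
    return hits or ["unclassified"]


# A novel symptom ("loss of taste") that appears alongside familiar ones is
# absorbed into the influenza-like illness count; nothing records that it was new.
print(classify("Fever, cough, loss of taste"))  # ['influenza-like illness']
print(classify("loss of taste and smell"))      # ['unclassified']
```

Cases that match no syndrome at all land in a catch-all bucket, where a handful of them rarely stands out against routine noise.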

The most common presyndromic surveillance approach is keyword-based: it compares word counts in a historical baseline period with word counts in the most recent period. These techniques can identify unusual word frequencies in the recent data using statistical methods such as likelihood ratio tests, Poisson tests, and Fisher’s exact test. They can also flag new keywords that never appeared in earlier complaints. However, keyword-based approaches often have high false-positive rates, which makes them unreliable in practice.
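As a rough illustration of this keyword-based strategy, the sketch below counts each word once per complaint, applies a one-sided Fisher’s exact test to words seen in both periods, and flags words that never appeared in the baseline. It is a generic example, not the method of any specific surveillance system; the function names, the `alpha` threshold, and the sample complaints are hypothetical. Fisher’s test is computed directly from the hypergeometric distribution so the example needs only the standard library.

```python
from collections import Counter
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a/b: recent complaints with/without the keyword
    c/d: baseline complaints with/without the keyword
    Returns P(X >= a) under the hypergeometric null that the keyword's rate
    is the same in both periods.
    """
    recent_total = a + b
    keyword_total = a + c
    n = a + b + c + d
    tail = sum(comb(recent_total, k) * comb(n - recent_total, keyword_total - k)
               for k in range(a, min(recent_total, keyword_total) + 1))
    return tail / comb(n, keyword_total)


def keyword_alerts(recent: list[str], baseline: list[str], alpha: float = 0.01):
    """Flag keywords that are new, or significantly more frequent, in `recent`."""
    # Count each word at most once per complaint.
    recent_counts = Counter(w for text in recent for w in set(text.lower().split()))
    baseline_counts = Counter(w for text in baseline for w in set(text.lower().split()))
    alerts = []
    for word, a in recent_counts.items():
        c = baseline_counts[word]
        if c == 0:
            alerts.append((word, "new keyword", None))
            continue
        p = fisher_exact_greater(a, len(recent) - a, c, len(baseline) - c)
        if p < alpha:
            alerts.append((word, "excess", p))
    return alerts


recent = ["cough fever", "cough rash", "cough fevr"]
baseline = ["fever headache", "fever", "headache"]
alerts = keyword_alerts(recent, baseline)
print(sorted(w for w, reason, _ in alerts if reason == "new keyword"))
# ['cough', 'fevr', 'rash']
```

The misspelling "fevr" is reported as a brand-new keyword, a small example of how typos and rare words inflate the false-positive rate of this approach.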

To address this issue, Nobles et al., working with public health organizations, developed an improved presyndromic method called Multidimensional Semantic Scan (MUSES).

Major Findings

MUSES is a data-driven, automated machine-learning approach to presyndromic surveillance that offers practitioners tailored, useful decision support by:

1) Not requiring pre-established syndrome classifications.

2) Using multidimensional scan statistics to locate localized case clusters, which allows it to identify new biothreats that may affect particular patient demographics or geographic regions.

3) Employing a “practitioner in the loop” strategy that incorporates user feedback to focus on relevant patterns, minimize false positives, and give local users meaningful information based on their own standards for what is and is not relevant.
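The idea behind a multidimensional scan statistic can be sketched in a few lines of Python. MUSES itself combines semantic modeling of complaint text with the scan; this example covers only the generic scan step, using the expectation-based Poisson log-likelihood ratio and an exhaustive search over every combination of hospitals and age groups. The attributes, baselines, and counts are hypothetical, and exhaustive search is only practical for a handful of attribute values.

```python
from itertools import chain, combinations
from math import log


def poisson_llr(count: float, baseline: float) -> float:
    """Expectation-based Poisson log-likelihood ratio score for one region.

    Zero unless the observed count exceeds the expected (baseline) count.
    """
    if count <= baseline:
        return 0.0
    return count * log(count / baseline) + baseline - count


def nonempty_subsets(values):
    """All non-empty subsets of `values`, as tuples."""
    values = list(values)
    return chain.from_iterable(combinations(values, r) for r in range(1, len(values) + 1))


def multidim_scan(counts: dict, baselines: dict):
    """Return (score, hospitals, age_groups) for the highest-scoring region.

    `counts` and `baselines` map (hospital, age_group) -> cases; every
    combination must be present in both.
    """
    hospitals = sorted({h for h, _ in counts})
    age_groups = sorted({g for _, g in counts})
    best = (0.0, (), ())
    for hs in nonempty_subsets(hospitals):
        for gs in nonempty_subsets(age_groups):
            c = sum(counts[(h, g)] for h in hs for g in gs)
            b = sum(baselines[(h, g)] for h in hs for g in gs)
            score = poisson_llr(c, b)
            if score > best[0]:
                best = (score, hs, gs)
    return best


# Hypothetical data: 5 expected cases per cell, with a spike among
# patients aged 65+ at hospital A.
cells = [(h, g) for h in ("A", "B") for g in ("0-17", "18-64", "65+")]
baselines = {cell: 5.0 for cell in cells}
counts = {cell: 5 for cell in cells}
counts[("A", "65+")] = 20

score, hs, gs = multidim_scan(counts, baselines)
print(round(score, 2), hs, gs)  # 12.73 ('A',) ('65+',)
```

Because the search scores subsets of patients rather than whole days or whole cities, a spike confined to one demographic group at one hospital is not diluted by normal counts elsewhere.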

Because chief complaints frequently contain typos, which undermine keyword-based strategies, the authors included data cleaning and automatic correction of typographical errors as key preprocessing steps. MUSES is a data analysis tool that identifies previously unseen case clusters emerging in a subpopulation, helping practitioners gain insights and prioritize prevention.
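The article does not describe the authors’ exact cleaning pipeline, so the sketch below shows only a generic edit-distance spell corrector of the kind such a step might use. Tokens outside a known vocabulary are replaced by the closest vocabulary word (ties broken by frequency) if it is at most `max_dist` edits away. The vocabulary, its frequencies, and the function names are hypothetical.

```python
import re


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute (or match)
        prev = cur
    return prev[-1]


def autocorrect(token: str, vocab_counts: dict[str, int], max_dist: int = 1) -> str:
    """Replace an out-of-vocabulary token with the closest, most frequent vocabulary word."""
    if token in vocab_counts or not vocab_counts:
        return token
    dist, _, word = min((edit_distance(token, w), -n, w) for w, n in vocab_counts.items())
    return word if dist <= max_dist else token


def clean_complaint(text: str, vocab_counts: dict[str, int]) -> str:
    """Lowercase, drop punctuation, and auto-correct each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(autocorrect(t, vocab_counts) for t in tokens)


# Hypothetical vocabulary with corpus frequencies.
vocab = {"fever": 120, "cough": 95, "rash": 30, "vomiting": 40}
print(clean_complaint("FEVR, coughh & rash", vocab))  # fever cough rash
```

Keeping `max_dist` small matters for presyndromic surveillance: an aggressive corrector could silently rewrite a genuinely new symptom term into a familiar one, which is exactly the signal the system is trying to catch.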

To assess how stable its results are, the researchers ran MUSES on a 90% subsample of the original data collected between March and April 2020 and compared the top 30 clusters of influenza-like respiratory illness found in the original data with the top 30 found in the subsampled data. Twenty-two of the 30 clusters in each list matched a cluster in the other list, with the same hospital, date, and time of day, and with similar topics and cases. The remaining clusters either narrowly missed the top 30 in the other list or were identified by MUSES as part of another cluster during the same hour, so subsampling did not affect the speed or accuracy of detection.
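A comparison like this can be sketched as follows. The article lists the matching criteria (same hospital, date, and time of day, with similar topics) but not how topic similarity was measured, so this example assumes each cluster is summarized by a keyword set and treats a Jaccard similarity of at least 0.5 as "similar". The `Cluster` fields, the threshold, and the sample clusters are all hypothetical, and running MUSES itself is not shown.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical summary of one detected cluster."""
    hospital: str
    date: str      # e.g. "2020-03-20"
    hour: int      # hour of day, 0-23
    keywords: frozenset[str]


def subsample(records: list, frac: float = 0.9, seed: int = 0) -> list:
    """Draw a reproducible random subsample containing `frac` of the records."""
    rng = random.Random(seed)
    return rng.sample(records, k=round(frac * len(records)))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Size of the intersection divided by size of the union (1.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def count_matches(top_a: list[Cluster], top_b: list[Cluster],
                  min_topic_sim: float = 0.5) -> int:
    """Count clusters in top_a with a counterpart in top_b that has the same
    hospital, date, and hour and a similar topic (keyword Jaccard >= min_topic_sim)."""
    return sum(
        any(c.hospital == d.hospital and c.date == d.date and c.hour == d.hour
            and jaccard(c.keywords, d.keywords) >= min_topic_sim
            for d in top_b)
        for c in top_a
    )


full = [Cluster("H1", "2020-03-20", 9, frozenset({"fever", "cough"})),
        Cluster("H2", "2020-03-21", 14, frozenset({"rash"}))]
sub = [Cluster("H1", "2020-03-20", 9, frozenset({"fever", "cough", "chills"})),
       Cluster("H2", "2020-03-21", 15, frozenset({"rash"}))]
print(count_matches(full, sub))  # 1 (the second pair differs by one hour)
```

In a full experiment, `subsample` would be applied to the raw complaint records before rerunning the detector, and `count_matches` would be run in both directions on the two top-30 lists.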

Final Thoughts

The authors recommend using MUSES alongside, rather than in place of, existing procedures such as notifiable disease reporting and syndromic surveillance, because those approaches remain more effective for identifying patterns of known diseases and frequently occurring syndromes. While health practitioners and government public health resources were occupied with the COVID-19 response during the peak of the pandemic, presyndromic surveillance could have offered a more targeted way to identify new, emergent symptoms in COVID-19 patients.

Presyndromic surveillance is a crucial next step for better public health practice because it has the potential to improve daily situational awareness, enable early detection of emerging biothreats during an emergency, and provide a “safety net” to identify and investigate newly emerging and previously unseen events that existing systems would fail to detect.

Story Sources: Reference Paper | Reference Article


Shwetha is a consulting scientific content writing intern at CBIRT. She completed her Master’s in biotechnology at the Indian Institute of Technology, Hyderabad, and has nearly two years of research experience in cellular biology and cell signaling. She is passionate about science communication, cancer biology, and everything that piques her curiosity!
