Researchers from MIT CSAIL employed a potent deep-learning model to extract crucial information from electronic health records that might help with personalized medicine.
The adoption of electronic health records aims to enhance and simplify healthcare. Beyond the purview of clinical trials, the vast quantity of data in these now-digital records might be utilized to address the following very specific questions: What dosage of this drug is appropriate for people of this height and weight? What about individuals who have a certain genetic profile?
However, most of the information that could answer these questions is buried in complicated doctor’s notes. Current methods make it difficult for computers to comprehend these notes; extracting information requires training several machine-learning models. Building each model is a time-consuming and expensive procedure that requires domain specialists to label large amounts of data. Models created for one hospital also don’t work well at others.
The study demonstrates that large language models such as InstructGPT perform well at zero- and few-shot information extraction from clinical text, despite not being trained specifically for the clinical domain. Although text classification and generation with such models have already been studied thoroughly, the study shows how to use them for a variety of NLP tasks that demand more structured outputs, such as span identification, token-level sequence classification, and relation extraction. Because little data is available to test these systems, the authors also provide new datasets for benchmarking few-shot clinical information extraction, based on a human reannotation of the CASI dataset for new tasks. On the clinical extraction tasks examined, the GPT-3 systems perform noticeably better than conventional zero- and few-shot baselines.
EHR – The reservoir of all codes
Experts have been developing large language models (LLMs) for some time, but GPT-3’s widely reported ability to complete sentences propelled them into the public eye. These LLMs are trained on a large volume of text from the internet to complete sentences and predict the next most probable word.
While older, smaller models, such as early GPT versions or BERT, have performed well at extracting medical data, they still require a significant amount of manual data labeling.
Understanding through an example
Suppose a doctor has scribbled this shorthand note about a patient: “pt will dc vanco due to n/v.”
This means that the patient (pt) was taking the antibiotic vancomycin (vanco), but the care team decided to discontinue (dc) the treatment because the patient’s nausea and vomiting (n/v) were too severe. The research team does away with the practice of developing a separate machine-learning model for each task (extracting medications, extracting side effects from the record, disambiguating common abbreviations, and so on). In addition to expanding abbreviations, they looked into four other tasks, including whether the models could parse clinical trials and extract detailed medication regimens.
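To make the idea of few-shot prompting concrete, here is a minimal sketch of how a prompt for abbreviation expansion might be assembled. The function name `build_few_shot_prompt`, the "Input:/Output:" layout, the instruction wording, and the worked example pair are all assumptions made for illustration; the study's actual prompts may look quite different.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: List[Tuple[str, str]],
    note: str,
) -> str:
    """Assemble an instruction, worked examples, and the new note into one prompt."""
    parts = [instruction]
    for source, target in examples:
        parts.append(f"Input: {source}\nOutput: {target}")
    # Leave the final output empty so the model completes it.
    parts.append(f"Input: {note}\nOutput:")
    return "\n\n".join(parts)


# Illustrative example pair written for this sketch, not taken from the study's data.
examples = [("pt c/o sob", "patient complains of shortness of breath")]
prompt = build_few_shot_prompt(
    "Expand the abbreviations in the clinical note.",
    examples,
    "pt will dc vanco due to n/v",
)
print(prompt)
```

Passing an empty list of examples produces a zero-shot prompt, so the same helper would cover both settings.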
According to Sadid Hasan, AI lead at Microsoft and former executive director of AI at CVS Health, “Clinical information buried in unstructured clinical notes has unique challenges compared to general domain text, mostly due to large use of acronyms, and inconsistent textual patterns used across different health care facilities.” To this end, the paper presents an intriguing paradigm for using general-domain large language models for several important zero- and few-shot clinical NLP tasks. The proposed guided prompt design, which steers LLMs toward more structured outputs, could also lead to smaller deployable models trained iteratively on the model-generated pseudo-labels.
Large language models have evident limits, despite their considerable potential for clinical information extraction.
First, because clinical annotation guidelines are often many pages long, it is currently difficult to direct an LLM to follow a precise schema. Even when the resolved GPT-3 outputs were qualitatively good, they did not always match the reference annotations at the token level. When tagging durations, for instance, one resolved GPT-3 output read “for X weeks” rather than “X weeks.” Although this particular mistake is trivial, it shows how difficult it is to communicate nuanced guidelines.
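The duration mismatch can be illustrated with a small post-processing sketch. This is only one possible way to handle such cases, not the authors' method: the helper names `normalize_duration` and `exact_match` and the list of leading words to strip are assumptions made for this example.

```python
import re

# Hypothetical list of leading words that a strict schema would exclude from a duration span.
LEADING_WORDS = re.compile(r"^(?:for|over|during)\s+", re.IGNORECASE)


def normalize_duration(span: str) -> str:
    """Drop a leading preposition so 'for 2 weeks' becomes '2 weeks'."""
    return LEADING_WORDS.sub("", span.strip())


def exact_match(predicted: str, gold: str) -> bool:
    """Compare a predicted span with the gold span after normalization."""
    return normalize_duration(predicted).lower() == gold.strip().lower()


print("for 2 weeks" == "2 weeks")             # strict string comparison fails
print(exact_match("for 2 weeks", "2 weeks"))  # comparison after normalization succeeds
```

A rule like this fixes one known mismatch, but every new guideline nuance would need its own rule, which is part of why long annotation schemas are hard to enforce.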
Second, the researchers identified a bias in GPT-3 toward producing a non-trivial answer even when none exists. The prompt ultimately used for medication extraction, for example, asked the model to create a list of the drugs mentioned and to show whether each is active, discontinued, or neither. Before this one, two separate prompts were tried: “Create a bulleted list of active drugs, if any,” and “Create a bulleted list of discontinued medications, if any.” If a note contained just one active drug and one discontinued medication, the respective LLM outputs would be accurate. However, if there were two active medications and none had been discontinued, the LLM given the discontinuation prompt tended to look for something to output and typically resorted to listing one or more active drugs.
It is important to design tasks or prompts that do not fall into this trap.
This might be accomplished by:
(i) chaining multiple prompts together, for example by first asking whether a certain entity type is present in the input before requesting a list, or
(ii) using a sequence tagging-style output structure.
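Option (i) can be sketched as a two-step chain in which a yes/no question gates the list request. This is a minimal illustration under stated assumptions: the `complete` argument is a hypothetical callable that sends a prompt to some LLM and returns its text, the prompt wording is invented for this sketch, and `fake_model` is a rule-based stand-in used only so the example runs without an API.

```python
from typing import Callable, List


def extract_discontinued(note: str, complete: Callable[[str], str]) -> List[str]:
    """Ask whether discontinued medications exist before requesting a list."""
    gate_prompt = (
        f"Clinical note: {note}\n"
        "Does the note mention any discontinued medications? Answer yes or no."
    )
    answer = complete(gate_prompt).strip().lower()
    if not answer.startswith("yes"):
        return []  # the list prompt is never sent, so nothing can be invented
    list_prompt = (
        f"Clinical note: {note}\n"
        "Create a bulleted list of discontinued medications."
    )
    lines = complete(list_prompt).splitlines()
    # Strip bullet markers and drop empty lines.
    return [line.lstrip("-* ").strip() for line in lines if line.strip()]


def fake_model(prompt: str) -> str:
    """Rule-based stand-in for an LLM call (an assumption for this demo)."""
    if "Answer yes or no" in prompt:
        return "Yes" if " dc " in prompt else "No"
    return "- vancomycin"


print(extract_discontinued("pt will dc vanco due to n/v", fake_model))
print(extract_discontinued("pt continues metformin and lisinopril", fake_model))
```

The design choice is that the model is only asked for a list after it has committed to the list being non-empty, which removes the pressure to fill an empty answer with active drugs.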
Finally, because the data use restrictions on most existing clinical datasets preclude openly sharing the data (for example, with the GPT-3 APIs), all tasks other than biomedical evidence extraction were derived from the publicly available CASI dataset. The clinical text in CASI was compiled from notes from a variety of hospitals and specialties, but it by no means represents all clinical text. For example, the CASI documentation describes its notes as mainly dictated and then transcribed, a practice that is not universal. Additionally, as is regrettably common in clinical NLP, only English was evaluated, leaving LLMs’ proficiency in other languages to future research.
Ethical concerns regarding LLMs
The new datasets consist solely of novel annotations over existing, freely downloadable clinical text. Because clinical text is highly subject to dataset shift, the authors acknowledge that findings about exact performance may not transfer to other languages, hospital systems, or time periods; even so, these newly annotated datasets serve as a starting point for evaluating LLMs on clinical text. Incorporating large language models into clinical extraction workflows could bring significant benefits. Clinical text is being produced at a scale far too great for manual annotation, and as a result, cohorts for clinical studies are often small and hand-curated.
Automatically structuring clinical variables would make it possible to analyze real-world evidence for subpopulations that may not have been represented in clinical trials and to study rarer or less-funded diseases, catalyzing research that might otherwise be impractically expensive. Because this is a high-stakes setting, the performance of such a system must be carefully assessed, with performance numbers stratified by cohorts of note (such as race, socioeconomic status, patient comorbidities, disease stage, site of care, and the note author’s clinical role and seniority); these variables were not available in the data used in the study. Additionally, it is essential that performance be evaluated in the same environment in which the system will be used.
Riya Vishwakarma is a consulting content writing intern at CBIRT. She is currently pursuing a Master's in Biotechnology at Govt. VYT PG Autonomous College, Chhattisgarh. With a strong inclination towards research, she is tech-savvy and has a keen interest in content writing and digital media. She has three years of experience as a writer, including literary writing, and looks forward to many more.