Scientists from the National Library of Medicine (NLM), National Institutes of Health (NIH), and University of Maryland, College Park, US, have presented GeneGPT, a Large Language Model (LLM) that is trained to use the Web APIs of the National Center for Biotechnology Information (NCBI) for generating answers to genomic questions. This pioneering attempt to intercalate LLMs with domain-specific Web API tools aspires to diminish hallucinations of LLMs and serve specialized knowledge of biomedical fields. GeneGPT executes state-of-the-art performance on eight tasks in the GeneTuring benchmark with an average score of 0.83, largely surpassing retrieval-augmented LLMs, including Bing, biomedical LLMs such as BioMedLM and BioGPT, as well as GPT-3 and ChatGPT.

Tackling LLM Hallucination Phenomenon: Exploiting Web APIs 

Large Language Models (LLMs) are trained on vast text data and are capable of comprehending prompts and generating responses in human-interpretable language. LLMs revolutionized the Artificial Intelligence sector and are now attaining heights of popularity for performing multifarious tasks. However, reports of LLMs formulating false responses are also ubiquitous.

The tendency of LLMs to generate erroneous knowledge and speak fabricated information while sounding rational is now called Hallucination. This is the critical issue associated with LLMs, which often misguide users and render inappropriateness while extracting specialized details, especially for biomedical data. Although some LLMs have showcased superior performance in generating biomedical data, since there is no credible source for these models to consult the truth, there remains the propensity to suffer from hallucinations.

Therefore, LLMs rigorously demand refinement. Many journals before have pinpointed this concerning issue and had drawn suggestions to circumvent the hallucination. Augmentation of LLMs by conditioning them on retrieved relevant content or allowing LLMs to utilize other external tools such as program APIs (Application Programming Interfaces) are some practical solutions that were recommended.

The recent publication by NIH researchers harnessed the latter proposition, where LLMs are augmented with domain-specific tools such as database utilities for easy and more precise access to specialized knowledge in order to confront hallucinations, perhaps emanating from contradictory and incomplete training data sources. Therefore, it has endeavored to teach LLM to utilize Web APIs, which allow communication between other services or systems following web protocols.

GeneGPT: Answers Biomedical Questions 

The enthusiasm and inquisitiveness related to devising a tool that answers queries and explains complex biomedical topics with precision and accuracy have surged after the recent popularity of GPTs. Well, this study officially pioneers to justify the hype and steps forth to enrich biomedical sciences through the formulation of GeneGPT, a novel method that prompts Codex to use NCBI Web APIs by in-context learning. 

The work elucidates an unprecedented strategy for teaching LLMs to use NCBI API to respond to biomedical questions. GeneGPT consists of two main modules, including a specifically designed prompt that consists of documentation and demonstrations of API usage and an inference algorithm that integrates API calls in the Codex decoding process.

NCBI (National Centre Biotechnology Information), maintained by the NLM, NIH provides API access to its entire biomedical databases and tools, including Entrez Programming Utilities (E-utils) and Basic Local Alignment Search Tool (BLAST) URL API. Therefore, it enables convenient access to accurate biomedical data. 

E-utils API accesses the Entrez portal that covers 38 NCBI databases of biomedical data such as genes and proteins. The BLAST API allows users to submit queries to assess similarities between nucleotide or protein sequences to existing databases using the BLAST algorithm on NCBI servers. Therefore, most importantly, Web APIs discard the requirement of locally implementing functionalities, handling & maintaining large databases, and extensive computational steps; only having an internet connection can suffice.

The implementation of Codex fetches advantages because it is pre-trained with code data and shows better code understanding abilities, which is crucial in generating the URLs and interpreting the raw API results, and its API has the longest (8k tokens) context length among all available models. 

The LLM is taught to use NCBI Web APIs through in-context learning with an engineered prompt comprising four modules, including an instruction, API documentation, API demonstrations, and a question. The first three parts are fixed for all tasks, while the last one will vary according to tasks. 

Furthermore, during the GeneTuring study, a question-answering (QA) benchmark for genomics, constituting four modules of tasks such as gene alias, gene location, gene function, and sequence alignment tasks, when compared with Bing, ChatGPT, BioGPT, and BioMedLM; the GeneGPT have shown superior performance all tasks except for the gene location task. 

GeneGPT Bridges the Gap Between LLMs and Biomedical Information
Image Description: Left: GeneGPT uses NCBI Web API documentation and demonstrations in the prompt for in-context learning. Right: Examples of GeneGPT answering GeneTuring and GeneHop questions with NCBI Web APIs. Image Source:

The experimental results show that GeneGPT achieves superior performance on eight out of nine tasks in the GeneTuring benchmark with an average score of 0.83. GeneGPT is also able to generalize to longer chains of API calls and perform chain-of-thought API calls to answer multi-hop genomics questions in GeneHop, a novel dataset presented in this work containing three new multi-hop QA tasks based on the GeneTuring benchmark. 


GeneGPT teaches LLMs to use NCBI Web APIs and demonstrates excellence in GeneTuring tasks. The researchers articulate that API demonstrations have good cross-task generalizability and are more useful than documentation for in-context learning. GeneGPT’s capability to generalize to longer chains of thought gives it an extra edge of flexibility and usefulness for real-world applications. Overall, the results indicate that database utility tools are promising in augmenting LLMs to faithfully serve various biomedical information needs. 

Article Source: Reference Paper

Important Note: arXiv releases preprints that have not yet undergone peer review. As a result, it is important to note that these papers should not be considered conclusive evidence, nor should they be used to direct clinical practice or influence health-related behavior. It is also important to understand that the information presented in these papers is not yet considered established or confirmed.

Learn More:

 | Website

Aditi is a consulting scientific writing intern at CBIRT, specializing in explaining interdisciplinary and intricate topics. As a student pursuing an Integrated PG in Biotechnology, she is driven by a deep passion for experiencing multidisciplinary research fields. Aditi is particularly fond of the dynamism, potential, and integrative facets of her major. Through her articles, she aspires to decipher and articulate current studies and innovations in the Bioinformatics domain, aiming to captivate the minds and hearts of readers with her insightful perspectives.


Please enter your comment!
Please enter your name here