OncoGPT is a specialized language model developed by researchers at Shenzhen Kanghua Juntai Biotech Co. Ltd. in China to answer oncology-related queries with accurate medical advice. Fine-tuned on more than 180K physician-patient exchanges, it is designed to give reliable guidance on questions about cancer. By easing the burden on healthcare providers and improving patient access to information, especially in resource-poor settings, OncoGPT shows promise as a professional consultation tool. It may be a first step toward improved AI-powered medical support for cancer professionals and patients.

The Need for Specialized Medical AI

ChatGPT and similar large language models (LLMs) demonstrate AI’s aptitude for understanding and generating human language. They learn from vast text datasets, which allows them to hold conversations on diverse topics. Their grasp of medical topics, however, is far from comprehensive, and they struggle to respond reliably when patients ask about niche health issues. This is especially true in complex disease areas like oncology, where guidance is sorely needed as cancer rates rise worldwide. A scarcity of quality cancer question-answer data has so far hindered the development of capable AI assistants in this specialty. Closing this gap could make specialized medical advice more accessible to all.

Constructing the Right Training Data

The first milestone was compiling data to train AI systems on cancer care conversations. The team collected English and Chinese records from patient-doctor websites, producing a dataset of over 180,000 oncology dialogues.

These real-world exchanges underwent meticulous processing, including:

  • Removing exchanges lacking questions or answers
  • Anonymizing records by deleting personal details
  • Manual review by cancer specialists to correct errors
  • Sorting queries into fundamental or treatment-related buckets
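
The automatable steps in the list above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the record layout (`question` and `answer` keys), the regular expressions used for anonymization, and the keyword list used for bucketing are all assumptions made for this example. The manual review by cancer specialists has no code equivalent and is left out.

```python
import re
from dataclasses import dataclass

# Assumed patterns for personal details; a real pipeline would need far more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")

# Hypothetical keyword list for the "treatment" bucket.
TREATMENT_TERMS = ("chemo", "radiation", "surgery", "dose", "treatment", "therapy")


@dataclass
class Dialogue:
    question: str
    answer: str
    category: str


def scrub(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def clean(records: list[dict]) -> list[Dialogue]:
    """Drop incomplete exchanges, anonymize the rest, and bucket each query."""
    cleaned = []
    for rec in records:
        question = (rec.get("question") or "").strip()
        answer = (rec.get("answer") or "").strip()
        if not question or not answer:
            continue  # exchange lacks a question or an answer
        question, answer = scrub(question), scrub(answer)
        is_treatment = any(term in question.lower() for term in TREATMENT_TERMS)
        category = "treatment" if is_treatment else "fundamental"
        cleaned.append(Dialogue(question, answer, category))
    return cleaned
```

Calling `clean` on a list of raw records returns only the complete exchanges, each with placeholders in place of contact details and a `category` of either `"fundamental"` or `"treatment"`. A keyword match is a crude stand-in for the sorting step; the article does not say how the categories were actually assigned.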

With a robust dialogue dataset secured, the focus shifted to using it to develop an oncology-specialized LLM. Raw data alone cannot impart complex medical knowledge, so the next step was choosing and refining the right base model.

Building a Cancer-Focused AI Assistant

The publicly available LLaMA architecture was selected as the starting point for development. Described as comparable in capability to OpenAI’s GPT-3.5, it provided a flexible foundation upon which to build.

The model was first trained on general conversational patterns and basic medical logic. Further fine-tuning exclusively on the 180,000 oncology records produced an AI assistant tailored for cancer care. This multi-phase training approach enabled it to adopt the nuances of doctor-patient interactions within this specialty.

Extensive experiments were run to tune the model’s configuration so that its responses to cancer queries read as human-like as possible. After two weeks of intensive tuning, the result was OncoGPT, an AI assistant specialized in addressing oncology-related questions.

Testing OncoGPT’s Cancer Care Capabilities

But how could OncoGPT’s proficiency be measured objectively? The team evaluated OncoGPT against baseline models on a reserved subset of 737 oncology conversations.

On every metric (precision, recall, and F1 score), OncoGPT demonstrated clear improvements in providing accurate, relevant answers to patient questions. It maintained this advantage on both fundamental and treatment-focused queries.
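
To make the three metrics concrete, here is a minimal sketch of how precision, recall, and F1 can be computed for a generated answer against a reference answer. The article does not say how the scores were calculated, so the simple word-overlap scheme below is an assumption chosen for clarity, and the function name `overlap_scores` is hypothetical.

```python
from collections import Counter


def overlap_scores(candidate: str, reference: str) -> tuple[float, float, float]:
    """Return (precision, recall, F1) based on words shared by the two texts."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0, 0.0, 0.0

    # Multiset intersection: each shared word counts as often as it appears in both.
    shared = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if shared == 0:
        return 0.0, 0.0, 0.0

    precision = shared / len(cand_tokens)  # share of the model's words that match
    recall = shared / len(ref_tokens)      # share of the reference words recovered
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1
```

Precision penalizes answers padded with unrelated words, recall penalizes answers that leave out what the reference says, and F1 balances the two. Averaging these scores over a held-out set gives one number per metric for each model being compared.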

These tests indicate that tuning AI on specialty medical dialogues can markedly enhance domain knowledge. For cancer care concerns, OncoGPT exhibited more reliable comprehension than the baseline models.

Advancing Toward Real-World AI-Assisted Cancer Care

In limited testing environments, OncoGPT shows immense promise for strengthening cancer care access and quality through AI. However, achieving real-world preparedness requires addressing additional considerations around safety, ethics, and practical integration.

As the team works to expand OncoGPT’s knowledge foundation and reinforce answer accuracy, scalable applications are envisioned across the cancer care continuum:

  • Patient education: Providing accessible, understandable disease information
  • First-line support: Helping patients with self-triage concerns
  • Consultation aid: Assisting overburdened oncologists
  • Remote care: Enabling care in geographically isolated communities


If rigorously validated, cancer-focused conversational AI could help democratize expertise, connecting vulnerable patients with tailored guidance. As global cancer rates rise, technologies like OncoGPT may soon become invaluable tools for augmenting care capacity.

Of course, specialized medical AI comes with caveats. Transparency, oversight, and two-way communication with human experts must remain priorities to ensure safe, ethical integration. However, the progress in developing OncoGPT provides a template for constructing tailored AI capable of distilling disease complexity. Such co-developed cognitive tools can hopefully lighten the burden for both patients and providers managing intricate conditions.

Article source: Reference Paper | The database and models are released on GitHub for the research community.

Important Note: arXiv releases preprints that have not yet undergone peer review. These papers should not be considered conclusive evidence, nor should they be used to direct clinical practice or influence health-related behavior. The information they present is not yet considered established or confirmed.

Dr. Tamanna Anwar is a Scientist and Co-founder of the Centre of Bioinformatics Research and Technology (CBIRT). She is a passionate bioinformatics scientist and a visionary entrepreneur. Dr. Tamanna has worked as a Young Scientist at Jawaharlal Nehru University, New Delhi. She has also worked as a Postdoctoral Fellow at the University of Saskatchewan, Canada. She has several scientific research publications in high-impact research journals. Her latest endeavor is the development of a platform that acts as a one-stop solution for all bioinformatics related information as well as developing a bioinformatics news portal to report cutting-edge bioinformatics breakthroughs.

