
KAIST and UCSD researchers have proposed AI-driven strategies for microbial gene function discovery, targeting a long-standing bottleneck: despite advances in genome sequencing, many microbial genes remain uncharacterized. Their review systematically analyzes computational biology approaches, from traditional sequence analysis to cutting-edge deep learning methods.
The Ongoing Challenge of Understanding Microbial Gene Functions
Since the early 2000s, as whole-genome sequencing became fast and cheap, it has revealed vast genetic blueprints. Yet many microbial genes remain a mystery: even today, 30-50% of microbial genes have no known function at all.
Traditional methods like gene deletion and in vitro assays are slow, costly, and often miss the complexities of real biological systems. This creates a bottleneck: we know the genetic sequences, but not what they do. The scale compounds the problem, as thousands of genes per organism across millions of microbial species add up to an overwhelming experimental burden.
Computational biology alone can generate predictions, but without experimental validation these predictions remain uncertain. Conversely, running experiments without computational guidance is hopelessly inefficient, especially given the enormous number of genes involved.
The research team argues that AI needs to be integrated with experimental biology. By combining the two, researchers can create a feedback loop where AI models predict gene functions, experiments test them, and results are then used to improve the models.
How AI is Cracking the Code
Instead of decades of trial and error, researchers imagine a future where microbial gene functions can be mapped in a matter of months. Such speed could open doors to faster antibiotic discovery, new therapeutics, and sustainable ways to manufacture biological products.
Traditionally, computational biology relied on comparing unknown genes with characterized ones to infer their functions. Tools like AlphaFold and RoseTTAFold changed the game by predicting protein structures with high accuracy. This allowed researchers to identify binding sites, understand enzyme mechanisms, and see how proteins interact with each other. In short, these tools moved the field from inferring function by similarity to examining how proteins actually work at the molecular level.
AlphaFold and RoseTTAFold focus on predicting the structures of existing proteins, but generative AI can go a step further and design entirely new proteins with specific functions, a shift from annotation to creation.
The paper highlights that breakthroughs can come from combining multiple computational strategies, such as sequence analysis to identify motifs and domains together with structure prediction.
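As a rough illustration of the similarity-based approach described above, the sketch below transfers a function label from the most similar characterized sequence. It is a toy stand-in, not what production pipelines do: real tools rely on alignment methods such as BLAST or profile HMMs, whereas here similarity is a simple k-mer Jaccard score, and the sequences, labels, and `min_similarity` threshold are hypothetical.

```python
from typing import Dict, Optional, Set


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping length-k substrings of a protein sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared k-mers divided by all distinct k-mers (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transfer_annotation(
    query: str,
    reference: Dict[str, str],
    k: int = 3,
    min_similarity: float = 0.3,
) -> Optional[str]:
    """Return the label of the most similar characterized sequence, or None
    if nothing clears `min_similarity` (the gene stays 'unknown function')."""
    query_kmers = kmers(query, k)
    best_label, best_score = None, 0.0
    for label, ref_seq in reference.items():
        score = jaccard(query_kmers, kmers(ref_seq, k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= min_similarity else None


if __name__ == "__main__":
    # Made-up sequences and labels (label -> sequence), for illustration only.
    reference = {
        "hypothetical enzyme A": "MKTAYIAKQRQISFVKSHFSRQ",
        "hypothetical regulator B": "MSDNELLRQAAEELGLSVEEVK",
    }
    print(transfer_annotation("MKTAYIAKQRQISFVKSHFSRE", reference))  # hypothetical enzyme A
    print(transfer_annotation("WWWWWWWW", reference))  # None
```

Genes whose best match falls below the threshold stay unannotated, which mirrors the gap the article describes: similarity-based transfer can only label genes that resemble something already characterized.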
Conventional methods are also biased, as researchers often test genes one by one or in an arbitrary order. The review argues for an ‘Active Learning Framework’ that flips this process: instead of researchers deciding which experiments to run, AI guides the choice.
How, you may ask?
The AI model:
- Analyzes predictions across genes
- Flags areas of high uncertainty
- Suggests targeted experiments to resolve those uncertainties
- Incorporates experimental results back into the model to improve its accuracy.
With an active learning framework in place, researchers can prioritize the most critical gene functions first, rather than spending effort on well-understood or low-impact genes. Continuously refining predictions in this loop also speeds up discovery.
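The four-step loop above can be sketched in a few lines of Python. This is a minimal illustration of uncertainty sampling, one common active learning strategy, assuming a toy k-nearest-neighbor model on made-up numeric gene features and a hypothetical `run_experiment` callback standing in for a wet-lab assay; the review does not prescribe this model, these features, or these names.

```python
import math
from typing import Callable, Dict, Tuple

# Hypothetical representation: each gene is a vector of numeric features
# (for example, embeddings derived from sequence or structure models).
Features = Tuple[float, ...]
Labeled = Dict[str, Tuple[Features, int]]  # gene id -> (features, 0/1 assay result)


def predict_proba(x: Features, labeled: Labeled, k: int = 3) -> float:
    """Toy model: fraction of the k nearest labeled genes with a positive result."""
    nearest = sorted(labeled.values(), key=lambda item: math.dist(x, item[0]))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def uncertainty(p: float) -> float:
    """1.0 when p = 0.5 (model is unsure), 0.0 when p is 0 or 1 (model is confident)."""
    return 1.0 - abs(2.0 * p - 1.0)


def active_learning(
    pool: Dict[str, Features],
    labeled: Labeled,
    run_experiment: Callable[[str], int],
    rounds: int = 3,
    batch_size: int = 2,
) -> Labeled:
    """Uncertainty-sampling loop; `labeled` must contain at least one gene."""
    labeled = dict(labeled)  # copy so the caller's data is not mutated
    unlabeled = {g: x for g, x in pool.items() if g not in labeled}
    for _ in range(rounds):
        if not unlabeled:
            break
        # Steps 1-2: score every uncharacterized gene and rank by uncertainty.
        ranked = sorted(
            unlabeled,
            key=lambda g: uncertainty(predict_proba(unlabeled[g], labeled)),
            reverse=True,
        )
        # Step 3: send the most uncertain genes to (simulated) experiments.
        for gene in ranked[:batch_size]:
            result = run_experiment(gene)
            # Step 4: fold the result back in; the next round's predictions use it.
            labeled[gene] = (unlabeled.pop(gene), result)
    return labeled


if __name__ == "__main__":
    pool = {"g1": (0.1,), "g2": (0.45,), "g3": (0.55,), "g4": (0.9,), "g5": (0.3,), "g6": (0.7,)}
    seed = {"known_neg": ((0.0,), 0), "known_pos": ((1.0,), 1)}

    def run_experiment(gene: str) -> int:
        # Hypothetical assay: "active" when the single feature exceeds 0.5.
        return int(pool[gene][0] > 0.5)

    # With 3 rounds of 2 experiments, all six pool genes end up labeled.
    final = active_learning(pool, seed, run_experiment, rounds=3, batch_size=2)
    for gene, (features, label) in sorted(final.items()):
        print(gene, features, label)
```

Ranking by uncertainty is what makes the loop spend experiments where the model is least sure; genes the model already classifies confidently are deferred, matching the prioritization the framework aims for.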
Why AI Can’t Stand Alone
AI can be a game-changer for detecting subtle sequence features that humans often overlook, predicting structures, proposing new hypotheses about enzyme activities, and much more.
But the team stresses that AI-guided discovery cannot succeed in isolation; it requires tight integration with automated experimental platforms and shared infrastructures like biofoundries (large-scale facilities that automate genetic engineering and testing).
Current deep learning models can make accurate predictions, but they often operate as ‘black boxes’, producing results without explaining their reasoning. Dr. Gi Bae Kim of KAIST points out that the next frontier is explainable AI, which can justify why a prediction is likely to be correct or incorrect.
The core message of the study, also emphasized by Prof. Sang Yup, is that AI alone is not enough. Human researchers remain central, but automation and AI can make the scale of the problem manageable.
Conclusions and Future Directions
The authors argue that the future of microbial gene annotation lies in a human-led ecosystem that integrates AI rather than fully automating the process. They identify several ways to strengthen AI-driven gene function prediction:
- Focusing on transcription factors and enzymes as critical targets.
- Using AI to provide the biological reasoning behind predictions.
- Expanding metagenomic datasets and sharing failed experimental data to strengthen collective learning.
- Building a research ecosystem where prediction and validation form a feedback loop.
Article Source: Reference Article | Reference Abstract
Disclaimer:
The research discussed in this article was conducted and published by the authors of the referenced paper. CBIRT has no involvement in the research itself. This article is intended solely to raise awareness about recent developments and does not claim authorship or endorsement of the research.
Saniya is a graduating Chemistry student at Amity University Mumbai with a strong interest in computational chemistry, cheminformatics, and AI/ML applications in healthcare. She aspires to pursue a career as a researcher, computational chemist, or AI/ML engineer. Through her writing, she aims to make complex scientific concepts accessible to a broad audience and support informed decision-making in healthcare.












