Understudied areas of the life sciences, including uncharacterized or poorly characterized genes and proteins, now have a dedicated repository thanks to recent work by scientists affiliated with the MRC Laboratory of Molecular Biology, Cambridge, the University of Cambridge, and the University of Oxford, United Kingdom. The open-source database ‘Unknome’ catalogs each organism’s “unknome” and assigns proteins a “knownness” score, based on Gene Ontology (GO) annotations, that represents how much is known about each protein. The study also suggests that shifting attention from well-established proteins toward less-characterized ones is a highly promising way to deepen current knowledge of cellular, developmental, and regulatory processes, correct misannotations, and identify novel druggable targets.

Neglected Genes and Proteins: The Apparent Reasons

Curiosity and rapid technological advances have equipped the scientific community to investigate the world of proteins in detail, and this effort has enriched the biological sciences with the discovery of many proteins and an understanding of their biological significance, especially their functions.

Even so, the functions and roles of countless proteins have yet to be uncovered. As the authors note, most studies and databases focus on genes and proteins that are already well studied and well annotated. This pattern likely reflects limited funding for such work, the financial risk involved, the preconception that such projects are likely to fail, and the practical difficulty of designing laboratory assays for proteins of unknown function.

Outside the wet lab, bioinformatics approaches assign putative functions to less-characterized genes or proteins by finding similarity, or homology, between the protein of interest and well-characterized proteins or genes from other species. This approach is often limited because the similarity is too weak to support even a tentative inference about the unstudied gene or protein.

The Highlight of the Study: Emphasizing the Evaluation of Less-Studied Proteins 

If a large share of biological sequences remains unexplored and neglected, there will always be missing links in our understanding of biological mechanisms and their evolutionary context. We may also overlook biologically significant genes and proteins with profound roles in development, proliferation, and homeostasis.

This neglect, whether inadvertent or the result of latent reluctance, is also a missed opportunity to gain important clinical insights and to identify novel druggable sites for therapeutic intervention. Initiatives that close these gaps therefore hold enormous potential for significant breakthroughs. The MRC Laboratory researchers introduced the Unknome database and illustrated how it can be used to uncover previously uninvestigated roles of proteins, with the aim of encouraging future studies to bridge these gaps.

Unknome Database: Elucidating Extent of Protein Information

The protein sequence data correspond to the reference UniProt proteomes used by the latest release of the PANTHER database and cover 12 species, including humans. PANTHER provides protein family information as groups of UniProt IDs, which can be combined with selected information from UniProt entries, including protein sequence, GO (Gene Ontology) terms, PubMed citations, species, gene name, and cross-references to species-specific databases.

GO annotation uses a controlled vocabulary, which keeps it consistent across species, and GO terms are well structured and comprehensive, so users can apply their own definition of knownness. The annotations of protein function provided by the GO Consortium are therefore well suited to this application. The evidence codes attached to GO annotations are weighted, which helps identify the most relevant functions, and these weights are summed to generate a knownness score for each protein.
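To make the scoring idea concrete, here is a minimal Python sketch of a knownness score computed as a weighted sum over GO evidence codes. The evidence codes shown (EXP, IDA, IMP, ISS, IEA) are real GO codes, but the weight values, the `GOAnnotation` class, and the `knownness` function are hypothetical illustrations, not the Unknome implementation or its default weights.

```python
from dataclasses import dataclass

# Hypothetical evidence-code weights, chosen for illustration only;
# they are NOT the Unknome defaults. Experimental evidence is weighted
# more heavily than similarity-based or electronic inference.
EVIDENCE_WEIGHTS = {
    "EXP": 1.0,   # inferred from experiment
    "IDA": 1.0,   # inferred from direct assay
    "IMP": 1.0,   # inferred from mutant phenotype
    "ISS": 0.5,   # inferred from sequence or structural similarity
    "IEA": 0.25,  # inferred from electronic annotation
}


@dataclass
class GOAnnotation:
    go_id: str          # e.g. "GO:0005515"
    evidence_code: str  # GO evidence code, e.g. "IDA"


def knownness(annotations, weights=EVIDENCE_WEIGHTS):
    """Sum the weight of each annotation's evidence code.

    Codes missing from `weights` contribute nothing, so a user can
    redefine "knownness" simply by passing a different weight table.
    """
    return sum(weights.get(a.evidence_code, 0.0) for a in annotations)


annotations = [
    GOAnnotation("GO:0005515", "IDA"),
    GOAnnotation("GO:0005737", "IEA"),
]
print(knownness(annotations))  # 1.25
```

Passing a custom `weights` table mirrors the article's point that users can apply their own definition of knownness without changing the underlying annotations.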

The Unknome database thus aggregates the relevant information and provides a knownness score for each protein and for each protein family, or cluster; the whole database can be recompiled in a few hours.

“Knownness” reflects what is already known about a protein or gene, such as its activity or its atomic-resolution structure. Default knownness criteria are provided, but users can also define their own. The final knownness score of a protein cluster is the highest score of any protein in that cluster.
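The cluster rule described above amounts to taking a maximum. The minimal sketch below assumes per-protein scores are already available; the function name and protein identifiers are hypothetical.

```python
def cluster_knownness(protein_scores):
    """Return a cluster's knownness: the highest score among its proteins."""
    if not protein_scores:
        raise ValueError("a cluster must contain at least one protein")
    return max(protein_scores.values())


# Hypothetical orthologs in one cluster and their per-protein scores.
cluster = {"human_protein_A": 0.5, "fly_protein_B": 3.0, "worm_protein_C": 0.0}
print(cluster_knownness(cluster))  # 3.0
```

Using the maximum means a family counts as known as soon as any one of its orthologs is well characterized, so a cluster lands in the unknome only when every member is poorly understood.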

The interface displays the orthologs in each protein family and shows how knownness changes over time. Accordingly, the unknome is ultimately meant to shrink rather than expand: it represents the regions that remain unexplored, so each newly characterized protein family leaves it. Users can select the unknome of any included organism as a foundation for further experimental studies.
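One way to picture a shrinking unknome is to compare two snapshots of cluster scores. The sketch below is an assumption-laden illustration: the snapshot dictionaries, cluster IDs, and helper functions are hypothetical, and the cutoff of 1.0 is borrowed from the screening threshold used in the study rather than from any documented Unknome default.

```python
def unknome(scores, threshold=1.0):
    """Clusters whose knownness is at or below the threshold."""
    return {cluster_id for cluster_id, score in scores.items() if score <= threshold}


def newly_characterized(old_scores, new_scores, threshold=1.0):
    """Clusters that were in the old unknome but have since left it.

    Assumes both snapshots cover the same clusters; a cluster missing
    from the new snapshot would also be reported here.
    """
    return unknome(old_scores, threshold) - unknome(new_scores, threshold)


# Hypothetical snapshots of the same three clusters at two release dates.
release_1 = {"C1": 0.0, "C2": 0.5, "C3": 4.0}
release_2 = {"C1": 0.0, "C2": 2.5, "C3": 4.0}
print(sorted(unknome(release_1)))                         # ['C1', 'C2']
print(sorted(newly_characterized(release_1, release_2)))  # ['C2']
```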

Unknomics Study: A Route Towards Bountiful Resources 

The database makes it easy to retrieve data on unexplored proteins, which can then be characterized experimentally to establish their roles. The researchers performed RNA interference (RNAi) and knockout experiments on 260 Drosophila genes encoding proteins of unknown function that have a knownness score of ≤ 1.0 and are conserved in humans.
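The selection criteria described above can be expressed as a simple filter. This is a minimal sketch under stated assumptions: the `Cluster` record, its fields, the cluster IDs, and `select_candidates` are hypothetical, and the study's actual selection may have involved additional steps not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    cluster_id: str
    knownness: float
    species: set = field(default_factory=set)  # species with a member in the cluster


def select_candidates(clusters, max_knownness=1.0,
                      required_species=("Drosophila melanogaster", "Homo sapiens")):
    """Keep poorly characterized clusters that have members in every required species."""
    required = set(required_species)
    return [
        c.cluster_id
        for c in clusters
        if c.knownness <= max_knownness and required <= c.species
    ]


clusters = [
    Cluster("C1", 0.5, {"Drosophila melanogaster", "Homo sapiens", "Mus musculus"}),
    Cluster("C2", 0.5, {"Drosophila melanogaster"}),                  # not conserved in humans
    Cluster("C3", 6.0, {"Drosophila melanogaster", "Homo sapiens"}),  # too well known
]
print(select_candidates(clusters))  # ['C1']
```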

These experiments assayed the roles of the selected proteins in biological processes such as fertility, tissue growth, protein quality control, stress resilience, and locomotion. They revealed functions for previously unstudied proteins in cilia movement and Notch pathway signaling, and showed that 62 of the genes appear to be essential for viability. The Unknome database can thus serve as a starting point for studying uncharacterized proteins.


The study aims to spark further research into the significant, unexplored biology encoded in the neglected parts of proteomes. It demonstrates the importance of poorly understood genes and proteins, provides a resource to accelerate future research, and highlights the need to support database curation to eliminate errors introduced by misannotation. The database does have two major limitations that could introduce bias if not addressed: its reliance on ortholog assignments, and its reliance on automated annotations, which can be erroneous without manual curation. Even so, the Unknome database opens up an opportunity to explore unstudied biological territory.

Story Source: Reference Paper | Unknome: Website


Aditi is a consulting scientific writing intern at CBIRT, specializing in explaining interdisciplinary and intricate topics. As a student pursuing an Integrated PG in Biotechnology, she is driven by a deep passion for experiencing multidisciplinary research fields. Aditi is particularly fond of the dynamism, potential, and integrative facets of her major. Through her articles, she aspires to decipher and articulate current studies and innovations in the Bioinformatics domain, aiming to captivate the minds and hearts of readers with her insightful perspectives.

