Sunday, December 10, 2023


Best Biological Databases and Web Servers

This list covers useful databases and web servers used in bioinformatics and biology research.

Sequence Databases

Nucleotide Sequence Databases

  • Nucleotide@NCBI – Database of sequences from several sources, including GenBank, RefSeq, TPA and PDB.
  • ENA@EBI – European Nucleotide Archive, a comprehensive record of the world’s nucleotide sequencing information.
  • DDBJ – The nucleotide sequence database of Japan.

Protein Sequence Databases

  • PIR – Protein Information Resource is an integrated public bioinformatics resource to support genomic, proteomic and systems biology research.
  • Protein@NCBI – Database of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB. 
  • UniProt – Database of protein sequence and functional information.

Gene Databases

Gene Prediction Servers

  • Genscan – Identification of complete gene structures in genomic DNA.
  • GeneMark – Gene Prediction in Bacteria, Archaea, Metagenomes and Metatranscriptomes.
  • GENEID – For predicting genes, exons, splice sites and other signals along a DNA sequence.
  • AUGUSTUS – For predicting genes in eukaryotic genomic sequences.
  • EuGene – Integrative gene finder for eukaryotic and prokaryotic genomes.

Genome Databases and Browsers

  • ENSEMBL – Genome browser for vertebrate genomes.
  • UCSC Genome Browser – Integrates reference sequence and working draft assemblies for a large collection of genomes at the University of California at Santa Cruz.
  • Phytozome – Portal for plant comparative genomics.
  • Gramene – Resource for comparative functional genomics in crops and model plant species.
  • NCBI Genome Data Viewer – A genome browser for exploration and analysis of eukaryotic RefSeq genome assemblies.
  • NCBI Genome – Organizes information on genomes including sequences, maps, chromosomes, assemblies, and annotations.
  • VISTA – A comprehensive suite of programs and databases for comparative analysis of genomic sequences.
  • GOLD – Genomes Online Database, a World Wide Web resource for comprehensive access to information regarding genome and metagenome sequencing projects and their associated metadata.
  • MITOMAP – A human mitochondrial genome database.

Genome Analysis

  • GeneCensus – Genome comparisons in terms of metabolic pathway activity and protein family sharing.
  • GWAS Catalog – The NHGRI-EBI Catalog of human genome-wide association studies.
  • UCSC Xena – An online exploration tool for public and private, multi-omic and clinical/phenotype data.

Gene Expression and Regulation Databases

Gene Expression Databases

Gene Regulation Databases

  • miRBase – The microRNA database is a searchable database of published miRNA sequences and annotation.
  • TRANSFAC – Provides data on eukaryotic transcription factors, their experimentally-proven binding sites, consensus binding sequences (positional weight matrices) and regulated genes.
  • DBTSS – Database of Transcriptional Start Sites.
  • ENCODE – A public research consortium aimed at identifying all functional elements in the human and mouse genomes.

Protein Structure Databases

Protein 3D Structure Databases

  • PDB – Protein Data Bank, an archive of information about the 3D shapes of proteins, nucleic acids, and complex assemblies.
  • Structure@NCBI – Protein 3D structure repository at NCBI.
  • PDBe@EBI – The EBI macromolecular structure database. 
  • PDBSum@EBI – The PDB summary database at EBI.
  • MMDB@NCBI – The macromolecular database maintained at NCBI.
  • BMRB – The biological magnetic resonance data bank.
  • SCOP – Structural Classification of Proteins aims to provide a comprehensive description of the structural and evolutionary relationships between all known protein structures.
  • CATH – The database of Class, Architecture, Topology and Homologous superfamily.

Databases of protein domain, function, expression and family

Protein Domain Databases

  • InterPro – A resource that provides functional analysis of protein sequences.
  • CDD – A database of conserved protein domains.
  • ProDom – A database containing a comprehensive set of protein domain families automatically generated from the UniProt Knowledge Database.
  • SMART – Simple Modular Architecture Research Tool. It allows the identification and annotation of genetically mobile domains and the analysis of domain architectures.
  • HPA – The Human Protein Atlas shows the expression and localization of proteins in a large variety of normal human tissues, cancer cells and cell lines with the aid of immunohistochemistry.

Protein Family Databases

  • PFam – A large collection of protein families.
  • PROSITE – A database of protein families and domains.
  • RFam – Database of RNA families, represented by multiple sequence alignments, consensus secondary structures and covariance models.
  • DFam – Database of Transposable Element DNA sequence alignments, hidden Markov Models (HMMs), consensus sequences, and genome annotations.
  • TreeFam – Database composed of phylogenetic trees inferred from animal genomes.

Interaction and Pathway databases

Protein Interaction Databases

  • STRING@EMBL – A web server for protein-protein interaction.
  • BioGRID – Database of Protein, Genetic and Chemical Interactions
  • STITCH@EMBL – A web server for chemical-protein interaction.
  • REACTOME – An open-source, open access, manually curated and peer-reviewed pathway database.
  • DAVID – Database for Annotation, Visualization and Integrated Discovery

Pathway Databases

  • KEGG – A collection of manually drawn pathway maps.
  • PathGuide – A meta-database that provides an overview of more than 190 web-accessible biological pathway and network databases.
  • Pathway Commons – A collection of publicly available pathway information from multiple organisms.
  • PhosphoSitePlus – Comprehensive information and tools for the study of protein post-translational modifications.
  • METscout – Database that brings together metabolism and gene expression landscapes.

Metabolite Databases

Metabolite Databases

Specialized Databases

Bacterial Genome Databases

  • PATRIC – The Pathosystems Resource Integration Center provides integrated data and analysis tools to support biomedical research on bacterial infectious diseases.
  • BacDive – The Bacterial Diversity Metadatabase is the world’s largest database for standardized bacterial information.

Virus Genome Databases

  • Viral Genomes – Viral genome information resource at NCBI.
  • GISAID – Global Initiative on Sharing All Influenza Data (originally the Global Initiative on Sharing Avian Influenza Data).
  • NCBI Flu – Influenza Virus Resource with influenza genomic data and analysis tools.
  • Plant Viruses – This site provides a central source of information about viruses, viroids and satellites of plants, fungi and protozoa.

Microbial Databases

  • ECMDB – E. coli Metabolome Database of small molecular metabolites found or produced by Escherichia coli (strain K12, MG1655).
  • IMG – Integrated Microbial Genomes system serves as a resource for analysis and annotation of genome and metagenome datasets in a comprehensive comparative context.
  • LoQAtE – The localization and quantitation atlas of the yeast proteome.

Plant Databases

  • PlantTFDB – The database of plant transcription factors.
  • TAIR – The Arabidopsis Information Resource (TAIR) is a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana.
  • AraPort – Araport is a web-server for Arabidopsis thaliana genomics.
  • IC4R – A curated database providing rice genome sequences, updating rice gene annotations and integrating multiple omics data through community-contributed modules. 
  • Oryzabase – A comprehensive rice science database.
  • MaizeGDB – Maize Genetics and Genomics Database
  • SoyBase – Integrating Genetics and Molecular Biology for Soybean Researchers.
  • SGN – Solanaceae Genomics Network is a data resource for Solanaceae species including tomato, potato, pepper, eggplant, petunia and Nicotiana.
  • CuGenDB – The web resource for the International Cucurbit Genomics Initiative, covering melon, cucumber, watermelon, pumpkin, etc.
  • GDR – Genome Database for Rosaceae which provides data mining tools and publicly available genomics, genetics and breeding data for Rosaceae.
  • GoMapMan – Resource for gene functional annotations in the plant sciences.
  • NPACT – A curated database of plant derived natural compounds that exhibit anti-cancerous activity.
  • PGDD – A database used to identify and catalog plant genes in terms of intragenome or cross-genome syntenic relationships.
  • PIECE – A plant gene structure comparison and evolution database of 25 species
  • PlantRNA – Database for tRNA sequences of plants and algae.
  • PlnTFDB – Plant Transcription Factor Database provides putatively complete sets of transcription factors (TFs) and other transcriptional regulators in completely sequenced plants.
  • PMRD – Plant microRNA Database integrates publicly available plant miRNA data.
  • SALAD – Motif-based database of protein annotations for plant comparative genomics.

Model Organism Databases

  • MGI – International database resource for the laboratory mouse.
  • RGD – Rat Genome Database. Integrates genetic, genomic, phenotype, and disease-related data generated from rat research.
  • XenBase – Integrates all the diverse biological, genomic, genotype and phenotype data available from Xenopus research.
  • ZFIN – The zebrafish model organism database.
  • FlyBase – Primary repository of Drosophila Genes & Genomes
  • OnTheFly – A database of Drosophila melanogaster transcription factor DNA binding specificities.
  • FlyAtlas – The Drosophila gene expression atlas.
  • WormBase – Integrates information concerning the genetics, genomics and biology of C. elegans and related nematodes
  • SGD – The Saccharomyces Genome Database
  • BDGP – Berkeley Drosophila Genome Project
  • BeeBase – Comprehensive sequence data source for the bee research community.
  • PomBase – A comprehensive database of Schizosaccharomyces pombe.
  • AtMAD – Arabidopsis thaliana Multi-omics Association Database.
  • ZInc – Database on zebrafish mutations.
  • OikoBase – A curated genome expression database of Oikopleura dioica.

Invertebrate Vectors of Human Pathogens Database

  • VectorBase – Database of Invertebrate Vectors of Human Pathogens. Includes reference and variant genome sequence, structural and functional annotations, and phenotypic and population data for traits such as insecticide resistance.

Disease-Specific Databases

  • AudGenDB – The Audiological and Genetic Database
  • EDKB – Endocrine Disruptor Knowledge Base
  • HGMD – The Human Gene Mutation Database
  • NIAID – National Institute of Allergy and Infectious Diseases
  • OMIM – Online Mendelian Inheritance in Man. An Online Catalog of Human Genes and Genetic Disorders
  • PC-GDB – The pancreatic cancer gene database. Latest information on genes causing pancreatic cancer.
  • Pancreatic Cancer Database – Resource of experimentally demonstrated molecular alterations associated with pancreatic cancer in cancer tissues or cancer cell lines.