Friday, September 30, 2022


Best Biological Databases and Web Servers

The list includes a number of useful databases and web-servers used in bioinformatics and biology research.

Sequence Databases

Nucleotide Sequence Databases

  • Nucleotide@NCBI – Database of sequences from several sources, including GenBank, RefSeq, TPA and PDB.
  • ENA@EBI – European Nucleotide Archive, a comprehensive record of the world’s nucleotide sequencing information.
  • DDBJ – The nucleotide sequence database of Japan.

Protein Sequence Databases

  • PIR – Protein Information Resource is an integrated public bioinformatics resource to support genomic, proteomic and systems biology research.
  • Protein@NCBI – Database of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB. 
  • UniProt – Database of protein sequence and functional information.

Gene Databases

Gene Prediction Servers

  • Genscan – Identification of complete gene structures in genomic DNA.
  • GeneMark – Gene Prediction in Bacteria, Archaea, Metagenomes and Metatranscriptomes.
  • GENEID – For predicting genes, exons, splice sites and other signals along a DNA sequence.
  • AUGUSTUS – For predicting genes in eukaryotic genomic sequences.
  • EuGene – Integrative gene finder for eukaryotic and prokaryotic genomes.

Genome Databases and Browsers

  • ENSEMBL – Genome browser for vertebrate genomes.
  • UCSC Genome Browser – Integrates reference sequence and working draft assemblies for a large collection of genomes at the University of California at Santa Cruz.
  • Phytozome – Portal for plant comparative genomics.
  • Gramene – Resource for comparative functional genomics in crops and model plant species.
  • NCBI Genome Data Viewer – A genome browser for exploration and analysis of eukaryotic RefSeq genome assemblies.
  • NCBI Genome – Organizes information on genomes including sequences, maps, chromosomes, assemblies, and annotations.
  • VISTA – A comprehensive suite of programs and databases for comparative analysis of genomic sequences.
  • GOLD – Genomes Online Database, a web resource for comprehensive access to information on genome and metagenome sequencing projects and their associated metadata.
  • MITOMAP – A human mitochondrial genome database.

Genome Analysis

  • GeneCensus – Genome comparisons in terms of metabolic pathway activity and protein family sharing.
  • GWAS Catalog – The NHGRI-EBI Catalog of human genome-wide association studies.
  • UCSC Xena – An online exploration tool for public and private, multi-omic and clinical/phenotype data.

Gene Expression and Regulation Databases

Gene Expression Databases

Gene Regulation Databases

  • miRBase – The microRNA database is a searchable database of published miRNA sequences and annotation.
  • TRANSFAC – Provides data on eukaryotic transcription factors, their experimentally-proven binding sites, consensus binding sequences (positional weight matrices) and regulated genes.
  • DBTSS – Database of Transcriptional Start Sites.
  • ENCODE – A public research consortium aimed at identifying all functional elements in the human and mouse genomes.

Protein Structure Databases

Protein 3D Structure Databases

  • PDB – Protein Data Bank archive of information about the 3D shapes of proteins, nucleic acids, and complex assemblies.
  • Structure@NCBI – Protein 3D structure repository at NCBI.
  • PDBe@EBI – The EBI macromolecular structure database. 
  • PDBSum@EBI – The PDB summary database at EBI.
  • MMDB@NCBI – The Molecular Modeling Database maintained at NCBI.
  • BMRB – The biological magnetic resonance data bank.
  • SCOP – Structural Classification of Proteins aims to provide a comprehensive description of the structural and evolutionary relationships between all known protein structures.
  • CATH – The database of Class, Architecture, Topology and Homologous superfamily.

Databases of protein domain, function, expression and family

Protein Domain Databases

  • InterPro – A resource that provides functional analysis of protein sequences.
  • CDD – A database of conserved protein domains.
  • ProDom – A comprehensive set of protein domain families automatically generated from the UniProt Knowledgebase.
  • SMART – Simple Modular Architecture Research Tool. It allows the identification and annotation of genetically mobile domains and the analysis of domain architectures.
  • HPA – The Human Protein Atlas shows the expression and localization of proteins in a large variety of normal human tissues, cancer cells and cell lines, using immunohistochemistry.

Protein Family Databases

  • PFam – A large collection of protein families.
  • PROSITE – A database of protein families and domains.
  • RFam – Database of RNA families, represented by multiple sequence alignments, consensus secondary structures and covariance models.
  • DFam – Database of Transposable Element DNA sequence alignments, hidden Markov Models (HMMs), consensus sequences, and genome annotations.
  • TreeFam – Database composed of phylogenetic trees inferred from animal genomes.

Interaction and Pathway databases

Protein Interaction Databases

  • STRING@EMBL – A web server for protein-protein interaction.
  • BioGRID – Database of Protein, Genetic and Chemical Interactions
  • STITCH@EMBL – A web server for chemical-protein interaction.
  • REACTOME – An open-source, open access, manually curated and peer-reviewed pathway database.
  • DAVID – Database for Annotation, Visualization and Integrated Discovery

Pathway Databases

  • KEGG – A collection of manually drawn pathway maps.
  • PathGuide – A meta-database that provides an overview of more than 190 web-accessible biological pathway and network databases.
  • Pathway Commons – A collection of publicly available pathway information from multiple organisms.
  • PhosphoSitePlus – Comprehensive information and tools for the study of protein post-translational modifications.
  • METscout – Database that brings together metabolism and gene expression landscapes.

Metabolite Databases

Metabolite Databases

Specialized Databases

Bacterial Genome Databases

  • PATRIC – The Pathosystems Resource Integration Center provides integrated data and analysis tools to support biomedical research on bacterial infectious diseases.
  • BacDive – The Bacterial Diversity Metadatabase is the world’s largest database for standardized bacterial information.

Virus Genome Databases

  • Viral Genomes – Viral genome information resource at NCBI.
  • GISAID – Global Initiative on Sharing All Influenza Data.
  • NCBI Flu – Influenza Virus Resource with influenza genomic data and analysis tools.
  • Plant Viruses – This site provides a central source of information about viruses, viroids and satellites of plants, fungi and protozoa.

Microbial Databases

  • ECMDB – E. coli Metabolome Database of small-molecule metabolites found in or produced by Escherichia coli (strain K12, MG1655).
  • IMG – Integrated Microbial Genomes system serves as a resource for analysis and annotation of genome and metagenome datasets in a comprehensive comparative context.
  • LoQAtE – The localization and quantitation atlas of the yeast proteome.

Plant Databases

  • PlantTFDB – The database of plant transcription factors.
  • TAIR – The Arabidopsis Information Resource (TAIR) is a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana.
  • AraPort – Araport is a web-server for Arabidopsis thaliana genomics.
  • IC4R – A curated database providing rice genome sequences, updating rice gene annotations and integrating multiple omics data through community-contributed modules. 
  • Oryzabase – A comprehensive rice science database.
  • MaizeGDB – Maize Genetics and Genomics Database
  • SoyBase – Integrating Genetics and Molecular Biology for Soybean Researchers.
  • SGN – Solanaceae Genomics Network is a data resource for Solanaceae species including tomato, potato, pepper, eggplant, petunia and Nicotiana.
  • CuGenDB – The web resource for the International Cucurbit Genomics Initiative, covering melon, cucumber, watermelon, pumpkin, etc.
  • GDR – Genome Database for Rosaceae which provides data mining tools and publicly available genomics, genetics and breeding data for Rosaceae.
  • GoMapMan – Resource for gene functional annotations in the plant sciences.
  • NPACT – A curated database of plant derived natural compounds that exhibit anti-cancerous activity.
  • PGDD – A database used to identify and catalog plant genes in terms of intragenome or cross-genome syntenic relationships.
  • PIECE – A plant gene structure comparison and evolution database of 25 species
  • PlantRNA – Database for tRNA sequences of plants and algae.
  • PlnTFDB – Plant Transcription Factor Database provides putatively complete sets of transcription factors (TFs) and other transcriptional regulators in completely sequenced plants.
  • PMRD – Plant microRNA Database integrates publicly available plant miRNA data.
  • SALAD – Motif-based database of protein annotations for plant comparative genomics.

Model Organism Databases

  • MGI – International database resource for the laboratory mouse.
  • RGD – Rat Genome Database. Integrates genetic, genomic, phenotype, and disease-related data generated from rat research.
  • XenBase – Integrates all the diverse biological, genomic, genotype and phenotype data available from Xenopus research.
  • ZFIN – The zebrafish model organism database.
  • FlyBase – Primary repository of Drosophila Genes & Genomes
  • OnTheFly – A database of Drosophila melanogaster transcription factor DNA binding specificities.
  • FlyAtlas – The Drosophila gene expression atlas.
  • WormBase – Integrates information concerning the genetics, genomics and biology of C. elegans and related nematodes
  • SGD – The Saccharomyces Genome Database
  • BDGP – Berkeley Drosophila Genome Project
  • BeeBase – Comprehensive sequence data source for the bee research community.
  • PomBase – A comprehensive database of Schizosaccharomyces pombe.
  • AtMAD – Arabidopsis thaliana Multi-omics Association Database.
  • ZInc – Database on zebrafish mutations. 
  • OikoBase – A curated genome expression database of Oikopleura dioica.

Invertebrate Vectors of Human Pathogens Database

  • VectorBase – Database of Invertebrate Vectors of Human Pathogens. Includes reference and variant genome sequence, structural and functional annotations, and phenotypic and population data for traits such as insecticide resistance.

Disease-Specific Databases

  • AudGenDB – The Audiological and Genetic Database
  • EDKB – Endocrine Disruptor Knowledge Base
  • HGMD – The Human Gene Mutation Database
  • NIAID – National Institute of Allergy and Infectious Diseases
  • OMIM – Online Mendelian Inheritance in Man. An Online Catalog of Human Genes and Genetic Disorders
  • PC-GDB – The pancreatic cancer gene database. Latest information on genes causing pancreatic cancer.
  • Pancreatic Cancer Database – Resource of experimentally demonstrated molecular alterations associated with pancreatic cancer in cancer tissues or cancer cell lines.