BIOINFORMATICS DATABASES & WEB-SERVERS

Best Biological Databases and Web Servers

This list covers widely used databases and web servers for bioinformatics and biology research.

Sequence Databases

Nucleotide Sequence Databases

  • Nucleotide@NCBI – Database of sequences from several sources, including GenBank, RefSeq, TPA and PDB.
  • ENA@EBI – European Nucleotide Archive, a comprehensive record of the world’s nucleotide sequencing information.
  • DDBJ – The nucleotide sequence database of Japan.
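Records in these archives can also be retrieved programmatically. The following is a minimal sketch, assuming NCBI's public E-utilities `efetch` endpoint and the `nuccore` database name for Nucleotide@NCBI; the helper names `efetch_url` and `fetch_fasta` are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the NCBI E-utilities efetch service.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build an efetch URL for a single accession (hypothetical helper)."""
    params = {
        "db": db,            # "nuccore" is the Nucleotide database
        "id": accession,     # accession or UID of the record
        "rettype": rettype,  # "fasta" returns the sequence in FASTA format
        "retmode": "text",
    }
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


def fetch_fasta(accession):
    """Download one record as FASTA text. Requires network access."""
    with urlopen(efetch_url(accession), timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_fasta` with a valid accession should return the record as FASTA text. For anything beyond occasional lookups, NCBI's usage guidelines on request rates and API keys apply.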

Protein Sequence Databases

  • PIR – Protein Information Resource is an integrated public bioinformatics resource to support genomic, proteomic and systems biology research.
  • Protein@NCBI – Database of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB. 
  • UniProt – Database of protein sequence and functional information.
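Protein entries are commonly exchanged as FASTA text. The sketch below assumes the UniProt REST address pattern `https://rest.uniprot.org/uniprotkb/<accession>.fasta`; the helpers `uniprot_fasta_url` and `parse_fasta` are hypothetical names introduced for this example, and the parser is a simple illustration rather than a full FASTA implementation.

```python
# Assumed base URL of the UniProt REST service for UniProtKB entries.
UNIPROT_REST = "https://rest.uniprot.org/uniprotkb"


def uniprot_fasta_url(accession):
    """Build the FASTA download URL for one UniProtKB accession."""
    return f"{UNIPROT_REST}/{accession}.fasta"


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence lines may wrap; join them later
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

The parser works on any FASTA text, so the same function can handle downloads from Protein@NCBI or PIR as long as they are in FASTA format.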

Gene Databases

Gene Prediction Servers

  • Genscan – Identification of complete gene structures in genomic DNA.
  • GeneMark – Gene Prediction in Bacteria, Archaea, Metagenomes and Metatranscriptomes.
  • GENEID – For predicting genes, exons, splice sites and other signals along a DNA sequence.
  • AUGUSTUS – For predicting genes in eukaryotic genomic sequences.
  • EuGene – Integrative gene finder for eukaryotic and prokaryotic genomes.

Genome Databases and Browsers

  • ENSEMBL – Genome browser for vertebrate genomes.
  • UCSC Genome Browser – Integrates reference sequence and working draft assemblies for a large collection of genomes at the University of California at Santa Cruz.
  • Phytozome – Portal for plant comparative genomics.
  • Gramene – Resource for comparative functional genomics in crops and model plant species.
  • NCBI Genome Data Viewer – A genome browser for exploration and analysis of eukaryotic RefSeq genome assemblies.
  • NCBI Genome – Organizes information on genomes including sequences, maps, chromosomes, assemblies, and annotations.
  • VISTA – A comprehensive suite of programs and databases for comparative analysis of genomic sequences.
  • GOLD – Genomes Online Database, a web resource for comprehensive access to information on genome and metagenome sequencing projects and their associated metadata.
  • MITOMAP – A human mitochondrial genome database.

Genome Analysis

  • GeneCensus – Genome comparisons in terms of metabolic pathway activity and protein family sharing.
  • GWAS Catalog – The NHGRI-EBI Catalog of human genome-wide association studies.
  • UCSC Xena – An online exploration tool for public and private, multi-omic and clinical/phenotype data.

Gene Expression and Regulation Databases

Gene Expression Databases

Gene Regulation Databases

  • miRBase – The microRNA database is a searchable database of published miRNA sequences and annotation.
  • TRANSFAC – Provides data on eukaryotic transcription factors, their experimentally-proven binding sites, consensus binding sequences (positional weight matrices) and regulated genes.
  • DBTSS – Database of Transcriptional Start Sites.
  • ENCODE – A public research consortium aimed at identifying all functional elements in the human and mouse genomes.

Protein Structure Databases

Protein 3D Structure Databases

  • PDB – Protein Data Bank, an archive of information about the 3D shapes of proteins, nucleic acids, and complex assemblies.
  • Structure@NCBI – Protein 3D structure repository at NCBI.
  • PDBe@EBI – The EBI macromolecular structure database. 
  • PDBSum@EBI – The PDB summary database at EBI.
  • MMDB@NCBI – The macromolecular database maintained at NCBI.
  • BMRB – The biological magnetic resonance data bank.
  • SCOP – Structural Classification of Proteins aims to provide a comprehensive description of the structural and evolutionary relationships between all known protein structures.
  • CATH – Database that classifies protein domains by Class, Architecture, Topology and Homologous superfamily.

Databases of protein domain, function, expression and family

Protein Domain Databases

  • InterPro – A resource that provides functional analysis of protein sequences.
  • CDD – A database of conserved protein domains.
  • ProDom – A comprehensive set of protein domain families automatically generated from the UniProt Knowledgebase.
  • SMART – Simple Modular Architecture Research Tool. It allows the identification and annotation of genetically mobile domains and the analysis of domain architectures.
  • HPA – The human protein atlas shows expression and localization of proteins in a large variety of normal human tissues, cancer cells and cell lines with the aid of immunohistochemistry.

Protein Family Databases

  • PFam – A large collection of protein families.
  • PROSITE – A database of protein families and domains.
  • RFam – Database of RNA families, represented by multiple sequence alignments, consensus secondary structures and covariance models.
  • DFam – Database of Transposable Element DNA sequence alignments, hidden Markov Models (HMMs), consensus sequences, and genome annotations.
  • TreeFam – Database composed of phylogenetic trees inferred from animal genomes.

Interaction and Pathway databases

Protein Interaction Databases

  • STRING@EMBL – A web server for protein-protein interaction.
  • BioGRID – Database of Protein, Genetic and Chemical Interactions
  • STITCH@EMBL – A web server for chemical-protein interaction.
  • REACTOME – An open-source, open access, manually curated and peer-reviewed pathway database.
  • DAVID – Database for Annotation, Visualization and Integrated Discovery

Pathway Databases

  • KEGG – A collection of manually drawn pathway maps.
  • PathGuide – A meta-database that provides an overview of more than 190 web-accessible biological pathway and network databases.
  • Pathway Commons – A collection of publicly available pathway information from multiple organisms.
  • PhosphoSitePlus – Comprehensive information and tools for the study of protein post-translational modifications.
  • METscout – Database that brings together metabolism and gene expression landscapes.

Metabolite Databases

Metabolite Databases

Specialized Databases

Bacterial Genome Databases

  • PATRIC – The Pathosystems Resource Integration Center provides integrated data and analysis tools to support biomedical research on bacterial infectious diseases.
  • BacDive – The Bacterial Diversity Metadatabase is the world’s largest database for standardized bacterial information.

Virus Genome Databases

  • Viral Genomes – Viral genome information resource at NCBI.
  • GISAID – Global Initiative on Sharing All Influenza Data.
  • NCBI Flu – Influenza Virus Resource with influenza genomic data and analysis tools.
  • Plant Viruses – This site provides a central source of information about viruses, viroids and satellites of plants, fungi and protozoa.

Microbial Databases

  • ECMDB – E. coli Metabolome Database of small-molecule metabolites found in or produced by Escherichia coli (strain K12, MG1655).
  • IMG – Integrated Microbial Genomes system serves as a resource for analysis and annotation of genome and metagenome datasets in a comprehensive comparative context.
  • LoQAtE – The localization and quantitation atlas of the yeast proteome.

Plant Databases

  • PlantTFDB – The database of plant transcription factors.
  • TAIR – The Arabidopsis Information Resource (TAIR) is a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana.
  • AraPort – Araport is a web-server for Arabidopsis thaliana genomics.
  • IC4R – A curated database providing rice genome sequences, updating rice gene annotations and integrating multiple omics data through community-contributed modules. 
  • Oryzabase – A comprehensive rice science database.
  • MaizeGDB – Maize Genetics and Genomics Database
  • SoyBase – Integrating Genetics and Molecular Biology for Soybean Researchers.
  • SGN – Solanaceae Genomics Network is a data resource for Solanaceae species including tomato, potato, pepper, eggplant, petunia and nicotiana.
  • CuGenDB – The web resource for the International Cucurbit Genomics Initiative, covering melon, cucumber, watermelon, pumpkin, etc.
  • GDR – Genome Database for Rosaceae which provides data mining tools and publicly available genomics, genetics and breeding data for Rosaceae.
  • GoMapMan – Resource for gene functional annotations in the plant sciences.
  • NPACT – A curated database of plant derived natural compounds that exhibit anti-cancerous activity.
  • PGDD – A database used to identify and catalog plant genes in terms of intragenome or cross-genome syntenic relationships.
  • PIECE – A plant gene structure comparison and evolution database of 25 species
  • PlantRNA – Database for tRNA sequences of plants and algae.
  • PlnTFDB – Plant Transcription Factor Database provides putatively complete sets of transcription factors (TFs) and other transcriptional regulators in completely sequenced plants.
  • PMRD – Plant microRNA Database integrates publicly available plant miRNA data.
  • SALAD – Motif-based database of protein annotations for plant comparative genomics.

Model Organism Databases

  • MGI – International database resource for the laboratory mouse.
  • RGD – Rat Genome Database. Integrates genetic, genomic, phenotype, and disease-related data generated from rat research.
  • XenBase – Integrates all the diverse biological, genomic, genotype and phenotype data available from Xenopus research.
  • Zfin – ZFIN serves as the zebrafish model organism database.
  • FlyBase – Primary repository of Drosophila Genes & Genomes
  • OnTheFly – A database of Drosophila melanogaster transcription factor DNA binding specificities.
  • FlyAtlas – The Drosophila gene expression atlas.
  • WormBase – Integrates information concerning the genetics, genomics and biology of C. elegans and related nematodes
  • SGD – The Saccharomyces Genome Database
  • BDGP – Berkeley Drosophila Genome Project
  • BeeBase – Comprehensive sequence data source for the bee research community.
  • PomBase – A comprehensive database of Schizosaccharomyces pombe.
  • AtMAD – Arabidopsis thaliana Multi-omics Association Database.
  • ZInc – Database on zebrafish mutations.
  • OikoBase – A curated genome expression database of Oikopleura dioica.

Invertebrate Vectors of Human Pathogens Database

  • VectorBase – Database of Invertebrate Vectors of Human Pathogens. Includes reference and variant genome sequence, structural and functional annotations, and phenotypic and population data for traits such as insecticide resistance.

Disease-Specific Databases

  • AudGenDB – The Audiological and Genetic Database
  • EDKB – Endocrine Disruptor Knowledge Base
  • HGMD – The Human Gene Mutation Database
  • NIAID – National Institute of Allergy and Infectious Diseases
  • OMIM – Online Mendelian Inheritance in Man. An Online Catalog of Human Genes and Genetic Disorders
  • PC-GDB – The pancreatic cancer gene database. Latest information on genes causing pancreatic cancer.
  • Pancreatic Cancer Database – Resource of experimentally demonstrated molecular alterations associated with pancreatic cancer in cancer tissues or cancer cell lines.