BIOINFORMATICS DATABASES & WEB-SERVERS

Best Biological Databases and Web Servers

This list collects useful databases and web servers for bioinformatics and biological research.

Sequence Databases

Nucleotide Sequence Databases

  • Nucleotide@NCBI – Database of sequences from several sources, including GenBank, RefSeq, TPA and PDB.
  • ENA@EBI – The European Nucleotide Archive provides a comprehensive record of the world’s nucleotide sequencing information.
  • DDBJ – The DNA Data Bank of Japan, Japan's nucleotide sequence database.

Protein Sequence Databases

  • PIR – Protein Information Resource is an integrated public bioinformatics resource to support genomic, proteomic and systems biology research.
  • Protein@NCBI – Database of sequences from several sources, including translations of annotated coding regions in GenBank, RefSeq and TPA, as well as records from Swiss-Prot, PIR, PRF, and PDB.
  • UniProt – Database of protein sequence and functional information.

Gene Databases

Gene Prediction Servers

  • Genscan – Identification of complete gene structures in genomic DNA.
  • GeneMark – Gene prediction in bacteria, archaea, metagenomes and metatranscriptomes.
  • GENEID – For predicting genes, exons, splice sites and other signals along a DNA sequence.
  • AUGUSTUS – For predicting genes in eukaryotic genomic sequences.
  • EuGene – Integrative gene finder for eukaryotic and prokaryotic genomes.

Genome Databases and Browsers

  • ENSEMBL – Genome browser for vertebrate genomes.
  • UCSC Genome Browser – Hosted by the University of California, Santa Cruz; integrates reference sequences and working draft assemblies for a large collection of genomes.
  • Phytozome – Portal for plant comparative genomics.
  • Gramene – Resource for comparative functional genomics in crops and model plant species.
  • NCBI Genome Data Viewer – A genome browser for exploration and analysis of eukaryotic RefSeq genome assemblies.
  • NCBI Genome – Organizes information on genomes including sequences, maps, chromosomes, assemblies, and annotations.
  • VISTA – A comprehensive suite of programs and databases for comparative analysis of genomic sequences.
  • GOLD – The Genomes Online Database is a web resource for comprehensive access to information on genome and metagenome sequencing projects and their associated metadata.
  • MITOMAP – A human mitochondrial genome database.

Genome Analysis

  • GeneCensus – Genome comparisons in terms of metabolic pathway activity and protein family sharing.
  • GWAS Catalog – The NHGRI-EBI Catalog of human genome-wide association studies.
  • UCSC Xena – An online exploration tool for public and private, multi-omic and clinical/phenotype data.

Gene Expression and Regulation Databases

Gene Expression Databases

Gene Regulation Databases

  • miRBase – The microRNA database is a searchable database of published miRNA sequences and annotation.
  • TRANSFAC – Provides data on eukaryotic transcription factors, their experimentally proven binding sites, consensus binding sequences (position weight matrices) and regulated genes.
  • DBTSS – Database of Transcriptional Start Sites.
  • ENCODE – A public research consortium aimed at identifying all functional elements in the human and mouse genomes.

Protein Structure Databases

Protein 3D Structure Databases

  • PDB – The Protein Data Bank archives information about the 3D shapes of proteins, nucleic acids, and complex assemblies.
  • Structure@NCBI – Protein 3D structure repository at NCBI.
  • PDBe@EBI – The EBI macromolecular structure database.
  • PDBsum@EBI – The PDB summary database at EBI.
  • MMDB@NCBI – The macromolecular database maintained at NCBI.
  • BMRB – The Biological Magnetic Resonance Data Bank.
  • SCOP – Structural Classification of Proteins aims to provide a comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known.
  • CATH – The database of Class, Architecture, Topology and Homologous superfamily.

Databases of Protein Domains, Function, Expression and Families

Protein Domain Databases

  • InterPro – A resource that provides functional analysis of protein sequences.
  • CDD – A database of conserved protein domains.
  • ProDom – A comprehensive set of protein domain families automatically generated from the UniProt Knowledgebase.
  • SMART – Simple Modular Architecture Research Tool. It allows the identification and annotation of genetically mobile domains and the analysis of domain architectures.
  • HPA – The Human Protein Atlas shows the expression and localization of proteins in a wide variety of normal human tissues, cancer cells and cell lines, using immunohistochemistry.

Protein Family Databases

  • Pfam – A large collection of protein families.
  • PROSITE – A database of protein families and domains.
  • Rfam – Database of RNA families, represented by multiple sequence alignments, consensus secondary structures and covariance models.
  • Dfam – Database of transposable element DNA sequence alignments, hidden Markov models (HMMs), consensus sequences, and genome annotations.
  • TreeFam – Database composed of phylogenetic trees inferred from animal genomes.

Interaction and Pathway Databases

Protein Interaction Databases

  • STRING@EMBL – A web server for protein-protein interactions.
  • BioGRID – Database of protein, genetic and chemical interactions.
  • STITCH@EMBL – A web server for chemical-protein interactions.
  • REACTOME – An open-source, open access, manually curated and peer-reviewed pathway database.
  • DAVID – Database for Annotation, Visualization and Integrated Discovery

Pathway Databases

  • KEGG – A collection of manually drawn pathway maps.
  • PathGuide – A meta-database that provides an overview of more than 190 web-accessible biological pathway and network databases.
  • Pathway Commons – A collection of publicly available pathway information from multiple organisms.
  • PhosphoSitePlus – Comprehensive information and tools for the study of protein post-translational modifications.
  • METscout – A database that brings together metabolism and gene expression landscapes.

Metabolite Databases

Metabolite Databases

Specialized Databases

Bacterial Genome Databases

  • PATRIC – The Pathosystems Resource Integration Center provides integrated data and analysis tools to support biomedical research on bacterial infectious diseases.
  • BacDive – The Bacterial Diversity Metadatabase is the world’s largest database for standardized bacterial information.

Virus Genome Databases

  • Viral Genomes – Viral genome information resource at NCBI.
  • GISAID – Global Initiative on Sharing All Influenza Data (originally the Global Initiative on Sharing Avian Influenza Data).
  • NCBI Flu – Influenza Virus Resource with influenza genomic data and analysis tools.
  • Plant Viruses – This site provides a central source of information about viruses, viroids and satellites of plants, fungi and protozoa.

Microbial Databases

  • ECMDB – E. coli Metabolome Database of small-molecule metabolites found in or produced by Escherichia coli (strain K12, MG1655).
  • IMG – The Integrated Microbial Genomes system serves as a resource for analyzing and annotating genome and metagenome datasets in a comprehensive comparative context.
  • LoQAtE – The localization and quantitation atlas of the yeast proteome.

Plant Databases

  • PlantTFDB – The database of plant transcription factors.
  • TAIR – The Arabidopsis Information Resource (TAIR) is a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana.
  • Araport – A web resource for Arabidopsis thaliana genomics.
  • IC4R – A curated database providing rice genome sequences, updating rice gene annotations and integrating multiple omics data through community-contributed modules. 
  • Oryzabase – A comprehensive rice science database.
  • MaizeGDB – Maize Genetics and Genomics Database
  • SoyBase – Integrating Genetics and Molecular Biology for Soybean Researchers.
  • SGN – The Solanaceae Genomics Network is a data resource for Solanaceae species, including tomato, potato, pepper, eggplant, petunia and Nicotiana.
  • CuGenDB – The web resource for the International Cucurbit Genomics Initiative, covering melon, cucumber, watermelon, pumpkin and other cucurbits.
  • GDR – Genome Database for Rosaceae which provides data mining tools and publicly available genomics, genetics and breeding data for Rosaceae.
  • GoMapMan – Resource for gene functional annotations in the plant sciences.
  • NPACT – A curated database of plant derived natural compounds that exhibit anti-cancerous activity.
  • PGDD – A database used to identify and catalog plant genes in terms of intragenome or cross-genome syntenic relationships.
  • PIECE – A plant gene structure comparison and evolution database of 25 species
  • PlantRNA – Database for tRNA sequences of plants and algae.
  • PlnTFDB – The Plant Transcription Factor Database provides putatively complete sets of transcription factors (TFs) and other transcriptional regulators in completely sequenced plants.
  • PMRD – The Plant microRNA Database integrates publicly available plant miRNA data.
  • SALAD – Motif-based database of protein annotations for plant comparative genomics.

Model Organism Databases

  • MGI – International database resource for the laboratory mouse.
  • RGD – Rat Genome Database. Integrates genetic, genomic, phenotype, and disease-related data generated from rat research.
  • XenBase – Integrates all the diverse biological, genomic, genotype and phenotype data available from Xenopus research.
  • ZFIN – The zebrafish model organism database.
  • FlyBase – Primary repository of Drosophila Genes & Genomes
  • OnTheFly – A database of Drosophila melanogaster transcription factor DNA binding specificities.
  • FlyAtlas – The Drosophila gene expression atlas.
  • WormBase – Integrates information concerning the genetics, genomics and biology of C. elegans and related nematodes
  • SGD – The Saccharomyces Genome Database
  • BDGP – Berkeley Drosophila Genome Project
  • BeeBase – Comprehensive sequence data source for the bee research community.
  • PomBase – A comprehensive database of Schizosaccharomyces pombe.
  • AtMAD – Arabidopsis thaliana Multi-omics Association Database.
  • ZInc – Database on zebrafish mutations.
  • OikoBase – A curated genome expression database of Oikopleura dioica.

Invertebrate Vectors of Human Pathogens Database

  • VectorBase – Database of Invertebrate Vectors of Human Pathogens. Includes reference and variant genome sequence, structural and functional annotations, and phenotypic and population data for traits such as insecticide resistance.

Disease-Specific Databases

  • AudGenDB – The Audiological and Genetic Database
  • EDKB – Endocrine Disruptor Knowledge Base
  • HGMD – The Human Gene Mutation Database
  • NIAID – National Institute of Allergy and Infectious Diseases
  • OMIM – Online Mendelian Inheritance in Man. An Online Catalog of Human Genes and Genetic Disorders
  • PC-GDB – The Pancreatic Cancer Gene Database, with up-to-date information on genes that cause pancreatic cancer.
  • Pancreatic Cancer Database – Resource of experimentally demonstrated molecular alterations associated with pancreatic cancer in cancer tissues or cancer cell lines.