This list covers useful databases and web servers used in bioinformatics and biology research.
Sequence Databases
Nucleotide Sequence Databases
- Nucleotide@NCBI – Database of sequences from several sources, including GenBank, RefSeq, TPA and PDB.
- ENA@EBI – European Nucleotide Archive, a comprehensive record of the world’s nucleotide sequencing information.
- DDBJ – The nucleotide sequence database of Japan.
Protein Sequence Databases
- PIR – Protein Information Resource is an integrated public bioinformatics resource to support genomic, proteomic and systems biology research.
- Protein@NCBI – Database of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB.
- UniProt – Database of protein sequence and functional information.
Gene Databases
- Entrez Gene – Integrates information from a wide range of species.
- GeneCards – An integrative database of all annotated and predicted human genes.
Gene Prediction Servers
- Genscan – Identification of complete gene structures in genomic DNA.
- GeneMark – Gene Prediction in Bacteria, Archaea, Metagenomes and Metatranscriptomes.
- GENEID – For predicting genes, exons, splice sites and other signals along a DNA sequence.
- AUGUSTUS – For predicting genes in eukaryotic genomic sequences.
- EuGene – Integrative gene finder for eukaryotic and prokaryotic genomes.
Genome Databases and Browsers
- ENSEMBL – Genome browser for vertebrate genomes.
- UCSC Genome Browser – Integrates reference sequence and working draft assemblies for a large collection of genomes at the University of California at Santa Cruz.
- Phytozome – Portal for plant comparative genomics.
- Gramene – Resource for comparative functional genomics in crops and model plant species.
- NCBI Genome Data Viewer – A genome browser for exploration and analysis of eukaryotic RefSeq genome assemblies.
- NCBI Genome – Organizes information on genomes including sequences, maps, chromosomes, assemblies, and annotations.
- VISTA – A comprehensive suite of programs and databases for comparative analysis of genomic sequences.
- GOLD – Genomes Online Database, a web resource for comprehensive access to information on genome and metagenome sequencing projects and their associated metadata.
- MITOMAP – A human mitochondrial genome database.
Genome Analysis
- GeneCensus – Genome comparisons in terms of metabolic pathway activity and protein family sharing.
- GWAS Catalog – The NHGRI-EBI Catalog of human genome-wide association studies.
- UCSC Xena – An online exploration tool for public and private, multi-omic and clinical/phenotype data.
Gene Expression and Regulation Databases
Gene Expression Databases
- GENT2 – Gene expression database for normal and tumor tissues.
- GEO@NCBI – The Gene Expression Omnibus repository contains individual gene expression profiles from curated DataSets.
- Allen Brain Atlas – Gene expression and neuroanatomical data.
- TCGA – The Cancer Genome Atlas provides tools for visualizing, querying and downloading the data released quarterly by the consortium’s member projects.
- Cell Miner – A database and query tool designed for cancer research.
- Expression Atlas – Provides information about gene and protein expression.
Gene Regulation Databases
- miRBase – The microRNA database is a searchable database of published miRNA sequences and annotation.
- TRANSFAC – Provides data on eukaryotic transcription factors, their experimentally-proven binding sites, consensus binding sequences (positional weight matrices) and regulated genes.
- DBTSS – Database of Transcriptional Start Sites.
- ENCODE – A public research consortium aimed at identifying all functional elements in the human and mouse genomes.
Protein Structure Databases
Protein 3D Structure Databases
- PDB – Protein Data Bank, an archive of information about the 3D shapes of proteins, nucleic acids, and complex assemblies.
- Structure@NCBI – Protein 3D structure repository at NCBI.
- PDBe@EBI – The EBI macromolecular structure database.
- PDBSum@EBI – The PDB summary database at EBI.
- MMDB@NCBI – The macromolecular database maintained at NCBI.
- BMRB – The biological magnetic resonance data bank.
- SCOP – Structural Classification of Proteins aims to provide a comprehensive description of the structural and evolutionary relationships between all known protein structures.
- CATH – The database of Class, Architecture, Topology and Homologous superfamily.
Databases of protein domain, function, expression and family
Protein Domain Databases
- InterPro – A resource that provides functional analysis of protein sequences.
- CDD – A database of conserved protein domains.
- ProDom – A comprehensive set of protein domain families automatically generated from the UniProt Knowledgebase.
- SMART – Simple Modular Architecture Research Tool. It allows the identification and annotation of genetically mobile domains and the analysis of domain architectures.
- HPA – The Human Protein Atlas shows the expression and localization of proteins in a large variety of normal human tissues, cancer cells and cell lines, determined by immunohistochemistry.
Protein Family Databases
- PFam – A large collection of protein families.
- PROSITE – A database of protein families and domains.
- RFam – Database of RNA families, represented by multiple sequence alignments, consensus secondary structures and covariance models.
- DFam – Database of Transposable Element DNA sequence alignments, hidden Markov Models (HMMs), consensus sequences, and genome annotations.
- TreeFam – Database composed of phylogenetic trees inferred from animal genomes.
Interaction and Pathway databases
Protein Interaction Databases
- STRING@EMBL – A web server for protein-protein interaction.
- BioGRID – Database of Protein, Genetic and Chemical Interactions
- STITCH@EMBL – A web server for chemical-protein interaction.
- REACTOME – An open-source, open access, manually curated and peer-reviewed pathway database.
- DAVID – Database for Annotation, Visualization and Integrated Discovery
Pathway Databases
- KEGG – A collection of manually drawn pathway maps.
- PathGuide – A meta-database that provides an overview of more than 190 web-accessible biological pathway and network databases.
- Pathway Commons – A collection of publicly available pathway information from multiple organisms.
- PhosphoSitePlus – Comprehensive information and tools for the study of protein post-translational modifications.
- METscout – Database that brings together metabolism and gene expression landscapes.
Metabolite Databases
- HMDB – Human Metabolome Database.
- KEGG LIGAND Database – Database of the universe of chemical substances and reactions that are relevant to life.
- KNApSAcK – A Comprehensive Species-Metabolite Relationship Database.
- LIPID MAPS – LIPID Metabolites And Pathways Strategy. Provides access to lipid nomenclature, databases, tools, protocols, standards, tutorials, meetings, publications, and other resources.
- MassBank – High Quality Mass Spectral Database.
- MetaCyc – A curated database of experimentally elucidated metabolic pathways from all domains of life.
- METLIN – A repository of metabolite information and tandem mass spectrometry data designed to facilitate metabolite identification in metabolomics.
Specialized Databases
Bacterial Genome Databases
Virus Genome Databases
- Viral Genomes – Viral genome information resource at NCBI.
- GISAID – Global Initiative on Sharing All Influenza Data.
- NCBI Flu – Influenza Virus Resource with influenza genomic data and analysis tools.
- Plant Viruses – This site provides a central source of information about viruses, viroids and satellites of plants, fungi and protozoa.
Microbial Databases
- ECMDB – E. coli Metabolome Database of small molecular metabolites found or produced by Escherichia coli (strain K12, MG1655).
- IMG – Integrated Microbial Genomes system serves as a resource for analysis and annotation of genome and metagenome datasets in a comprehensive comparative context.
- LoQAtE – The localization and quantitation atlas of the yeast proteome.
Plant Databases
- PlantTFDB – The database of plant transcription factors.
- TAIR – The Arabidopsis Information Resource (TAIR) is a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana.
- AraPort – A web server for Arabidopsis thaliana genomics.
- IC4R – A curated database providing rice genome sequences, updating rice gene annotations and integrating multiple omics data through community-contributed modules.
- Oryzabase – A comprehensive rice science database.
- MaizeGDB – Maize Genetics and Genomics Database
- SoyBase – Integrating Genetics and Molecular Biology for Soybean Researchers.
- SGN – Solanaceae Genomics Network is a data resource for Solanaceae species, including tomato, potato, pepper, eggplant, petunia and nicotiana.
- CuGenDB – The web resource for the International Cucurbit Genomics Initiative, covering melon, cucumber, watermelon, pumpkin, etc.
- GDR – Genome Database for Rosaceae which provides data mining tools and publicly available genomics, genetics and breeding data for Rosaceae.
- GoMapMan – Resource for gene functional annotations in the plant sciences.
- NPACT – A curated database of plant derived natural compounds that exhibit anti-cancerous activity.
- PGDD – A database used to identify and catalog plant genes in terms of intragenome or cross-genome syntenic relationships.
- PIECE – A plant gene structure comparison and evolution database of 25 species
- PlantRNA – Database for tRNA sequences of plants and algae.
- PlnTFDB – Plant Transcription Factor Database provides putatively complete sets of transcription factors (TFs) and other transcriptional regulators in completely sequenced plants.
- PMRD – Plant microRNA Database integrates publicly available plant miRNA data.
- SALAD – Motif-based database of protein annotations for plant comparative genomics.
Model Organism Databases
- MGI – International database resource for the laboratory mouse.
- RGD – Rat Genome Database. Integrates genetic, genomic, phenotype, and disease-related data generated from rat research.
- XenBase – Integrates all the diverse biological, genomic, genotype and phenotype data available from Xenopus research.
- ZFIN – The zebrafish model organism database.
- FlyBase – Primary repository of Drosophila Genes & Genomes
- OnTheFly – A database of Drosophila melanogaster transcription factor DNA binding specificities.
- FlyAtlas – The Drosophila gene expression atlas.
- WormBase – Integrates information concerning the genetics, genomics and biology of C. elegans and related nematodes
- SGD – The Saccharomyces Genome Database
- BDGP – Berkeley Drosophila Genome Project
- BeeBase – Comprehensive sequence data source for the bee research community.
- PomBase – A comprehensive database of Schizosaccharomyces pombe.
- AtMAD – Arabidopsis thaliana Multi-omics Association Database.
- ZInc – Database on zebrafish mutations.
- OikoBase – A curated genome expression database of Oikopleura dioica.
Invertebrate Vectors of Human Pathogens Database
- VectorBase – Database of Invertebrate Vectors of Human Pathogens. Includes reference and variant genome sequence, structural and functional annotations, and phenotypic and population data for traits such as insecticide resistance.
Disease-Specific Databases
- AudGenDB – The Audiological and Genetic Database
- EDKB – Endocrine Disruptor Knowledge Base
- HGMD – The Human Gene Mutation Database
- NIAID – National Institute of Allergy and Infectious Diseases
- OMIM – Online Mendelian Inheritance in Man. An Online Catalog of Human Genes and Genetic Disorders
- PC-GDB – The pancreatic cancer gene database. Latest information on genes causing pancreatic cancer.
- Pancreatic Cancer Database – Resource of experimentally demonstrated molecular alterations associated with pancreatic cancer in cancer tissues or cancer cell lines.