This list covers a number of useful databases and web servers used in bioinformatics and biology research.
Nucleotide Sequence Databases
Protein Sequence Databases
- PIR – Protein Information Resource is an integrated public bioinformatics resource to support genomic, proteomic and systems biology research.
- Protein@NCBI – Database of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB.
- UniProt – Database of protein sequence and functional information.
Gene Prediction Servers
- Genscan – Identification of complete gene structures in genomic DNA.
- GeneMark – Gene Prediction in Bacteria, Archaea, Metagenomes and Metatranscriptomes.
- GENEID – For predicting genes, exons, splice sites and other signals along a DNA sequence.
- AUGUSTUS – For predicting genes in eukaryotic genomic sequences.
- EuGene – Integrative gene finder for eukaryotic and prokaryotic genomes.
Genome Databases and Browsers
- ENSEMBL – Genome browser for vertebrate genomes.
- UCSC Genome Browser – Integrates reference sequence and working draft assemblies for a large collection of genomes at the University of California at Santa Cruz.
- Phytozome – Portal for plant comparative genomics.
- Gramene – Resource for comparative functional genomics in crops and model plant species.
- NCBI Genome Data Viewer – A genome browser for exploration and analysis of eukaryotic RefSeq genome assemblies.
- NCBI Genome – Organizes information on genomes including sequences, maps, chromosomes, assemblies, and annotations.
- VISTA – A comprehensive suite of programs and databases for comparative analysis of genomic sequences.
- GOLD – Genomes Online Database, a web resource for comprehensive access to information about genome and metagenome sequencing projects and their associated metadata.
- MITOMAP – A human mitochondrial genome database.
Gene Expression and Regulation Databases
Gene Expression Databases
Gene Regulation Databases
- miRBase – The microRNA database is a searchable database of published miRNA sequences and annotation.
- TRANSFAC – Provides data on eukaryotic transcription factors, their experimentally proven binding sites, consensus binding sequences (positional weight matrices) and regulated genes.
- DBTSS – Database of Transcriptional Start Sites.
- ENCODE – A public research consortium aimed at identifying all functional elements in the human and mouse genomes.
Protein Structure Databases
Protein 3D Structure Databases
- PDB – Protein Data Bank, an archive of information about the 3D shapes of proteins, nucleic acids, and complex assemblies.
- Structure@NCBI – Protein 3D structure repository at NCBI.
- PDBe@EBI – The EBI macromolecular structure database.
- PDBSum@EBI – The PDB summary database at EBI.
- MMDB@NCBI – The macromolecular database maintained at NCBI.
- BMRB – The biological magnetic resonance data bank.
- SCOP – Structural Classification of Proteins aims to provide a comprehensive description of the structural and evolutionary relationships between all known protein structures.
- CATH – The database of Class, Architecture, Topology and Homologous superfamily.
Databases of protein domain, function, expression and family
Protein Domain Databases
- InterPro – A resource that provides functional analysis of protein sequences.
- CDD – A database of conserved protein domains.
- ProDom – A comprehensive set of protein domain families automatically generated from the UniProt Knowledgebase.
- SMART – Simple Modular Architecture Research Tool. It allows the identification and annotation of genetically mobile domains and the analysis of domain architectures.
- HPA – The human protein atlas shows expression and localization of proteins in a large variety of normal human tissues, cancer cells and cell lines with the aid of immunohistochemistry.
Protein Family Databases
- PFam – A large collection of protein families.
- PROSITE – A database of protein families and domains.
- RFam – Database of RNA families, represented by multiple sequence alignments, consensus secondary structures and covariance models.
- DFam – Database of Transposable Element DNA sequence alignments, hidden Markov Models (HMMs), consensus sequences, and genome annotations.
- TreeFam – Database composed of phylogenetic trees inferred from animal genomes.
Interaction and Pathway databases
Protein Interaction Databases
- STRING@EMBL – A web server for protein-protein interaction.
- BioGRID – Database of protein, genetic and chemical interactions.
- STITCH@EMBL – A web server for chemical-protein interaction.
- REACTOME – An open-source, open access, manually curated and peer-reviewed pathway database.
- DAVID – Database for Annotation, Visualization and Integrated Discovery.
- KEGG – A collection of manually drawn pathway maps.
- PathGuide – A meta-database that provides an overview of more than 190 web-accessible biological pathway and network databases.
- Pathway Commons – A collection of publicly available pathway information from multiple organisms.
- PhosphoSitePlus – Comprehensive information and tools for the study of protein post-translational modifications.
- METscout – Database that brings together metabolism and gene expression landscapes.
Bacterial Genome Databases
Virus Genome Databases
- Viral Genomes – Viral genome information resource at NCBI.
- GISAID – Global Initiative on Sharing All Influenza Data.
- NCBI Flu – Influenza Virus Resource with influenza genomic data and analysis tools.
- Plant Viruses – This site provides a central source of information about viruses, viroids and satellites of plants, fungi and protozoa.
- ECMDB – E. coli Metabolome Database of small-molecule metabolites found in or produced by Escherichia coli (strain K12, MG1655).
- IMG – Integrated Microbial Genomes system serves as a resource for analysis and annotation of genome and metagenome datasets in a comprehensive comparative context.
- LoQAtE – The localization and quantitation atlas of the yeast proteome.
- PlantTFDB – The database of plant transcription factors.
- TAIR – The Arabidopsis Information Resource (TAIR) is a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana.
- AraPort – Araport is a web-server for Arabidopsis thaliana genomics.
- IC4R – A curated database providing rice genome sequences, updating rice gene annotations and integrating multiple omics data through community-contributed modules.
- Oryzabase – A comprehensive rice science database.
- MaizeGDB – Maize Genetics and Genomics Database
- SoyBase – Integrating Genetics and Molecular Biology for Soybean Researchers.
- SGN – Solanaceae Genomics Network is a data resource for Solanaceae species including tomato, potato, pepper, eggplant, petunia and Nicotiana.
- CuGenDB – The web resource for the International Cucurbit Genomics Initiative, covering melon, cucumber, watermelon, pumpkin, etc.
- GDR – Genome Database for Rosaceae which provides data mining tools and publicly available genomics, genetics and breeding data for Rosaceae.
- GoMapMan – Resource for gene functional annotations in the plant sciences.
- NPACT – A curated database of plant derived natural compounds that exhibit anti-cancerous activity.
- PGDD – A database used to identify and catalog plant genes in terms of intragenome or cross-genome syntenic relationships.
- PIECE – A plant gene structure comparison and evolution database of 25 species.
- PlantRNA – Database for tRNA sequences of plants and algae.
- PlnTFDB – Plant Transcription Factor Database provides putatively complete sets of transcription factors (TFs) and other transcriptional regulators in completely sequenced plants.
- PMRD – Plant microRNA Database integrates publicly available plant miRNA data.
- SALAD – Motif-based database of protein annotations for plant comparative genomics.
Model Organism Databases
- MGI – International database resource for the laboratory mouse.
- RGD – Rat Genome Database. Integrates genetic, genomic, phenotype, and disease-related data generated from rat research.
- XenBase – Integrates all the diverse biological, genomic, genotype and phenotype data available from Xenopus research.
- ZFIN – The zebrafish model organism database.
- FlyBase – Primary repository of Drosophila genes and genomes.
- OnTheFly – A database of Drosophila melanogaster transcription factor DNA binding specificities.
- FlyAtlas – The Drosophila gene expression atlas.
- WormBase – Integrates information concerning the genetics, genomics and biology of C. elegans and related nematodes.
- SGD – The Saccharomyces Genome Database
- BDGP – Berkeley Drosophila Genome Project
- BeeBase – Comprehensive sequence data source for the bee research community.
- PomBase – A comprehensive database of Schizosaccharomyces pombe.
- AtMAD – Arabidopsis thaliana Multi-omics Association Database.
- ZInc – Database on zebrafish mutations.
- OikoBase – A curated genome expression database of Oikopleura dioica.
Invertebrate Vectors of Human Pathogens Database
- VectorBase – Database of Invertebrate Vectors of Human Pathogens. Includes reference and variant genome sequence, structural and functional annotations, and phenotypic and population data for traits such as insecticide resistance.
- AudGenDB – The Audiological and Genetic Database
- EDKB – Endocrine Disruptor Knowledge Base
- HGMD – The Human Gene Mutation Database
- NIAID – National Institute of Allergy and Infectious Diseases
- OMIM – Online Mendelian Inheritance in Man, an online catalog of human genes and genetic disorders.
- PC-GDB – The pancreatic cancer gene database, with the latest information on genes implicated in pancreatic cancer.
- Pancreatic Cancer Database – Resource of experimentally demonstrated molecular alterations associated with pancreatic cancer in cancer tissues or cancer cell lines.