Saturday, December 21, 2024

BIOINFORMATICS DATABASES & WEB-SERVERS

Best Biological Databases and Web Servers

The list includes a number of useful databases and web-servers used in bioinformatics and biology research.

Sequence Databases

Nucleotide Sequence Databases

  • Nucleotide@NCBI – Database of sequences from several sources, including GenBank, RefSeq, TPA and PDB.
  • ENA@EBI – The European Nucleotide Archive, a comprehensive record of the world’s nucleotide sequencing information.
  • DDBJ – The nucleotide sequence database of Japan.
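
Records in these archives can also be retrieved programmatically. The following is a minimal sketch, assuming NCBI's public E-utilities `efetch` endpoint and its `db`, `id`, `rettype` and `retmode` parameters. The helper names and the accession `NM_000546` are illustrative only and do not come from the list above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public E-utilities endpoint for fetching full records.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="fasta"):
    """Return an efetch URL for one record, requested as plain text.

    db="nuccore" targets the Nucleotide database.
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_record(accession, db="nuccore", rettype="fasta", timeout=30):
    """Download the record as text (hypothetical helper; needs network access)."""
    url = build_efetch_url(accession, db, rettype)
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_record() to download.
    print(build_efetch_url("NM_000546"))
```

Keeping URL construction separate from the download makes the request easy to inspect before any traffic is sent. NCBI asks heavy users to throttle requests, so a batch script would need a delay between calls.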

Protein Sequence Databases

  • PIR – Protein Information Resource is an integrated public bioinformatics resource to support genomic, proteomic and systems biology research.
  • Protein@NCBI – Database of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB. 
  • UniProt – Database of protein sequence and functional information.
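
Protein sequences from these resources are commonly exchanged as FASTA text. The sketch below assumes the UniProt REST URL pattern `https://rest.uniprot.org/uniprotkb/<accession>.fasta` and adds a small FASTA parser. The function names are hypothetical and the parser handles only the basic header-plus-sequence layout.

```python
UNIPROT_REST = "https://rest.uniprot.org/uniprotkb"


def uniprot_fasta_url(accession):
    """Return the FASTA download URL for a UniProtKB accession."""
    return f"{UNIPROT_REST}/{accession.strip().upper()}.fasta"


def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Made-up two-record input, used only to show the parser's output shape.
    sample = ">seq1 example\nMKT\nLLV\n>seq2\nAAA\n"
    print(uniprot_fasta_url("p04637"))
    print(parse_fasta(sample))
```

The parser works on any FASTA text, so the same function can be reused for nucleotide records.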

Gene Databases

Gene Prediction Servers

  • Genscan – Identification of complete gene structures in genomic DNA.
  • GeneMark – Gene Prediction in Bacteria, Archaea, Metagenomes and Metatranscriptomes.
  • GENEID – Predicts genes, exons, splice sites and other signals along a DNA sequence.
  • AUGUSTUS – For predicting genes in eukaryotic genomic sequences.
  • EuGene – Integrative gene finder for eukaryotic and prokaryotic genomes.

Genome Databases and Browsers

  • ENSEMBL – Genome browser for vertebrate genomes.
  • UCSC Genome Browser – Integrates reference sequence and working draft assemblies for a large collection of genomes at the University of California at Santa Cruz.
  • Phytozome – Portal for plant comparative genomics.
  • Gramene – Resource for comparative functional genomics in crops and model plant species.
  • NCBI Genome Data Viewer – A genome browser for exploration and analysis of eukaryotic RefSeq genome assemblies.
  • NCBI Genome – Organizes information on genomes including sequences, maps, chromosomes, assemblies, and annotations.
  • VISTA – A comprehensive suite of programs and databases for comparative analysis of genomic sequences.
  • GOLD – The Genomes Online Database is a web resource for comprehensive access to information on genome and metagenome sequencing projects and their associated metadata.
  • MITOMAP – A human mitochondrial genome database.

Genome Analysis

  • GeneCensus – Genome comparisons in terms of metabolic pathway activity and protein family sharing.
  • GWAS Catalog – The NHGRI-EBI Catalog of human genome-wide association studies.
  • UCSC Xena – An online exploration tool for public and private, multi-omic and clinical/phenotype data.

Gene Expression and Regulation Databases

Gene Expression Databases

Gene Regulation Databases

  • miRBase – The microRNA database is a searchable database of published miRNA sequences and annotation.
  • TRANSFAC – Provides data on eukaryotic transcription factors, their experimentally-proven binding sites, consensus binding sequences (positional weight matrices) and regulated genes.
  • DBTSS – Database of Transcriptional Start Sites.
  • ENCODE – A public research consortium aimed at identifying all functional elements in the human and mouse genomes.

Protein Structure Databases

Protein 3D Structure Databases

  • PDB – The Protein Data Bank archive, with information about the 3D shapes of proteins, nucleic acids, and complex assemblies.
  • Structure@NCBI – Protein 3D structure repository at NCBI.
  • PDBe@EBI – The EBI macromolecular structure database. 
  • PDBSum@EBI – The PDB summary database at EBI.
  • MMDB@NCBI – The macromolecular database maintained at NCBI.
  • BMRB – The biological magnetic resonance data bank.
  • SCOP – Structural Classification of Proteins aims to provide a comprehensive description of the structural and evolutionary relationships between all proteins of known structure.
  • CATH – A database that classifies protein domains by Class, Architecture, Topology and Homologous superfamily.
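
Coordinate files for individual PDB entries can be downloaded directly. This is a small sketch, assuming the RCSB file server pattern `https://files.rcsb.org/download/<ID>.<ext>` and classic four-character PDB identifiers. The function name and the example ID `1TUP` are illustrative.

```python
RCSB_DOWNLOAD = "https://files.rcsb.org/download"


def pdb_file_url(pdb_id, fmt="cif"):
    """Return the download URL for a PDB entry in mmCIF or legacy PDB format."""
    pdb_id = pdb_id.strip().upper()
    # Assumption: classic PDB IDs are exactly four alphanumeric characters.
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    if fmt not in ("cif", "pdb"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{RCSB_DOWNLOAD}/{pdb_id}.{fmt}"


if __name__ == "__main__":
    print(pdb_file_url("1tup"))          # mmCIF file
    print(pdb_file_url("1tup", "pdb"))   # legacy PDB-format file
```

mmCIF is the default here because very large structures are not available in the legacy PDB format.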

Databases of protein domain, function, expression and family

Protein Domain Databases

  • InterPro – A resource that provides functional analysis of protein sequences.
  • CDD – A database of conserved protein domains.
  • ProDom – A comprehensive set of protein domain families automatically generated from the UniProt Knowledgebase.
  • SMART – Simple Modular Architecture Research Tool. It allows the identification and annotation of genetically mobile domains and the analysis of domain architectures.
  • HPA – The Human Protein Atlas shows the expression and localization of proteins across a wide variety of normal human tissues, cancer cells and cell lines, using immunohistochemistry.

Protein Family Databases

  • PFam – A large collection of protein families.
  • PROSITE – A database of protein families and domains.
  • RFam – Database of RNA families, represented by multiple sequence alignments, consensus secondary structures and covariance models.
  • DFam – Database of Transposable Element DNA sequence alignments, hidden Markov Models (HMMs), consensus sequences, and genome annotations.
  • TreeFam – Database composed of phylogenetic trees inferred from animal genomes.

Interaction and Pathway databases

Protein Interaction Databases

  • STRING@EMBL – A web server for protein-protein interaction.
  • BioGRID – Database of Protein, Genetic and Chemical Interactions
  • STITCH@EMBL – A web server for chemical-protein interaction.
  • REACTOME – An open-source, open access, manually curated and peer-reviewed pathway database.
  • DAVID – Database for Annotation, Visualization and Integrated Discovery

Pathway Databases

  • KEGG – A collection of manually drawn pathway maps.
  • PathGuide – A meta-database that provides an overview of more than 190 web-accessible biological pathway and network databases.
  • Pathway Commons – A collection of publicly available pathway information from multiple organisms.
  • PhosphoSitePlus – A comprehensive resource of information and tools for the study of protein post-translational modifications.
  • METscout – A database that brings together metabolism and gene expression landscapes.

Metabolite Databases

Specialized Databases

Bacterial Genome Databases

  • PATRIC – The Pathosystems Resource Integration Center provides integrated data and analysis tools to support biomedical research on bacterial infectious diseases.
  • BacDive – The Bacterial Diversity Metadatabase is the world’s largest database for standardized bacterial information.

Virus Genome Databases

  • Viral Genomes – Viral genome information resource at NCBI.
  • GISAID – Global Initiative on Sharing All Influenza Data (originally "Avian Influenza Data").
  • NCBI Flu – Influenza Virus Resource with influenza genomic data and analysis tools.
  • Plant Viruses – This site provides a central source of information about viruses, viroids and satellites of plants, fungi and protozoa.

Microbial Databases

  • ECMDB – E. coli Metabolome Database of small-molecule metabolites found in or produced by Escherichia coli (strain K12, MG1655).
  • IMG – The Integrated Microbial Genomes system serves as a resource for analysis and annotation of genome and metagenome datasets in a comprehensive comparative context.
  • LoQAtE – The localization and quantitation atlas of the yeast proteome.

Plant Databases

  • PlantTFDB – The database of plant transcription factors.
  • TAIR – The Arabidopsis Information Resource (TAIR) is a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana.
  • AraPort – Araport is a web-server for Arabidopsis thaliana genomics.
  • IC4R – A curated database providing rice genome sequences, updating rice gene annotations and integrating multiple omics data through community-contributed modules. 
  • Oryzabase – A comprehensive rice science database.
  • MaizeGDB – Maize Genetics and Genomics Database
  • SoyBase – Integrating Genetics and Molecular Biology for Soybean Researchers.
  • SGN – The Solanaceae Genomics Network is a data resource for Solanaceae species, including tomato, potato, pepper, eggplant, petunia and Nicotiana.
  • CuGenDB – The web resource of the International Cucurbit Genomics Initiative, covering melon, cucumber, watermelon, pumpkin, etc.
  • GDR – Genome Database for Rosaceae which provides data mining tools and publicly available genomics, genetics and breeding data for Rosaceae.
  • GoMapMan – Resource for gene functional annotations in the plant sciences.
  • NPACT – A curated database of plant derived natural compounds that exhibit anti-cancerous activity.
  • PGDD – A database used to identify and catalog plant genes in terms of intragenome or cross-genome syntenic relationships.
  • PIECE – A plant gene structure comparison and evolution database of 25 species
  • PlantRNA – Database for tRNA sequences of plants and algae.
  • PlnTFDB – Plant Transcription Factor Database provides putatively complete sets of transcription factors (TFs) and other transcriptional regulators in completely sequenced plants.
  • PMRD – Plant microRNA Database integrates publicly available plant miRNA data.
  • SALAD – Motif-based database of protein annotations for plant comparative genomics.

Model Organism Databases

  • MGI – International database resource for the laboratory mouse.
  • RGD – Rat Genome Database. Integrates genetic, genomic, phenotype, and disease-related data generated from rat research.
  • XenBase – Integrates all the diverse biological, genomic, genotype and phenotype data available from Xenopus research.
  • ZFIN – The zebrafish model organism database.
  • FlyBase – Primary repository of Drosophila Genes & Genomes
  • OnTheFly – A database of Drosophila melanogaster transcription factor DNA binding specificities.
  • FlyAtlas – The Drosophila gene expression atlas.
  • WormBase – Integrates information concerning the genetics, genomics and biology of C. elegans and related nematodes
  • SGD – The Saccharomyces Genome Database
  • BDGP – Berkeley Drosophila Genome Project
  • BeeBase – Comprehensive sequence data source for the bee research community.
  • PomBase – A comprehensive database of Schizosaccharomyces pombe.
  • AtMAD – Arabidopsis thaliana Multi-omics Association Database.
  • ZInc – Database on zebrafish mutations.
  • OikoBase – A curated genome expression database of Oikopleura dioica.

Invertebrate Vectors of Human Pathogens Database

  • VectorBase – Database of Invertebrate Vectors of Human Pathogens. Includes reference and variant genome sequence, structural and functional annotations, and phenotypic and population data for traits such as insecticide resistance.

Disease-Specific Databases

  • AudGenDB – The Audiological and Genetic Database
  • EDKB – Endocrine Disruptor Knowledge Base
  • HGMD – The Human Gene Mutation Database
  • NIAID – National Institute of Allergy and Infectious Diseases
  • OMIM – Online Mendelian Inheritance in Man. An Online Catalog of Human Genes and Genetic Disorders
  • PC-GDB – The pancreatic cancer gene database. Latest information on genes causing pancreatic cancer.
  • Pancreatic Cancer Database – Resource of experimentally demonstrated molecular alterations associated with pancreatic cancer in cancer tissues or cancer cell lines.