Database Catalog

Database Name URL Database Description
ABS database contains 650 annotated transcription factor binding sites and promoter sequences (500 bps) in which the ABS binding sites have been mapped.

It also contains computational predictions on each promoter of the ABS database from: JASPAR, PROMO, TRANSFAC
ACTIVITY ACTIVITY is a database on DNA site sequences with known activity magnitudes, measurement systems and sequence-activity relationships under fixed experimental conditions and is additionally adapted to applications to the phylogenetic footprints of known sites. It consists of three resources, ACTIVITY_Reports, ACTIVITY_Tuning, and TFsite_Annotations. For a given site with known sequence-activity relationships, the first database, ACTIVITY_Reports, accumulates the quantitative data on the impact of the site surrounding, which allows for correct recognition of this site. From the site-surrounding relationships characterized in this way, a program for precise analysis of the phylogenetic footprints of this known site alone is automatically generated and stored within the knowledge base ACTIVITY_Tuning. The resulting database, TFsite_Annotations, documents the program-generated results obtained for putative sites of the same type and the same location within the same regulatory region of the homologous gene. The resources enriching the current ACTIVITY release are available at URL=
Agris AtcisDB consists of 25,516 promoter sequences of annotated Arabidopsis genes with a description of putative cis-regulatory elements.

AtTFDB contains information on approximately 1,770 transcription factors. These transcription factors are grouped into 50 families, based on the presence of conserved DNA-binding domains.
ASPD Artificial selected proteins/peptides database
BIND Peer-reviewed biomolecular database containing over 200,000 published interactions and complexes for more than 60,000 unique gene identifiers. The growing data in BINDplus contain over 1,500 unique organisms, and 7,555 Gene Ontology terms derived from peer-reviewed scientific data extracted from over 23,800 journal articles and over 9000 corresponding authors
Cancer Chromosomes    
cisRED Databases of genome-wide regulatory module and element predictions on:
Human 9
Mouse 4
Mouse 3.1
Rat 1.1
C.elegans 4
Human Stat1 ChIP-seq peaks 1
CMGSDB CMGSDB (Computational Models for Gene Silencing: Elucidating a pervasive biological defensive response) is a database whose objective is to investigate gene silencing from a computational perspective using the tools of computational biology and bioinformatics.
Gavin   Co-immunoprecipitation data (Gavin, A. C. et al. (2002) "Functional organization of the yeast proteome by systematic analysis of protein complexes". Nature, 415, 141-147.)
Ho   Co-immunoprecipitation data (Ho, Y. et al. (2002) "Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry". Nature, 415, 180-183.)
CoryneRegNet Transfer of gene regulatory networks from the model organism: C. glutamicum to C. efficiens, C. diphtheriae, and C. jeikeium
COXPRESdb Co-expressed genes
Protein sub-cellular localizations predicted by WoLF PSORT
CSH data A database of yeast promoters containing the promoter regions of ~6000 genes and ORFs in the yeast genome.
CTCF Binding Site Database Experimentally identified binding sites.
Transcription factor prediction database.
DBTBS Transcription Factors
Regulated Genes (Operons)
DBTSS DBTSS is a collection of transcriptional start sites and adjacent promoters, which are experimentally determined by intensive analyses of full-length cDNAs. In order to extract biological insight from the compiled sequence information, search engines for putative transcription factor binding sites are implemented. Also, for molecular evolutionary studies of transcriptional regulation, detailed sequence alignments of the promoters between human, mouse and other model organisms are provided. DBTSS is available on the web at sites in Japan, Germany and Poland. The positional information of the TSSs, sequences of the promoters and related information can also be downloaded in flatfile form from the download site. The current release of DBTSS (5.1) contains TSS information of 15262 and 14162 genes determined by 1.4 and 0.4 million cDNAs in humans and mice, respectively.
DoOP DoOP is a database of eukaryotic promoter sequences (upstream regions) aiming to facilitate the recognition of regulatory sites conserved between species. The annotated first exons of human and Arabidopsis thaliana genes were used as queries in BLAST searches to collect the most closely related orthologous first exon sequences from Chordata and Viridiplantae species. Up to 3000 bp DNA segments upstream from these first exons constitute the clusters in the chordate and plant sections of the Database of Orthologous Promoters. Release 1.0 of DoOP contains 21061 chordate clusters from 284 different species and 7548 plant clusters from 269 different species. The database can be used to find and retrieve promoter sequences of a given gene from various species and it is also suitable to see the most trivial conserved sequence blocks in the orthologous upstream regions. In addition to the sequence data, the positions of the conserved sequence blocks derived from multiple alignments, the positions of repetitive elements and the positions of transcription start sites known from the Eukaryotic Promoter Database (EPD) can be viewed.
DPInteract Binding sites for E. coli DNA-binding proteins
ECRbase The database also contains a collection of annotated transcription factor binding sites in evolutionarily conserved and promoter elements. ECRbase currently includes human, rhesus macaque, dog, opossum, rat, mouse, chicken, frog, zebrafish, and Fugu genomes.
EMBL Protein AC    
Ensembl ID
EPD The Eukaryotic Promoter Database is an annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes description of the initiation site mapping data, cross-references to other databases, and bibliographic references.
FlyTF This database contains information on the manual curation of 1052 FlyBase identifiers, which are putative site-specific transcription factors
GeneNet The current SRS release of the GeneNet database contains 37 graphical maps of gene networks, as well as descriptions of 1766 proteins, 1006 genes, 241 small molecules, and 3254 relationships between gene network units, and 552 kinetic constants. Information distributed between 16 interlinked tables was obtained by annotating 1980 journal publications.
Tong2001   Genetic interaction data (Tong, A. H. Y. et al. (2001) "Systematic genetic analysis with ordered arrays of yeast deletion mutants". Science, 294, 2364-2368.)
Tong2004   Genetic interaction data (Tong, A.H. et al. "Global mapping of the yeast genetic interaction network". Science 303, 808-13 (2004).)
GenomeTraFaC is a database of conserved regulatory elements obtained by systematically analyzing the orthologous set of human and mouse genes. It mainly focuses on all of the high-quality mRNA entries of mouse and human genes in the Reference Sequence (RefSeq) database of the NCBI.
Greglist Greglist is a database listing potential G-quadruplex regulated genes. G-rich DNA sequences can form G-quadruplexes, four-stranded structures that are stabilized by planar arrays of four guanines associated through hydrogen bonds. Promoter G-quadruplexes have emerged as a new way to regulate gene transcription, such as in c-MYC expression. Further, G-quadruplex motifs are highly enriched in gene promoter regions in humans and other mammals. Greglist contains genes whose promoter regions have G-quadruplex motifs, and these genes are highly likely to be regulated by G-quadruplexes.
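As a rough illustration of how a promoter sequence can be screened for G-quadruplex motifs, the sketch below scans a sequence with the widely used pattern of four runs of three or more guanines separated by loops of 1 to 7 bases. This pattern is an assumption made for illustration; the description above does not state which motif definition Greglist uses. The function name find_g4_motifs and the example sequence are hypothetical, and only the given strand is searched.

```python
import re

# Assumed motif definition: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+
# (a commonly used pattern, not necessarily the one used by Greglist).
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def find_g4_motifs(sequence):
    """Return (start, motif) pairs for non-overlapping matches on the given strand."""
    sequence = sequence.upper()
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(sequence)]


# Hypothetical promoter fragment, for illustration only.
example = "TTGGGAGGGTGGGAGGGTT"
hits = find_g4_motifs(example)
for start, motif in hits:
    print(start, motif)
```

To cover motifs on the opposite strand, the same scan could be repeated on the reverse complement of the sequence.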
htpselex The HTPSELEX database contains sets of in vitro selected transcription factor binding site sequences obtained with the SELEX and high-throughput SELEX methods. The database hosts 12 individual SELEX libraries for the CTF/NF1 and LEF/TCF transcription factor families, totaling more than 40,000 sites. In addition, we have manually curated SELEX datasets from the literature for 25 different transcription factors.
JASPAR Collection of PSSMs for transcription factor DNA-binding sites.
Drosophila melanogaster 5' mRNA transcription start site database
MAPPER   Putative transcription factor binding sites in various genomes
MIPS data  
MIT data Lee, T. et al. (2002) "Transcriptional regulatory networks in Saccharomyces cerevisiae". Science, 298, 799-804.
MPromDB Mammalian promoter database
NCBI Taxonomy    
ODB - Operon database ODB (Operon DataBase) is a database of known operons among the many complete genomes. Additionally, putative operons that are conserved in terms of known operons are also provided. The first release of ODB contains about 2000 known operons and 13,000 putative operons in more than 200 genomes.
ooTFD ooTFD (1) is a database of transcription factors maintained in object-oriented and object-relational database systems. There are, at the time of this writing, about 7500 TF binding site entries in this database, from both prokaryotic and eukaryotic sources, as well as roughly 500 TF binding site matrices. A number of tools and services for commonly performed sequence analyses against these datasets, as well as for performing ooTFD database queries, are provided at the IFTI-MIRAGE web site. Matrix entries in this database are given a quality score according to the statistical methodology described by Rahmann (2), which may be utilized in the interpretation of matrix-based TF binding site searches. Entries in this database resource are linked, either through hard links based on published experimental results, or through precomputed datasets, to entries in EPD (3), PKR (4), and PDB (5).
The Open REGulatory ANNOtation database (ORegAnno) is an open database for the curation of known regulatory elements from scientific literature. Annotation is collected from users worldwide for various biological assays and is automatically cross-referenced against PubMED, Entrez Gene, EnsEMBL, dbSNP, the eVOC: Cell type ontology, and the Taxonomy database, where appropriate, with information regarding the original experimentation performed (evidence). ORegAnno further provides an open validation process for all regulatory annotation in the public domain. Assigned validators receive notification of new records in the database and are able to cross-reference the citation to ensure record integrity. Validators have the ability to modify any record (deprecating the old record and creating a new one) if an error is found.
Osteo-Promoter Database Genes involved in osteogenic proliferation and differentiation
PAZAR Transcription factors and regulatory sequence annotations
PLACE   PLACE database now contains 380 entries of cis-element motifs found in plant genes.
Plant Stress-Responsive Gene Catalog Stress-responsive genes in various plant species
PlantCARE is a database of plant cis-acting regulatory elements, enhancers and repressors. Regulatory elements are represented by positional matrices, consensus sequences and individual sites on particular promoter sequences. Links to the EMBL, TRANSFAC (Wingender, E. et al.) and MEDLINE databases are provided where available. Data about the transcription sites are mainly extracted from the literature and supplemented with an increasing number of in silico predictions. Apart from a general description of specific transcription factor (TF) sites, levels of confidence for the experimental evidence, functional information and the position on the promoter are also given. At present, 668 cis-acting regulatory elements have been collected.
Plant promoter sequences
PPDB: Plant Promoter Database   Plant promoter database
pre-BIND database  
The PReMod database describes more than 100,000 computationally predicted transcriptional regulatory modules within the human genome. These modules represent the regulatory potential for 229 transcription factor families and are the first genome-wide/transcription factor-wide collection of predicted regulatory modules for the human genome.

The algorithm used involves two steps: (i) Identification and scoring of putative transcription factor binding sites using 481 TRANSFAC 7.2 PWMs for vertebrate transcription factors. To this end, each non-coding position of the human genome was evaluated for its similarity to each PWM using a log-likelihood ratio score with a local GC-parameterized third-order Markov background model. Corresponding orthologous positions in mouse and rat genomes were evaluated similarly and a weighted average of the human, mouse, and rat log-likelihood scores at aligned positions (based on a Multiz (Blanchette et al. 2004) genome-wide alignment of these three species) was used to define the matrix score for each genomic position and each PWM. (ii) Detection of clustered putative binding sites. To assign a "module score" to a given region, the five transcription factors with the highest total scoring hits are identified, and a p-value is assigned to the observed total score of the top 1, 2, 3, 4, or 5 factors. The p-value computation takes into consideration the number of factors involved (1 to 5), their total binding site scores, and the length and GC content of the region under evaluation.
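Step (i) above can be sketched in a few lines. The sketch below is a simplification under stated assumptions, not the PReMod implementation: it uses a zeroth-order background parameterized only by GC content in place of the third-order Markov model, and the toy PWM, the aligned sites and the species weights are all hypothetical values chosen for illustration.

```python
import math


def gc_background(gc_content):
    """Zeroth-order background defined only by GC content (a simplification
    of the third-order Markov background described in the text)."""
    at = (1.0 - gc_content) / 2.0
    gc = gc_content / 2.0
    return {"A": at, "C": gc, "G": gc, "T": at}


def llr_score(site, pwm, background):
    """Log-likelihood ratio of a site under the PWM versus the background."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal PWM width")
    return sum(
        math.log(column[base] / background[base])
        for base, column in zip(site, pwm)
    )


def weighted_average(scores, weights):
    """Weighted average of per-species scores at aligned positions."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Hypothetical 3-column PWM (each column is a base-probability distribution).
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
background = gc_background(0.5)

# Hypothetical aligned sites in the three species.
human = llr_score("AGC", toy_pwm, background)
mouse = llr_score("AGC", toy_pwm, background)
rat = llr_score("ATC", toy_pwm, background)

# Hypothetical species weights; the text does not give the actual values.
matrix_score = weighted_average([human, mouse, rat], [0.5, 0.25, 0.25])
print(round(matrix_score, 3))
```

A site that is conserved in all three species keeps its full score, while a mismatch in one species (here the rat site) lowers the combined matrix score in proportion to that species' weight.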
PRODORIC   Prokaryotic database of gene regulation and regulatory networks
PromEC PromEC is an updated compilation of E. coli mRNA promoter sequences. It includes documentation on the location of experimentally identified mRNA transcriptional start sites on the E. coli chromosome, as well as the actual sequences in the promoter region. The database is currently updated as of July 2000 and includes 471 entries
ProTISA   ProTISA contains SD-like, TA-like and atypical signals for many organisms.
Transcriptional regulation by sequence-specific transcription factors (TFs) is mediated, in significant part, by the post-translational modifications (PTMs) of the TFs. PTMs serve as molecular switchboards that map upstream signaling events to the downstream transcriptional events.

PTM-Switchboard is designed to catalog known cases of TF-PTMs affecting gene transcriptions. The current version 1.0 is limited to the model organism S. cerevisiae (budding yeast). PTM-Switchboard differs from existing molecular pathway databases in that, instead of using pair-wise interactions as a primary data type, it stores triplets of genes such that the ability of one gene (the TF) to regulate a target gene is dependent on a third gene (the modifying enzyme). We refer to this as the Modifier-Transcription Factor-Gene triplet, or in short, the MFG triplet. The database is currently populated with experimentally characterized examples of MFG triplets manually curated from the literature. In addition to providing a framework for searching and analyzing the data, the database can also serve to benchmark computational methods for identifying novel MFG triplets. In the future, the database will be expanded to mammalian organisms, and will also include triplets predicted from text-mining and model-based computational approaches.
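The triplet data type described above can be pictured with a small sketch. The field names, helper functions and placeholder identifiers below (MOD1, TF1, GENE1 and so on) are hypothetical and are not taken from the PTM-Switchboard schema or its data.

```python
from collections import defaultdict, namedtuple

# One record per Modifier-Transcription Factor-Gene (MFG) triplet.
MFGTriplet = namedtuple("MFGTriplet", ["modifier", "tf", "gene"])


def index_by_tf(triplets):
    """Group triplets by transcription factor for quick lookup."""
    by_tf = defaultdict(list)
    for triplet in triplets:
        by_tf[triplet.tf].append(triplet)
    return dict(by_tf)


def modifiers_for(triplets, tf, gene):
    """Modifying enzymes on which regulation of `gene` by `tf` depends."""
    return sorted({t.modifier for t in triplets if t.tf == tf and t.gene == gene})


# Placeholder records, not real curated data.
triplets = [
    MFGTriplet("MOD1", "TF1", "GENE1"),
    MFGTriplet("MOD2", "TF1", "GENE1"),
    MFGTriplet("MOD1", "TF2", "GENE3"),
]

print(modifiers_for(triplets, "TF1", "GENE1"))
print(sorted(index_by_tf(triplets)))
```

Storing the modifier alongside each TF-target pair is what distinguishes this layout from a pair-wise interaction table, where the dependence on a third gene could not be expressed in a single record.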
QuadBase G-quadruplex motifs in the promoters of human, chimpanzee, rat, mouse and bacterial genes
RedFly REDfly is a curated collection of known Drosophila transcriptional cis-regulatory modules (CRMs) and transcription factor binding sites (TFBSs).
RegulonDB Transcriptional regulation, operons and regulatory elements of Escherichia coli K-12
rSNP_DB Database on magnitudes characterizing the influence of single nucleotide mutations in regulatory gene regions onto their interaction with nuclear proteins
SCPD   A database of yeast promoters
SELEX_DB SELEX_DB: a database on selected randomized DNA/RNA sequences
SELEX_BIB: a database of the papers annotated in SELEX_DB
SKY/M-FISH and CGH   The NCI and NCBI SKY/M-FISH and CGH Database is a repository of publicly submitted data from Spectral Karyotyping (SKY), Multiplex Fluorescence In Situ Hybridization (M-FISH), and Comparative Genomic Hybridization (CGH), which are complementary fluorescent molecular cytogenetic techniques. SKY/M-FISH permits the simultaneous visualization of each human or mouse chromosome in a different color, facilitating the identification of chromosomal aberrations; CGH can be used to generate a map of DNA copy number changes in tumor genomes. Collaborative project with the National Cancer Institute.
SwissRegulon   SwissRegulon is a database containing genome-wide annotations of regulatory sites in the intergenic regions of genomes. The regulatory site annotations are produced using a number of recently developed algorithms that operate on multiple alignments of orthologous intergenic regions from related genomes in combination with, whenever available, known sites from the literature, and ChIP-on-chip binding data. Currently SwissRegulon contains annotations for yeast and 17 prokaryotic genomes. The database provides information about the sequence, location, orientation, posterior probability and, whenever available, binding factor for each annotated site. To enable easy viewing of the regulatory site annotations in the context of other features annotated on the genomes, the sites are displayed using the GBrowse genome browser interface and can be queried based on any annotated genomic feature. The database can also be queried for regulons, i.e. sites bound by a common factor.
Taxon ID    
Telomerase database Sequences and structures of the RNA and protein subunits of telomerase, mutations of telomerase components
TESS TESS (Transcription Element Search System) is a web-based service that searches DNA sequences for transcription factor binding sites. It integrates three databases of transcription factors and binding site models, and provides browsing and querying capability for the databases, sequence searching, and accuracy data for the positional weight matrix (PWM) models.
TiProD TiProD is a database of human promoter sequences for which some functional features are known. It allows a user to query individual promoters and the expression pattern they mediate, gene expression signatures of individual tissues, and to retrieve sets of promoters according to their tissue-specific activity or according to individual GO-terms the corresponding genes are assigned to.
TRACTOR db   Experimental data on the Escherichia coli transcriptional regulatory system have been used in the past years to predict new regulatory elements (promoters, transcription factors (TFs), TF binding sites, operons) within its genome. As more genomes of gamma-proteobacteria are being sequenced, the prediction of these elements in a growing number of organisms has become more feasible, as a step towards the study of how different bacteria respond to environmental changes at the level of transcriptional regulation. TRACTOR_DB (TRAnscription FaCTORs' predicted sites in prokaryotic genomes) is a relational database that contains computational predictions of new members of 74 regulons in eight gamma-proteobacterial genomes.
The TRANSCompel database is devoted to a particular aspect of gene transcriptional regulation [1-7]. It contains information about composite elements, the basic structures of combinatorial gene regulation [7]. Composite regulatory elements consist of two or three closely situated binding sites for distinct transcription factors (TFs), and represent minimal functional units providing combinatorial transcriptional regulation. Both specific factor-DNA and factor-factor interactions contribute to the function of composite elements. Each database entry corresponds to an individual composite element within a particular gene and contains information about two or three binding sites, the corresponding TFs, and the experiments confirming cooperative action between the TFs. Interacting factors may differ in the structure of their DNA-binding, activation, oligomerization and other domains. Along with structural differences, the functional properties of TFs, and hence their specific contribution to transcription regulation, may vary significantly. Co-operative action of the TFs within the composite elements results in a new, highly specific pattern of gene transcription that cannot be provided by the involved factors separately. Composite elements are structural-functional units that provide cross-coupling of gene regulatory pathways and, in particular, cross-coupling of signal transduction pathways. There are two main types of composite elements (CEs): synergistic and antagonistic ones. In synergistic CEs, simultaneous interactions of two factors with closely situated target sites result in a non-additive high level of transcriptional activation. Within an antagonistic CE, two factors interfere with each other. Information about the structure of known composite elements and the specific gene regulation achieved through such composite elements appears to be extremely useful for promoter prediction, for gene function prediction and for applied gene engineering as well.
The SITE table contains information on individual (putatively) regulatory protein binding sites. In this release, it contains 7915 entries, 6360 of which refer to sites within 1504 eukaryotic genes from species ranging from yeast to human. Additionally, this table comprises 1295 artificial sequences which resulted from mutagenesis studies, from in vitro selection procedures starting from random oligonucleotide mixtures, or from specific theoretical considerations.

The FACTOR table contains 6133 entries, but this figure does not reflect the number of independent transcription factors. First of all, homologous factors from different species such as human and mouse SRF are given in different entries since they may differ in some molecular aspects.

The Cell table gives short explanations for the 1307 cellular sources of proteins that interact with the sites listed in the SITE table. Among them may be defined cell lines, tissues / organs, even whole organisms, or recombinant expression systems.

The Class table explains some of the main features of the DNA- binding domains of 50 transcription factor classes. In those cases where an amino acid consensus motif has been identified, the corresponding accession number of the PROSITE database (A. Bairoch) is included.

The MATRIX table contains 398 nucleotide distribution matrices of aligned binding sequences. These sequences may have been obtained by in vitro selection studies or may be compiled sites of genes.

The Gene table was initially designed to link TRANSFAC data to the TRRD and Compel (now: TRANSCompel™) databases. The GENE table (2397 entries) has gained an increasingly central role: it is not only used jointly by several of our own databases (in addition to TRANSCompel™: TRANSPATH, PathoDB and S/MARt DB™), but has also been extended into one of the major sources of links to other, external databases (EMBL, BRENDA, LocusLink, OMIM, RefSeq).
TransfactomeDB   Nucleotide sequence specificity and condition-specific regulatory activity of trans-acting factors
MOLECULE table describes proteins and other components that transduce extracellular signals to target genes. 23384

REACTION table gives information on interactions between signaling molecules that constitute regulatory pathways and networks. 32752

GENE table contains information on target genes and gene expressions as starting points for regulatory pathways or feedback loops. 10961

REFERENCE table contains references extracted for entries (molecule, reaction, etc.) with links to PubMed. 9531
Transterm is a database that facilitates studies of translation and the translational control of protein synthesis. It contains a curated collection of motifs in mRNAs that control translation, and also biologically relevant mRNA regions extracted from GenBank. It is organised largely on a taxonomic basis, with files and summaries for each species. Global patterns that may affect translation in particular species, for example bias in the context of initiation codons (Kozak's consensus or Shine-Dalgarno sequences) or termination codons, can be detected in the consensus and information-content bias summaries. Several types of access are provided via a web browser interface. Transterm-defined motifs may be matched in a user's sequence or in the database. Alternatively, motifs can be entered by the user to search specific sections of the database (for example coding regions or 3' flanking regions) or the user's sequence. Each Transterm-defined motif has an associated biological description with references.
TRED   Transcriptional regulatory element database
TRRD   Transcription Regulatory Regions Database (TRRD) is an informational resource containing an integrated description of gene transcription regulation. Each TRRD entry corresponds to a particular gene and contains descriptions of several hierarchical levels of transcription regulation, including (1) transcription factor binding sites, (2) regulatory units (promoters, enhancers, and silencers), (3) regulatory regions (5'- and 3'-regulatory regions, exons, and introns), and (4) locus control regions (LCRs). The description of each regulatory level may contain both its structural characteristics (sequence, localization, etc.) and functional properties (effect on transcription activity of a gene; cell type, tissue, or organ specificity; cell-cycle phase or ontogenetic stage specificity, etc.). In the database, the data on LCRs, regulatory regions, regulatory units, and transcription factor binding sites are supplemented with (1) descriptions of transcription factors used in the experiments on either binding capacities or functional activities of the corresponding sites, (2) patterns of gene expression regulation, and (3) references to original publications. All the data are distributed between the following seven databases: TRRDGENES (general gene description), TRRDLCR (locus control regions), TRRDUNITS (regulatory regions: promoters, enhancers, silencers, etc.), TRRDSITES (transcription factor binding sites), TRRDFACTORS (transcription factors), TRRDEXP (expression patterns), and TRRDBIB (experimental publications). TRRD is regularly updated in accordance with new experimental evidence. All the information is entered into the database by experts in biology, based on analysis and annotation of papers reporting experimental data. Each type of experiment is designated with a specific digital code, indicated in the 'ExperimentCodes' fields of the databases TRRDGENES, TRRDSITES, and TRRDUNITS.
Sequence Retrieval System (SRS) is used as a basic tool for navigating and searching TRRD and integrating it with external informational and software resources.
TrSDB   Transcription factor database
UCSF localization   UCSF localization data (Huh, W.K. et al. (2003), "Global analysis of protein localization in budding yeast". Nature 425,686-91.)
UniPROBE   Universal Protein binding microarray Resource for Oligonucleotide Binding Evaluation
VISTA Enhancer Browser   Despite the known existence of distant-acting cis-regulatory elements in the human genome, only a small fraction of these elements has been identified and experimentally characterized in vivo. This paucity of enhancer collections with defined activities has thus hindered computational approaches for the genome-wide prediction of enhancers and their functions. To fill this void, we utilize comparative genome analysis to identify candidate enhancer elements in the human genome coupled with the experimental determination of their in vivo enhancer activity in transgenic mice. These data are available through the VISTA Enhancer Browser. This growing database currently contains over 250 experimentally tested DNA fragments, of which more than 100 have been validated as tissue-specific enhancers. For each positive enhancer, we provide digital images of whole-mount embryo staining at embryonic day 11.5 and an anatomical description of the reporter gene expression pattern. Users can retrieve elements near single genes of interest, search for enhancers that target reporter gene expression to a particular tissue, or download entire collections of enhancers with a defined tissue specificity or conservation depth. These experimentally validated training sets are expected to provide a basis for a wide range of downstream computational and functional studies of enhancer function.
YEASTRACT   The YEASTRACT (Yeast Search for Transcriptional Regulators And Consensus Tracking) database is a tool for the analysis of transcription regulatory associations in Saccharomyces cerevisiae. This database is a repository of more than 12000 regulatory associations between transcription factors and target genes, based on experimental evidence spread throughout more than 850 bibliographic references. It also includes more than 250 specific DNA binding sites for more than a hundred characterized transcription factors. Further information about each yeast gene included in the database was obtained from the Saccharomyces Genome Database (SGD), Regulatory Sequences Analysis Tools (RSAT) and the Gene Ontology (GO) Consortium. Computational tools are also provided to facilitate the exploitation of the gathered data when solving a number of biological questions, as exemplified in the Tutorial also available on the system. YEASTRACT allows the identification of documented or potential transcription regulators of a given gene and of documented or potential regulons for each transcription factor. It also makes it possible to compare DNA motifs, such as those found to be over-represented in the promoter regions of co-regulated genes, with the transcription factor binding sites described in the literature. The system also provides a useful mechanism for grouping a list of genes (for instance a set of genes with similar expression profiles as revealed by microarray analysis) based on their regulatory associations with known transcription factors.
Ito2001   Yeast two-hybrid data (Ito, T. et al. (2001) "A comprehensive two-hybrid analysis to explore the yeast protein interactome". Proc Natl Acad Sci U S A. 98, 4569-74.)
Uetz2000   Yeast two-hybrid data (Uetz, P. et al. (2000) "A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae". Nature 403, 623-7.)