Title | A transcription factor affinity-based code for mammalian transcription initiation. |
Publication Type | Journal Article |
Year of Publication | 2009 |
Authors | Megraw, M, Pereira, F, Jensen, ST, Ohler, U, Hatzigeorgiou, AG |
Journal | Genome Res |
Volume | 19 |
Issue | 4 |
Pagination | 644-56 |
Date Published | 2009 Apr |
ISSN | 1088-9051 |
Keywords | Base Composition; Databases, Genetic; DNA; Gene Expression Regulation; Genome, Human; Humans; Promoter Regions, Genetic; RNA Polymerase II; TATA Box; Transcription Factors; Transcription Initiation Site; Transcription, Genetic |
Abstract | The recent arrival of large-scale cap analysis of gene expression (CAGE) data sets in mammals provides a wealth of quantitative information on coding and noncoding RNA polymerase II transcription start sites (TSS). Genome-wide CAGE studies reveal that a large fraction of TSS exhibit peaks where the vast majority of associated tags map to a particular location (approximately 45%), whereas other active regions contain a broader distribution of initiation events. The presence of a strong single peak suggests that transcription at these locations may be mediated by position-specific sequence features. We therefore propose a new model for single-peaked TSS based solely on known transcription factors (TFs) and their respective regions of positional enrichment. This probabilistic model leads to near-perfect classification results in cross-validation (auROC = 0.98), and performance in genomic scans demonstrates that TSS prediction with both high accuracy and spatial resolution is achievable for a specific but large subgroup of mammalian promoters. The interpretable model structure suggests a DNA code in which canonical sequence features such as TATA-box, Initiator, and GC content do play a significant role, but many additional TFs show distinct spatial biases with respect to TSS location and are important contributors to the accurate prediction of single-peak transcription initiation sites. The model structure also reveals that CAGE tag clusters distal from annotated gene starts have distinct characteristics compared to those close to gene 5'-ends. Using this high-resolution single-peak model, we predict TSS for approximately 70% of mammalian microRNAs based on currently available data. |
DOI | 10.1101/gr.085449.108 |
Alternate Journal | Genome Res. |
PubMed ID | 19141595 |
PubMed Central ID | PMC2665783 |
Grant List | P50GM081883 / GM / NIGMS NIH HHS / United States; R01 HG 004065 / HG / NHGRI NIH HHS / United States |
Submitted by Megraw Lab Admin on Tue, 2016-07-12 22:13