<?xml version="1.0" encoding="UTF-8"?><xml><records><record><source-app name="Biblio" version="7.x">Drupal-Biblio</source-app><ref-type>17</ref-type><contributors><authors><author><style face="normal" font="default" size="100%">Megraw, Molly</style></author><author><style face="normal" font="default" size="100%">Pereira, Fernando</style></author><author><style face="normal" font="default" size="100%">Jensen, Shane T</style></author><author><style face="normal" font="default" size="100%">Ohler, Uwe</style></author><author><style face="normal" font="default" size="100%">Hatzigeorgiou, Artemis G</style></author></authors></contributors><titles><title><style face="normal" font="default" size="100%">A transcription factor affinity-based code for mammalian transcription initiation.</style></title><secondary-title><style face="normal" font="default" size="100%">Genome Res</style></secondary-title><alt-title><style face="normal" font="default" size="100%">Genome Res.</style></alt-title></titles><keywords><keyword><style  face="normal" font="default" size="100%">Base Composition</style></keyword><keyword><style  face="normal" font="default" size="100%">Databases, Genetic</style></keyword><keyword><style  face="normal" font="default" size="100%">DNA</style></keyword><keyword><style  face="normal" font="default" size="100%">Gene Expression Regulation</style></keyword><keyword><style  face="normal" font="default" size="100%">Genome, Human</style></keyword><keyword><style  face="normal" font="default" size="100%">Humans</style></keyword><keyword><style  face="normal" font="default" size="100%">Promoter Regions, Genetic</style></keyword><keyword><style  face="normal" font="default" size="100%">RNA Polymerase II</style></keyword><keyword><style  face="normal" font="default" size="100%">TATA Box</style></keyword><keyword><style  face="normal" font="default" size="100%">Transcription Factors</style></keyword><keyword><style  face="normal" font="default" size="100%">Transcription 
Initiation Site</style></keyword><keyword><style  face="normal" font="default" size="100%">Transcription, Genetic</style></keyword></keywords><dates><year><style  face="normal" font="default" size="100%">2009</style></year><pub-dates><date><style  face="normal" font="default" size="100%">2009 Apr</style></date></pub-dates></dates><volume><style face="normal" font="default" size="100%">19</style></volume><pages><style face="normal" font="default" size="100%">644-56</style></pages><language><style face="normal" font="default" size="100%">eng</style></language><abstract><style face="normal" font="default" size="100%">&lt;p&gt;The recent arrival of large-scale cap analysis of gene expression (CAGE) data sets in mammals provides a wealth of quantitative information on coding and noncoding RNA polymerase II transcription start sites (TSS). Genome-wide CAGE studies reveal that a large fraction of TSS exhibit peaks where the vast majority of associated tags map to a particular location (approximately 45%), whereas other active regions contain a broader distribution of initiation events. The presence of a strong single peak suggests that transcription at these locations may be mediated by position-specific sequence features. We therefore propose a new model for single-peaked TSS based solely on known transcription factors (TFs) and their respective regions of positional enrichment. This probabilistic model leads to near-perfect classification results in cross-validation (auROC = 0.98), and performance in genomic scans demonstrates that TSS prediction with both high accuracy and spatial resolution is achievable for a specific but large subgroup of mammalian promoters. 
The interpretable model structure suggests a DNA code in which canonical sequence features such as TATA-box, Initiator, and GC content do play a significant role, but many additional TFs show distinct spatial biases with respect to TSS location and are important contributors to the accurate prediction of single-peak transcription initiation sites. The model structure also reveals that CAGE tag clusters distal from annotated gene starts have distinct characteristics compared to those close to gene 5&amp;#39;-ends. Using this high-resolution single-peak model, we predict TSS for approximately 70% of mammalian microRNAs based on currently available data.&lt;/p&gt;&lt;p&gt;&lt;a href=&quot;http://megraw-dev.cgrb.oregonstate.edu/node/715&quot;&gt;[Links to Tools and Supplementary Materials]&lt;/a&gt;&lt;/p&gt;</style></abstract><issue><style face="normal" font="default" size="100%">4</style></issue></record></records></xml>