TEP

Accurate transcription start sites enable mining for the cis-regulatory determinants of tissue specific gene expression


Publication Online: https://www.biorxiv.org/content/10.1101/2020.09.01.278424

About this study: 
Identifying the genomic regulatory information that controls endogenous gene expression in different tissues of multicellular organisms remains a grand open challenge. Current state-of-the-art models have not yet captured the genomic sequence regions or features needed to successfully predict the tissue(s) in which a gene will be expressed in any organism. It therefore remains an open question whether genome sequence content, chromatin state, or a combination of these two potentially critical types of regulatory information is largely responsible for determining a gene's tissue(s) of expression. In plants, the ability to efficiently assay genome-wide transcriptional state, including start site location and chromatin accessibility, in healthy tissues of the same individuals provides a unique opportunity to address this challenge. In this study, we show that Transcription Start Site sequencing data from the roots and shoots of Arabidopsis plants allow high-resolution modeling of the tissue of expression from sequence and chromatin accessibility data, leading to successful prediction of tissue for highly differentially expressed genes (auROC and auPRC over 90%). We find that sequence content, including Transcription Factor Binding Sites, in the 1 kb TSS-proximal upstream region contributes the vast majority of this predictive power, yielding insights for synthetic promoter design.
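For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how auROC and auPRC can be computed for a binary tissue classifier. It is not taken from the TEP source code: the function names, the label convention (1 = root, 0 = shoot), and the six toy scores are all assumptions made purely for illustration, and auPRC is approximated here by average precision.

```python
# Illustrative only: toy labels and scores, not data from the study.
# Assumed convention: label 1 = root-expressed, label 0 = shoot-expressed.

def au_roc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive example scores higher than a random negative one (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision, a common step-wise estimate of the area under the
    precision-recall curve. Assumes scores are not tied."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive example is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Hypothetical predictions for six genes.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

print("auROC:", round(au_roc(labels, scores), 3))
print("auPRC (average precision):", round(average_precision(labels, scores), 3))
```

A value of 0.5 for auROC corresponds to random ranking, while the baseline for auPRC equals the fraction of positive examples, so the two metrics are usually reported together when classes may be imbalanced.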


Software

Tissue of Expression Predictor (TEP)

All Tissue of Expression Prediction models, including the Regions of Enrichment model (TEP-ROE), Tiled model (TEP-Tiled), and other model analyses are provided with source code and documentation here.

Funding Citation

This work was supported by the National Science Foundation under Award No. 1750698.

Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).