Tissue of Expression Prediction Datasets

Accurate transcription start sites enable mining for the cis-regulatory determinants of tissue specific gene expression

Publication Online: coming soon 

About this study: 

Gene expression across tissues is regulated by an unknown number of determinants, including the prevalence of transcription factors (TFs) and their binding sites, along with other aspects of cellular state. Recent studies have emphasized the importance of both genetic and epigenetic states, at least two of which have substantial literature support as causal determinants of tissue specificity: TF binding sites and chromatin accessibility at those sites. To investigate the extent and relative contributions of these potential determinants, we constructed three genome-scale datasets for both root and shoot tissues of the same Arabidopsis thaliana plants: TSS-seq data to identify transcription start sites (TSSs), OC-seq data to identify regions of open chromatin, and RNA-seq data to assess gene expression levels. For those genes that are differentially expressed between root and shoot, we constructed a machine learning model incorporating chromatin accessibility with TF binding information upstream of TSS locations, with the goal of predicting the tissue in which each of these genes would be upregulated. The resulting model was highly accurate when both chromatin structure and sequence were considered (over 90% auROC and auPRC), allowing one to predict the tissue in which a given gene will be expressed. By considering the model contributions that most strongly influence the predicted tissue of expression, our analysis suggests that TF site presence and location in ~500 nt TSS-proximal regions are the predominant explainers of tissue of expression in a large majority of cases. Thus, in plants, cis-regulatory control of tissue-specific gene expression appears unlikely to be limited to, or even predominantly located in, enhancer-like distal accessible chromatin regions. This study highlights the exciting future possibility of a native TF site-based design process for the tissue-specific targeting of plant gene promoters.
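To make the idea of a TSS-proximal region concrete, the following Python sketch derives a strand-aware 500 nt upstream window for a TSS and measures how much of that window is covered by open-chromatin peaks. It is only an illustration under stated assumptions: the function names, the 0-based half-open coordinate convention, and the use of covered fraction as a feature are all hypothetical and are not taken from the study's actual pipeline.

```python
from typing import List, Tuple

# Assumption: intervals are 0-based and half-open, [start, end).
Interval = Tuple[int, int]


def upstream_window(tss: int, strand: str, length: int = 500) -> Interval:
    """Return the window of `length` nt immediately upstream of a TSS."""
    if strand == "+":
        # Upstream on the plus strand means lower genomic coordinates.
        return (max(0, tss - length), tss)
    if strand == "-":
        # Upstream on the minus strand means higher genomic coordinates.
        return (tss + 1, tss + 1 + length)
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def covered_fraction(window: Interval, peaks: List[Interval]) -> float:
    """Fraction of `window` covered by `peaks` (peaks may overlap each other)."""
    w_start, w_end = window
    if w_end <= w_start:
        return 0.0
    covered = 0
    cursor = w_start  # everything left of the cursor is already counted
    for p_start, p_end in sorted(peaks):
        start = max(p_start, cursor)
        end = min(p_end, w_end)
        if end > start:
            covered += end - start
            cursor = end
    return covered / (w_end - w_start)


if __name__ == "__main__":
    window = upstream_window(1000, "+")
    peaks = [(400, 600), (550, 700), (900, 1200)]
    print(window, covered_fraction(window, peaks))
```

In this toy setup a gene would contribute one window per TSS, and the covered fraction (or TF site positions within the window) could serve as model inputs; the real feature definitions are those given in the publication.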

 

Online Data Access 

The following datasets are available through GBrowse:

OC-Seq, TSS-Seq, and RNA-Seq reads were obtained from tissue of plants grown under a 12 hr light cycle and sequenced on an Illumina HiSeq 2000 as single-end reads with a 51 nt read length.

Open Chromatin Analysis (DNase I SIM)

  • OC root: called peaks from OC-Seq data from root tissue.
  • OC shoot: called peaks from OC-Seq data from shoot tissue.
  • ROOT OC coverage: coverage data from OC-Seq data from root tissue.
  • SHOOT OC coverage: coverage data from OC-Seq data from shoot tissue.

 

TSS-Seq (nanoCAGE-XL)

  • ROOT TSS Reads: TSS-Seq (nanoCAGE-XL) reads from root tissue.
  • SHOOT TSS Reads: TSS-Seq (nanoCAGE-XL) reads from shoot tissue.
  • Root TSS nanoCAGE-XL Coverage: nanoCAGE-XL coverage of root TSSs.
  • Shoot TSS nanoCAGE-XL Coverage: nanoCAGE-XL coverage of shoot TSSs.
  • PEAT TSS-Seq Root Coverage: PEAT alignment coverage.

 

TSS-Seq peaks called by the JAMM peak caller on nanoCAGE-XL reads from root tissue

  • JAMM Peaks Ath Root f10_b10: peak-calling mode: normal, resolution: window, fragment size: 10, bin size: 10. Cap-filtered to remove spurious peaks that are not TSSs.
  • JAMM Peaks Ath Root f10_b10 capped: JAMM Peaks Ath Root f10_b10 without cap-filtering.
  • JAMM Peaks Ath Root f20 b10: peak-calling mode: normal, resolution: window, fragment size: 20, bin size: 10. Cap-filtered.
  • JAMM Peaks Ath Root f20 b20: peak-calling mode: normal, resolution: window, fragment size: 20, bin size: 20. Cap-filtered.
  • JAMM Peaks Ath Root f20_b20 capped: JAMM Peaks Ath Root f20_b20 without cap-filtering.
  • JAMM Peaks Ath Root f30: peak-calling mode: normal, resolution: window, fragment size: 30, bin size: automatic. Cap-filtered.

 

TSS-Seq peaks called by the JAMM peak caller on nanoCAGE-XL reads from shoot tissue

  • JAMM Peaks Ath Shoot f10_b10: peak-calling mode: normal, resolution: window, fragment size: 10, bin size: 10. Cap-filtered.
  • JAMM Peaks Ath Shoot f10_b10 capped: JAMM Peaks Ath Shoot f10_b10 without cap-filtering.
  • JAMM Peaks Ath Shoot f20 b10: peak-calling mode: normal, resolution: window, fragment size: 20, bin size: 10. Cap-filtered.
  • JAMM Peaks Ath Shoot f20 b20: peak-calling mode: normal, resolution: window, fragment size: 20, bin size: 20. Cap-filtered.
  • JAMM Peaks Ath Shoot f20_b20 capped: JAMM Peaks Ath Shoot f20 b20 without cap-filtering.
  • JAMM Peaks Ath Shoot f30: peak-calling mode: normal, resolution: window, fragment size: 30, bin size: automatic. Cap-filtered.

 

TSS-Seq peaks called by other peak callers

  • paraclu ROOT: peaks called using the program paraclu on nanoCAGE-XL reads from root tissue. Min. cluster size: 5 reads. The script paraclu-cut.sh was also used in processing this data to ensure the following conditions: clusters are not single-position; clusters span less than 200 nt; max density / baseline density > 2; clusters are not part of a larger cluster.
  • paraclu SHOOT: peaks called using the program paraclu on nanoCAGE-XL reads from shoot tissue. Min. cluster size: 5 reads. The script paraclu-cut.sh was also used in processing this data to ensure the following conditions: clusters are not single-position; clusters span less than 200 nt; max density / baseline density > 2; clusters are not part of a larger cluster.
  • Peak caller Ath Shoot: shoot tissue TSS-Seq peaks called using a program created by David Corcoran (unpublished).
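
The four post-processing conditions listed for the paraclu tracks can be sketched in Python as follows. This is a minimal illustration, not the script that was actually run: the `Cluster` record and function names are hypothetical, coordinates are assumed to be 1-based and inclusive, "baseline density" is assumed to correspond to the cluster's minimum density, and the containment check is assumed to be applied only among clusters that pass the other three filters.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cluster:
    chrom: str
    strand: str
    start: int          # assumption: 1-based, inclusive
    end: int            # assumption: 1-based, inclusive
    min_density: float  # assumption: the "baseline density"
    max_density: float


def passes_basic_filters(c: Cluster, max_span: int = 200,
                         min_ratio: float = 2.0) -> bool:
    """Apply the single-position, span, and density-ratio conditions."""
    span = c.end - c.start + 1
    if span <= 1:            # cluster is a single position
        return False
    if span >= max_span:     # cluster must span less than 200 nt
        return False
    # max density / baseline density > 2, written without a division
    return c.max_density > min_ratio * c.min_density


def filter_clusters(clusters: List[Cluster]) -> List[Cluster]:
    """Keep clusters that pass all filters and are not nested in a larger one."""
    kept = [c for c in clusters if passes_basic_filters(c)]
    result = []
    for c in kept:
        contained = any(
            o is not c
            and o.chrom == c.chrom and o.strand == c.strand
            and o.start <= c.start and c.end <= o.end
            and (o.end - o.start) > (c.end - c.start)
            for o in kept
        )
        if not contained:
            result.append(c)
    return result


if __name__ == "__main__":
    demo = [
        Cluster("Chr1", "+", 100, 150, 1.0, 5.0),
        Cluster("Chr1", "+", 110, 120, 2.0, 6.0),  # nested in the first
    ]
    for cluster in filter_clusters(demo):
        print(cluster)
```

Whether the thresholds are inclusive at the boundary (for example, a ratio of exactly 2) is an assumption here; the paraclu documentation is the authority on the script's exact behavior.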

 

RNA-Seq

  • (root1, root2, root3) RNASeq Coverage: RNA-Seq coverage from individual root tissue replicates.
  • (shoot1, shoot2, shoot3) RNASeq Coverage: RNA-Seq coverage from individual shoot tissue replicates.
  • root_merged RNASeq Coverage: This track combines root1 RNASeq Coverage, root2 RNASeq Coverage, and root3 RNASeq Coverage.
  • shoot_merged RNASeq Coverage: This track combines shoot1 RNASeq Coverage, shoot2 RNASeq Coverage, and shoot3 RNASeq Coverage.
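
One simple way to combine replicate coverage tracks is to add read depth position by position, as in the Python sketch below. The page does not state how the merged tracks were built, so summation is an assumption (averaging or pooling reads before alignment would be equally plausible), and the function name and the position-to-depth dictionary representation are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def merge_coverage(replicates: Iterable[Dict[int, int]]) -> Dict[int, int]:
    """Sum per-position read depth across replicates.

    Each replicate maps a genomic position to its read depth; positions
    missing from a replicate are treated as depth 0.
    """
    merged: Counter = Counter()
    for coverage in replicates:
        merged.update(coverage)  # Counter.update adds counts per key
    return dict(merged)


if __name__ == "__main__":
    root1 = {1: 2, 2: 3}
    root2 = {2: 1, 5: 4}
    root3 = {1: 1}
    print(merge_coverage([root1, root2, root3]))
```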

 

Raw reads are deposited in the NCBI SRA repository under the following accession numbers: