CapFilter: Analysis tool for identifying high confidence transcription start sites genome-wide
About this study:
Identifying the transcription start sites (TSS) of genes is essential for characterizing promoter regions, thereby elucidating the cis-regulatory elements that regulate gene expression. Several protocols have recently been developed to capture the 5' end of transcripts via cap-trapping (CAGE) or linker-ligation strategies (PEAT) to address TSS identification, but these often require large amounts of tissue. More recently, nanoCAGE was developed for sequencing on the Illumina GAIIx to overcome these difficulties by utilizing the template-switching properties of reverse transcriptase (RT). Challenges still remain in addressing sequencing depth for multiplexed samples and in identifying false TSSs derived from template-switching artifacts introduced by the RT during sample preparation. Here we present nanoCAGE-2000, the first publicly available adaptation of the nanoCAGE protocol for the Illumina HiSeq 2000 platform. We developed CapFilter, a straightforward annotation-agnostic computational pipeline that greatly increases confidence in predicted TSSs and allows the end user to tune the balance between precision and sensitivity of TSS identification to suit experimental needs. We provide an analysis of gene coverage and reproducibility, comparisons to previous work, and a software implementation for identification of high-confidence TSSs. Together, nanoCAGE-2000 and CapFilter provide a practical method for achieving the quality and depth of coverage required for TSS analysis of eukaryotic genomes, including applications with limited tissue availability.
CapFilter Transcription Start Site Analysis Tool
- CapFilter_Software.tar.gz: CapFilter Command Line Tool, Usage Instructions. Requires samtools (http://samtools.sourceforge.net)
- Download the CapFilter software and extract it into a directory:
tar xvzf CapFilter_Software.tar.gz
- Run the run_test_example.sh script from the analysis tool directory.
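The steps above can be sketched as a short shell session. This is only an illustration under stated assumptions: the internal layout of the archive is not described here, so the script is located by name rather than by a fixed path, and the helper names require_tool and find_script are hypothetical (they are not part of CapFilter). Whether the script should be started with sh or another shell is also an assumption; the usage instructions inside the archive take precedence.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and not part of CapFilter.

# Return non-zero (with a message) if the named tool is not on PATH.
# CapFilter lists samtools as a requirement, so it is worth checking first.
require_tool() {
    command -v "$1" >/dev/null 2>&1 || {
        echo "error: $1 not found on PATH" >&2
        return 1
    }
}

# Print the path of the first regular file named $2 found under directory $1.
# Used because the top-level directory inside the archive is not documented here.
find_script() {
    find "$1" -type f -name "$2" | head -n 1
}

# Example usage (uncomment after downloading the archive):
# require_tool samtools || exit 1
# tar xvzf CapFilter_Software.tar.gz
# script=$(find_script . run_test_example.sh)
# (cd "$(dirname "$script")" && sh "./$(basename "$script")")
```

Running the script from its own directory, as in the commented usage, is a cautious choice in case the test example refers to its input files by relative path.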
Please see the LICENSE file included in the archive file for copyright and distribution rights.
Custom barcode splitter for nanoCAGE data on HiSeq-2000 files
- split_barcode.tar.gz: split_barcode.pl Command Line Tool, Usage Instructions.
- Download the split_barcode.pl software and extract it into a directory:
tar xvzf split_barcode.tar.gz
- Run the split_6barcode.sh script from the directory with the HiSeq files.
If any of these tools are used for work that results in a publication, we would appreciate citation of the following article:
Jason S. Cumbie, Maria G. Ivanchenko, and Molly Megraw. (2015). NanoCAGE-XL and CapFilter: an approach to genome wide identification of high confidence transcription start sites. BMC Genomics, 16:597. doi:10.1186/s12864-015-1670-6