Parameters for motif searches
Sequence motifs were extracted from the entire robust FANTOM5 CAGE-TSS set (Forrest et al. 2014, Nature), from the bidirectional CAGE enhancer set (Andersson et al. 2014, Nature), or from tissue- or cell type-specific enhancer facets (as described in Andersson et al. 2014, Nature) using HOMER (version 3), a suite of tools for motif discovery and next-generation sequencing analysis developed by Christopher Benner (Integrative Genomics Core, Salk Institute, San Diego).
The programs findMotifsGenome.pl, compareMotifs.pl and findKnownMotifs were slightly modified to generate non-standard HTML output, add sample information and provide additional thresholding options. Known motif enrichment was determined using all available HOMER-derived vertebrate PWMs and standard parameters.
Parameters for de novo searches in complete enhancer/promoter sets
- region size: -size given (for promoters); -size 400 (for enhancers)
- genome: hg19r (repeatmasked)
- motif length: -len 7,8,9,10,11,12,13,14
- p-value threshold: -pvalue 1e-15
- motif information content: -info 1.5
- matrix similarity threshold: -reduceThresh 0.75
- minimum motif occurrence in targets: -minT 50 (non-standard option)
- maximum fraction of background regions with motif: -B 0.3 (non-standard option)
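The parameters above can be collected into a single command line. The following is a minimal sketch that only assembles and prints the arguments; it does not run HOMER. It assumes the modified script keeps HOMER's usual positional order (regions file, genome, output directory). The file and directory names are hypothetical, and -minT and -B are only available in the modified version described above.

```python
import shlex


def build_denovo_command(regions_file, output_dir, region_kind):
    """Assemble the argument list for a de novo search on a complete set.

    regions_file and output_dir are hypothetical placeholders.
    """
    if region_kind == "promoter":
        size = "given"  # use the regions exactly as supplied
    elif region_kind == "enhancer":
        size = "400"    # fixed 400 bp window
    else:
        raise ValueError(f"unknown region kind: {region_kind!r}")

    return [
        "findMotifsGenome.pl", regions_file, "hg19r", output_dir,
        "-size", size,
        "-len", "7,8,9,10,11,12,13,14",
        "-pvalue", "1e-15",
        "-info", "1.5",
        "-reduceThresh", "0.75",
        "-minT", "50",  # non-standard option of the modified script
        "-B", "0.3",    # non-standard option of the modified script
    ]


cmd = build_denovo_command("enhancers.bed", "enhancers_denovo", "enhancer")
print(shlex.join(cmd))
```

Keeping the arguments as a list, rather than one string, means the same list could later be passed to a process launcher without shell quoting issues.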
Parameters for de novo searches in cell type- or tissue-specific enhancer sets
- region size: -size given (for promoters); -size 400 (for enhancers)
- genome: hg19r (repeatmasked)
- motif length: -len 7,8,9,10,11,12,13,14
- p-value threshold: -pvalue 1e-15
- motif information content: -info 1.5
- matrix similarity threshold: -reduceThresh 0.75
- minimum fraction of targets with motif: -minpT 0.03 (non-standard option)
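Unlike the fixed count used for the complete sets (-minT 50), the fractional cutoff -minpT 0.03 scales with the size of each facet. The sketch below shows the arithmetic for a few hypothetical facet sizes. It assumes the cutoff is inclusive and that fractional counts round up; the exact rounding used by the modified script is not stated here.

```python
import math
from fractions import Fraction


def min_targets_for_fraction(n_targets, min_fraction="0.03"):
    """Fewest motif-containing targets that reach the fractional cutoff.

    Assumes an inclusive cutoff (count / n_targets >= min_fraction).
    """
    # Fraction avoids floating-point rounding at the boundary.
    return math.ceil(Fraction(min_fraction) * n_targets)


# Hypothetical facet sizes, for illustration only.
for n in (200, 1000, 5000):
    print(n, min_targets_for_fraction(n))
```

Under these assumptions, a facet of 200 enhancers needs a motif in at least 6 of them, whereas a fixed count of 50 would demand a quarter of the facet.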
Known Motif PWMs used to match de novo-derived motifs