Export pseudobulk bed files as input for MACS, then run MACS and read the output peaks as a tibble. Each step can can be run independently, allowing for quickly re-loading the results of an already completed call, or running MACS externally (e.g. via cluster job submisison) for increased parallelization. See details for more information.
Usage
call_peaks_macs(
fragments,
path,
cell_groups = rlang::rep_along(cellNames(fragments), "all"),
effective_genome_size = 2.9e+09,
insertion_mode = c("both", "start_only", "end_only"),
step = c("all", "prep-inputs", "run-macs", "read-outputs"),
macs_executable = NULL,
additional_params =
"--call-summits --keep-dup all --shift -75 --extsize 150 --nomodel --nolambda",
verbose = FALSE,
threads = 1
)Arguments
- fragments
IterableFragments object
- path
(string) Parent directory to store MACS inputs and outputs. Inputs are stored in
<path>/input/and outputs in<path>/output/<group>/. See "File format" in details- cell_groups
Grouping vector with one entry per cell in fragments, e.g. cluster IDs
- effective_genome_size
(numeric) Effective genome size for MACS. Default is
2.9e9following MACS default for GRCh38. See deeptools for values for other common genomes.- insertion_mode
(string) Which fragment ends to use for coverage calculation. One of
both,start_only, orend_only.- step
(string) Which step to run. One of
all,prep-inputs,run-macs,read-outputs. Ifprep-inputs, create the input bed files for macs, and provides a shell script per cell group with the command to run macs. Ifrun-macs, also run bash scripts to execute macs. Ifread-outputs, read the outputs into tibbles.- macs_executable
(string) Path to either MACS2/3 executable. Default (
NULL) will autodetect from PATH.- additional_params
(string) Additional parameters to pass to MACS2/3.
- verbose
(bool) Whether to provide verbose output from MACS. Only used if step is
run-macsorall.- threads
(int) Number of threads to use.
Value
If step is
prep-inputs, return script paths for each cell group given as a character vector.If step is
run-macs, returnNULL.If step is
read-outputsorall, returns a tibble with all the peaks from each cell group concatenated. Columnns arechr,start,end,group,name,score,strand,fold_enrichment,log10_pvalue,log10_qvalue,summit_offset
Details
File format:
Inputs are written such that a bed file used as input into MACS, as well as a shell file containing a call to MACS are written for each cell group.
Bed files containing
chr,start, andendcoordinates of insertions are written at<path>/input/<group>.bed.gz.Shell commands to run MACS manually are written at
<path>/input/<group>.sh.
Outputs are written to an output directory with a subdirectory for each cell group. Each cell group's output directory contains a file for narrowPeaks, peaks, and summits.
NarrowPeaks are written at
<path>/output/<group>/<group>_peaks.narrowPeak.Peaks are written at
<path>/output/<group>/<group>_peaks.xls.Summits are written at
<path>/output/<group>/<group>_summits.bed.
Only the narrowPeaks file is read into a tibble and returned. For more information on outputs from MACS, visit the MACS docs
Performance:
Running on a 2600 cell dataset and taking both start and end insertions into account, written input bedfiles and MACS outputs used 364 MB and 158 MB of space respectively. With 4 threads, running this function end to end took 74 seconds, with 61 of those seconds spent on running MACS.
Running MACS manually:
To run MACS manually, you will first run call_peaks_macs() with step="prep-inputs. Then, manually run all of the
shell scripts generated at <path>/input/<group>.sh. Finally, run call_peaks_macs() again with the same original arguments, but
setting step="read-outputs".
