Skip to contents

Export pseudobulk bed files as input for MACS, then run MACS and read the output peaks as a tibble. Each step can can be run independently, allowing for quickly re-loading the results of an already completed call, or running MACS externally (e.g. via cluster job submisison) for increased parallelization. See details for more information.

Usage

call_peaks_macs(
  fragments,
  path,
  cell_groups = rlang::rep_along(cellNames(fragments), "all"),
  effective_genome_size = 2.9e+09,
  insertion_mode = c("both", "start_only", "end_only"),
  step = c("all", "prep-inputs", "run-macs", "read-outputs"),
  macs_executable = NULL,
  additional_params =
    "--call-summits --keep-dup all --shift -75 --extsize 150 --nomodel --nolambda",
  verbose = FALSE,
  threads = 1
)

Arguments

fragments

IterableFragments object

path

(string) Parent directory to store MACS inputs and outputs. Inputs are stored in <path>/input/ and outputs in <path>/output/<group>/. See "File format" in details

cell_groups

Grouping vector with one entry per cell in fragments, e.g. cluster IDs

effective_genome_size

(numeric) Effective genome size for MACS. Default is 2.9e9 following MACS default for GRCh38. See deeptools for values for other common genomes.

insertion_mode

(string) Which fragment ends to use for coverage calculation. One of both, start_only, or end_only.

step

(string) Which step to run. One of all, prep-inputs, run-macs, read-outputs. If prep-inputs, create the input bed files for macs, and provides a shell script per cell group with the command to run macs. If run-macs, also run bash scripts to execute macs. If read-outputs, read the outputs into tibbles.

macs_executable

(string) Path to either MACS2/3 executable. Default (NULL) will autodetect from PATH.

additional_params

(string) Additional parameters to pass to MACS2/3.

verbose

(bool) Whether to provide verbose output from MACS. Only used if step is run-macs or all.

threads

(int) Number of threads to use.

Value

  • If step is prep-inputs, return script paths for each cell group given as a character vector.

  • If step is run-macs, return NULL.

  • If step is read-outputs or all, returns a tibble with all the peaks from each cell group concatenated. Columnns are chr, start, end, group, name, score, strand, fold_enrichment, log10_pvalue, log10_qvalue, summit_offset

Details

File format:

  • Inputs are written such that a bed file used as input into MACS, as well as a shell file containing a call to MACS are written for each cell group.

  • Bed files containing chr, start, and end coordinates of insertions are written at <path>/input/<group>.bed.gz.

  • Shell commands to run MACS manually are written at <path>/input/<group>.sh.

Outputs are written to an output directory with a subdirectory for each cell group. Each cell group's output directory contains a file for narrowPeaks, peaks, and summits.

  • NarrowPeaks are written at <path>/output/<group>/<group>_peaks.narrowPeak.

  • Peaks are written at <path>/output/<group>/<group>_peaks.xls.

  • Summits are written at <path>/output/<group>/<group>_summits.bed.

Only the narrowPeaks file is read into a tibble and returned. For more information on outputs from MACS, visit the MACS docs

Performance:

Running on a 2600 cell dataset and taking both start and end insertions into account, written input bedfiles and MACS outputs used 364 MB and 158 MB of space respectively. With 4 threads, running this function end to end took 74 seconds, with 61 of those seconds spent on running MACS.

Running MACS manually:

To run MACS manually, you will first run call_peaks_macs() with step="prep-inputs. Then, manually run all of the shell scripts generated at <path>/input/<group>.sh. Finally, run call_peaks_macs() again with the same original arguments, but setting step="read-outputs".