Apply a custom R function to each row/col of a BPCells matrix. This will run slower than the builtin C++-backed functions, but will keep most of the memory benefits from disk-backed operations.
Arguments
- mat
IterableMatrix object
- fun
function(val, row, col) that takes in a row/col of values and returns a summary output. Argument details:
  - val: vector of length (# non-zero values) with the value for each non-zero matrix entry
  - row: one-based row index (apply_by_col: vector of length (# non-zero values); apply_by_row: single integer)
  - col: one-based col index (apply_by_col: single integer; apply_by_row: vector of length (# non-zero values))
  - ...: optional additional arguments (should not be named row, col, or val)
- ...
Optional additional arguments passed to
fun
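As a rough illustration of the fun signature described above, the sketch below defines a hypothetical summary function, col_mean, and calls it by hand on made-up values for a single column. The helper name, the n_rows extra argument, and the sample data are assumptions for illustration and are not part of BPCells.

```r
# Hypothetical summary function following the fun(val, row, col, ...) signature.
# `n_rows` is an illustrative extra argument that would be supplied through `...`.
col_mean <- function(val, row, col, n_rows) {
  # Zero entries are not passed in, so divide by the full row count
  # rather than by length(val).
  sum(val) / n_rows
}

# Made-up inputs mimicking what apply_by_col would pass for one column:
val <- c(2, 4, 6)       # non-zero values in the column
row <- c(1L, 3L, 4L)    # one-based row index of each value
col <- 2L               # one-based index of the column itself

col_mean(val, row, col, n_rows = 5)
```

With a real matrix, the same function would presumably be passed as apply_by_col(mat, col_mean, n_rows = nrow(mat)).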
Value
apply_by_row - A list of length nrow(matrix)
with the results returned by fun()
on each row
apply_by_col - A list of length ncol(matrix)
with the results returned by fun()
on each column
Details
These functions require row-major matrix storage for apply_by_row and col-major storage for apply_by_col,
so matrices stored in the wrong order may need a re-ordered copy created using transpose_storage_order()
first.
This requirement keeps memory usage low and allows the result to be calculated in a single streaming pass over the
input matrix.
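The storage-order requirement can be sketched as follows. This is a minimal example on a made-up 3 x 3 matrix, assuming the BPCells and Matrix packages are installed and that a dgCMatrix converts to a col-major IterableMatrix; the variable names are illustrative only.

```r
# Sketch only: runs when the BPCells and Matrix packages are installed.
if (requireNamespace("BPCells", quietly = TRUE) &&
    requireNamespace("Matrix", quietly = TRUE)) {
  library(BPCells)

  # Small made-up sparse matrix (dgCMatrix, stored column-major)
  m <- Matrix::sparseMatrix(
    i = c(1, 3, 2, 1, 3),
    j = c(1, 1, 2, 3, 3),
    x = c(5, 1, 2, 7, 3),
    dims = c(3, 3)
  )
  mat <- as(m, "IterableMatrix")

  # Col-major input can be passed to apply_by_col directly
  col_sums <- apply_by_col(mat, function(val, row, col) sum(val))

  # apply_by_row needs row-major storage, so create a re-ordered copy first
  mat_row_major <- transpose_storage_order(mat)
  row_sums <- apply_by_row(mat_row_major, function(val, row, col) sum(val))
}
```

The re-ordered copy is written once, so it is probably worth reusing when several apply_by_row calls are made on the same matrix.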
If vector/matrix outputs are desired instead of lists, calling unlist(x)
or do.call(cbind, x)
or do.call(rbind, x)
can convert the list output.
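The conversions mentioned above can be sketched in base R. The lists below are made-up stand-ins for apply_by_row / apply_by_col output, not results from a real matrix.

```r
# Stand-ins for apply_by_col / apply_by_row output (made-up values)
scalar_results <- list(6, 2, 10)
vector_results <- list(
  c(min = 1, max = 5),
  c(min = 2, max = 2),
  c(min = 3, max = 7)
)

# One scalar per row/col: flatten to a plain numeric vector
as_vector <- unlist(scalar_results)

# One fixed-length vector per row/col: stack as rows or as columns
by_rows <- do.call(rbind, vector_results)  # 3 x 2 matrix, one row per result
by_cols <- do.call(cbind, vector_results)  # 2 x 3 matrix, one column per result
```

Which of rbind or cbind fits best depends on whether each result should become a row or a column of the final matrix.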
See also
For an interface more similar to base::apply
, see the BPCellsArray
project. When calculating colMeans on a sparse single-cell RNA matrix, that interface is about 8x slower than apply_by_col
, due to the
base::apply
interface not being sparsity-aware. (See pull request #104 for benchmarking.)