Calculates the CPM deconvolution proportions. IMPORTANT: No model is needed; everything is done inside this function. The method is NOT deterministic, so repeated runs will produce different outputs.

deconvolute_cpm(
  bulk_gene_expression,
  single_cell_object,
  cell_type_annotations,
  cell_space = "PCA",
  no_cores = NULL,
  neighborhood_size = 10,
  model_size = 50,
  min_selection = 5,
  calculate_CI = FALSE,
  verbose = FALSE
)

Arguments

bulk_gene_expression

A matrix of bulk data. Rows are genes, columns are samples. Row and column names need to be set.

single_cell_object

A matrix with the single-cell data. Rows are genes, columns are cells (the single-cell samples). Row and column names need to be set.

cell_type_annotations

A vector of the cell type annotations. It must be in the same order as the columns (cells) of single_cell_object.

cell_space

The cell state space corresponding to the single-cell RNA-seq data. It can be a vector for a one-dimensional space or a matrix for a two-dimensional space, where each column represents a different dimension. The cell space should capture the similarities of cells within cell types. Similarities between cells from different cell types, based on the cell space, are not taken into account in CPM. It is also possible to supply the string "PCA", "UMAP" or "TSNE", in which case the cell space is calculated with the corresponding method (using the Seurat implementation and default parameters). The default is "PCA".

no_cores

The number of cores to use for the analysis. The default (NULL) is the total number of cores minus 1.

neighborhood_size

The cell neighborhood size used for the analysis. This should be lower than the number of cells in the smallest cell type. The default is 10.

model_size

The reference subset size in each iteration of CPM. This should be lower than the total number of cells. The default is 50.

min_selection

The minimum number of times each reference cell is selected. Increasing this value might have a large effect on the algorithm's running time. The default is 5.

calculate_CI

A boolean indicating whether confidence intervals should be calculated. The default is FALSE.

verbose

Whether to print output to the console. The default is FALSE.
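
The input requirements listed in the arguments can be checked before calling deconvolute_cpm. The following is a minimal sketch in base R using toy data. The object names (bulk, sc, annot) and the random counts are made up for illustration, and the checks mirror the argument descriptions rather than any validation the function itself may perform.

```r
set.seed(1)  # only makes this toy data reproducible

genes <- paste0("gene", 1:20)

# Bulk matrix: genes in rows, samples in columns, both named
bulk <- matrix(rpois(20 * 3, lambda = 50), nrow = 20,
               dimnames = list(genes, paste0("sample", 1:3)))

# Single-cell matrix: genes in rows, cells in columns, both named
sc <- matrix(rpois(20 * 60, lambda = 5), nrow = 20,
             dimnames = list(genes, paste0("cell", 1:60)))

# One annotation per cell, in the same order as the columns of `sc`
annot <- rep(c("T cell", "B cell", "Monocyte"), times = c(30, 18, 12))

neighborhood_size <- 10
model_size <- 50

checks <- c(
  names_set       = !is.null(rownames(bulk)) && !is.null(colnames(bulk)) &&
                    !is.null(rownames(sc)) && !is.null(colnames(sc)),
  annotation_len  = length(annot) == ncol(sc),
  # should be lower than the number of cells in the smallest cell type
  neighborhood_ok = neighborhood_size < min(table(annot)),
  # should be lower than the total number of cells
  model_size_ok   = model_size < ncol(sc)
)
print(checks)

# With all checks TRUE, the call would look like this (not run here):
# res <- deconvolute_cpm(bulk, sc, annot, cell_space = "PCA",
#                        neighborhood_size = neighborhood_size,
#                        model_size = model_size)
```

Keeping the size checks next to the data preparation makes it easier to adjust neighborhood_size and model_size when the reference contains a rare cell type.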

Value

A list including:

predicted

CPM predicted cell abundance matrix. Each row represents a sample and each column a single cell.

cellTypePredictions

CPM predicted cell-type abundance matrix. Each row represents a sample and each column a single cell type.

confIntervals

A matrix containing the confidence interval for each cell and sample. Each row represents a sample and each column a single cell. This is only calculated if calculate_CI = TRUE.

numOfRuns

The number of deconvolution repeats performed by CPM.
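
To illustrate the layout of the returned list, the sketch below builds a placeholder list by hand with the same element names and orientation (samples in rows). The numbers are random filler, not CPM output, and treating confIntervals as NULL when calculate_CI = FALSE is an assumption, since the description only states when it is calculated.

```r
set.seed(1)  # only makes the filler values reproducible

samples <- paste0("sample", 1:3)
cells <- paste0("cell", 1:6)
cell_types <- c("T cell", "B cell")

# Placeholder standing in for the value returned by deconvolute_cpm()
res <- list(
  predicted = matrix(rnorm(18), nrow = 3,
                     dimnames = list(samples, cells)),
  cellTypePredictions = matrix(rnorm(6), nrow = 3,
                               dimnames = list(samples, cell_types)),
  confIntervals = NULL,  # assumption: empty when calculate_CI = FALSE
  numOfRuns = 5
)

# Per-sample prediction for one cell type (one value per row/sample)
b_cell <- res$cellTypePredictions[, "B cell"]

# Guard before using the confidence intervals
has_ci <- !is.null(res$confIntervals)
```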

Details

This function initiates the Cellular Population Mapping (CPM) algorithm, a deconvolution algorithm that requires single-cell genomics in only one or a few samples. In the other samples of the same tissue, only bulk genomics is measured and the underlying fine-resolution cellular heterogeneity is inferred. CPM predicts the abundance of cells (and cell types) on a scale that ranges monotonically from negative to positive levels. In a relative framework, negative and positive values correspond to a decrease and an increase in cell abundance levels, respectively. In an absolute framework, lower values (including negatives) correspond to lower abundances and vice versa. These values are comparable between samples.
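
As a small illustration of the relative reading of CPM values, the sketch below labels hand-written placeholder values by sign and orders samples for one cell type. The matrix ctp is invented for the example and is not CPM output.

```r
# Hypothetical cell-type predictions: samples in rows, cell types in columns
ctp <- matrix(c(-0.8,  0.1,  0.6,    # "T cell" column
                 0.4, -0.2, -0.5),   # "B cell" column
              nrow = 3,
              dimnames = list(paste0("sample", 1:3), c("T cell", "B cell")))

# Relative framework: negative = decrease, positive = increase
# (an exact zero is labeled "increase" here purely for simplicity)
direction <- ifelse(ctp < 0, "decrease", "increase")

# Values are comparable between samples, so samples can be ordered
# from lowest to highest predicted abundance of one cell type
t_cell_order <- names(sort(ctp[, "T cell"]))
```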