SCDC Deconvolution — deconvolute

Calculates cell-type proportions for bulk samples using SCDC deconvolution. IMPORTANT: No model needs to be built beforehand. Everything is done inside this function.

deconvolute_scdc(
  bulk_gene_expression,
  single_cell_object,
  cell_type_annotations,
  batch_ids,
  ct_varname = "cellType",
  sample = "batchId",
  ct_sub = NULL,
  iter_max = NULL,
  nu = 1e-04,
  epsilon = NULL,
  truep = NULL,
  weight_basis = TRUE,
  ct_cell_size = NULL,
  transform_bisque = FALSE,
  grid_search = FALSE,
  search_length = 0.05,
  names_sc_objects = NULL,
  qcthreshold = 0.7,
  verbose = FALSE,
  quality_control = FALSE
)

Arguments

bulk_gene_expression: A matrix of bulk data. Rows are genes, columns are samples. Row and column names need to be set.
single_cell_object: A matrix or data frame with the single-cell data. Rows are genes, columns are samples. Row and column names need to be set. This can also be a list of objects, if SCDC_ENSEMBLE is to be used.
cell_type_annotations: A vector of the cell type annotations. Has to be in the same order as the samples in single_cell_object. This can also be a list of vectors, if SCDC_ENSEMBLE is to be used.
batch_ids: A vector of the ids of the samples or individuals.
ct_varname: Character string specifying the variable name for cell types.
sample: Character string specifying the variable name for subjects/samples.
ct_sub: A vector with the subset of cell types used to construct the basis matrix. NULL means that all cell types are used.
iter_max: The maximum number of iterations in WNNLS. If the parameter is NULL, a default of 1000 is used for a single single-cell object and 2000 for a list.
nu: A small constant to facilitate the calculation of variance.
epsilon: A small constant used as the convergence criterion. If the parameter is NULL, a default of 0.01 is used for a single single-cell object and 0.001 for a list.
truep: True cell-type proportions for the bulk samples, if known.
weight_basis: Whether to use the Basis Matrix adjusted for maximal variance weight, created by the SCDC_basis function.
ct_cell_size: Default is NULL, which means the "library size" is calculated based on the data. Users can specify a vector of cell size factors corresponding to ct_sub according to prior knowledge. The vector must be named, i.e. names(ct_cell_size) must not be NULL.
transform_bisque: Whether to apply the bulk sample transformation from BisqueRNA, which aims to reduce the systematic difference between single cells and bulk samples.
grid_search: Logical. Whether to allow the grid search method to derive the ENSEMBLE weights.
search_length: A number between 0 and 0.5. The step length used if grid_search = TRUE. A smaller search_length gives more accurate optimization results.
names_sc_objects: A vector with the names of the single cell objects. Only used if a list of single cell objects is supplied. If it remains NULL, the objects are named by their index.
qcthreshold: The probability threshold used to filter out questionable cells, only used if quality_control = TRUE.
verbose: Whether to produce an output on the console.
quality_control: Whether to perform the SCDC_qc quality control method.
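
The input shapes described above can be sketched with toy data. This is only an illustration: all values are made up, check_scdc_inputs is a hypothetical helper that is not part of the package, and the requirement that batch_ids has one entry per single-cell column is an assumption that the argument list does not state explicitly.

```r
set.seed(1)

# Toy data: 5 genes, 3 bulk samples, 6 single cells (all values are made up).
genes <- paste0("gene", 1:5)

bulk_gene_expression <- matrix(
  rpois(5 * 3, lambda = 50), nrow = 5,
  dimnames = list(genes, paste0("bulk", 1:3))
)

single_cell_object <- matrix(
  rpois(5 * 6, lambda = 5), nrow = 5,
  dimnames = list(genes, paste0("cell", 1:6))
)

# One annotation per single-cell column, in column order.
cell_type_annotations <- c("T", "T", "B", "B", "T", "B")
# Assumption: one subject id per single-cell column as well.
batch_ids <- c("p1", "p1", "p1", "p2", "p2", "p2")

# Hypothetical helper: checks the constraints listed in the arguments.
check_scdc_inputs <- function(bulk, sc, annotations, batches) {
  stopifnot(
    !is.null(rownames(bulk)), !is.null(colnames(bulk)),
    !is.null(rownames(sc)), !is.null(colnames(sc)),
    length(annotations) == ncol(sc),
    length(batches) == ncol(sc)
  )
  invisible(TRUE)
}

inputs_ok <- check_scdc_inputs(
  bulk_gene_expression, single_cell_object,
  cell_type_annotations, batch_ids
)

# With real data, the call would then follow the usage shown above:
# result <- deconvolute_scdc(bulk_gene_expression, single_cell_object,
#                            cell_type_annotations, batch_ids)
```

The actual call is left commented out because toy data of this size is unlikely to give a meaningful deconvolution.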

Value

Depends on whether one or multiple single cell sets are used.

One:

prop.est.mvw: A matrix of cell type proportion estimates with cell types as rows and individuals as columns.
basis.mvw: The signature matrix. Rows are genes, columns are cell types.
yhat: The predicted gene expression levels for the bulk samples.
yeval: The evaluation of the predicted gene expression levels.
peval: The evaluation of the deconvoluted proportions. Since we don't have a ground truth, this is always NULL.
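
As a rough sketch of how such a result might be inspected, the snippet below builds a mock list that follows the layout described above (cell types as rows, individuals as columns). The numbers are invented and the object is not real output of deconvolute_scdc.

```r
# Mock result following the documented layout (values are invented).
prop <- matrix(
  c(0.6, 0.4,   # bulk1
    0.3, 0.7,   # bulk2
    0.5, 0.5),  # bulk3
  nrow = 2,
  dimnames = list(c("B", "T"), c("bulk1", "bulk2", "bulk3"))
)
result <- list(prop.est.mvw = prop, peval = NULL)

# Each individual's proportions should sum to (approximately) one.
col_totals <- colSums(result$prop.est.mvw)
stopifnot(all(abs(col_totals - 1) < 1e-8))

# Cell type with the largest estimated proportion per individual
# (ties resolve to the first cell type).
dominant <- rownames(result$prop.est.mvw)[
  apply(result$prop.est.mvw, 2, which.max)
]

# Long format (one row per cell type and individual), convenient for plotting.
prop_long <- as.data.frame(
  as.table(result$prop.est.mvw),
  responseName = "proportion"
)
names(prop_long)[1:2] <- c("cell_type", "individual")
```

If the column sums are far from one, the matrix may be oriented the other way round and can be transposed with t() before further use.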

ENSEMBLE:

w_table: A matrix with the suggested weights for each single cell dataset and some statistical evaluation.
prop.list: A list of the "One:" outputs as seen above for each single cell dataset.
prop.only: A list of the prop.est.mvw values for each single cell dataset.
gridres: A matrix with the results of the grid search. NULL if grid_search = FALSE.
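
The text above does not say how the suggested weights are meant to be applied. One plausible reading is a weighted average of the per-dataset proportion matrices, sketched below. The prop_only list, the dataset names, and the weights vector are all hypothetical stand-ins, and the snippet does not read anything from a real w_table.

```r
# Mock per-dataset estimates standing in for prop.only (values are invented).
prop_only <- list(
  ref_a = matrix(c(0.6, 0.4, 0.2, 0.8), nrow = 2,
                 dimnames = list(c("B", "T"), c("bulk1", "bulk2"))),
  ref_b = matrix(c(0.4, 0.6, 0.4, 0.6), nrow = 2,
                 dimnames = list(c("B", "T"), c("bulk1", "bulk2")))
)

# Hypothetical weights, one per single-cell dataset, summing to one.
weights <- c(ref_a = 0.75, ref_b = 0.25)
stopifnot(abs(sum(weights) - 1) < 1e-8)

# Weighted average of the per-dataset proportion matrices.
weighted <- Map(function(p, w) p * w, prop_only, weights[names(prop_only)])
combined <- Reduce(`+`, weighted)
```

Because the weights sum to one and each input column sums to one, the columns of combined also sum to one.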

Details

SCDC_ENSEMBLE can be used by supplying lists to the parameters single_cell_object and cell_type_annotations. To name the single cell data sets, supply a vector with their corresponding names to names_sc_objects.
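
A minimal sketch of list-shaped inputs for the ensemble case follows. The data are made up, and is_ensemble_input and resolve_defaults are hypothetical helpers that only mirror the defaults documented for iter_max and epsilon; they are not the package's internal logic. How batch_ids should be supplied for a list is not stated here, so it is left out.

```r
set.seed(2)
genes <- paste0("gene", 1:4)

# Builds a toy count matrix with named rows and columns.
make_sc <- function(n_cells, prefix) {
  matrix(
    rpois(4 * n_cells, lambda = 5), nrow = 4,
    dimnames = list(genes, paste0(prefix, "_cell", seq_len(n_cells)))
  )
}

single_cell_object <- list(make_sc(3, "a"), make_sc(4, "b"))
cell_type_annotations <- list(c("T", "B", "T"), c("B", "B", "T", "T"))
names_sc_objects <- c("dataset_a", "dataset_b")

# A data frame is also a list in R, so exclude it when detecting list input.
is_ensemble_input <- function(x) is.list(x) && !is.data.frame(x)

# Hypothetical helper mirroring the documented NULL defaults.
resolve_defaults <- function(sc, iter_max = NULL, epsilon = NULL) {
  ensemble <- is_ensemble_input(sc)
  if (is.null(iter_max)) iter_max <- if (ensemble) 2000 else 1000
  if (is.null(epsilon)) epsilon <- if (ensemble) 0.001 else 0.01
  list(iter_max = iter_max, epsilon = epsilon)
}

# The lists must line up element by element.
stopifnot(
  length(single_cell_object) == length(cell_type_annotations),
  length(single_cell_object) == length(names_sc_objects),
  all(mapply(function(sc, ct) ncol(sc) == length(ct),
             single_cell_object, cell_type_annotations))
)

ensemble_defaults <- resolve_defaults(single_cell_object)
single_defaults <- resolve_defaults(single_cell_object[[1]])
```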

Requires raw read counts. Works best with multiple cells per single-cell patient/subject.
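
Both requirements can be checked before calling the function. The sketch below is one possible way to do so, assuming that raw read counts are non-negative whole numbers. looks_like_raw_counts is a hypothetical helper, and the threshold of two cells per subject is an arbitrary choice for illustration.

```r
# Hypothetical check: raw read counts are non-negative whole numbers.
looks_like_raw_counts <- function(x) {
  x <- as.matrix(x)
  is.numeric(x) && !anyNA(x) && all(x >= 0) && all(x == round(x))
}

raw <- matrix(c(0, 3, 12, 7), nrow = 2)
normalized <- log2(raw + 1)  # log-transformed values are no longer counts

raw_ok <- looks_like_raw_counts(raw)
normalized_ok <- looks_like_raw_counts(normalized)

# Count cells per subject and flag subjects with fewer than two cells.
batch_ids <- c("p1", "p1", "p1", "p2")
cells_per_subject <- table(batch_ids)
sparse_subjects <- names(cells_per_subject)[cells_per_subject < 2]
```

A check like this cannot prove that a matrix holds raw counts (rounded normalized values would pass), but it catches common cases such as log-transformed or TPM input.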