Calculates the SCDC deconvolution proportions. IMPORTANT: No separate model is needed. Everything is done inside this function.

deconvolute_scdc(
  bulk_gene_expression,
  single_cell_object,
  cell_type_annotations,
  batch_ids,
  ct_varname = "cellType",
  sample = "batchId",
  ct_sub = NULL,
  iter_max = NULL,
  nu = 1e-04,
  epsilon = NULL,
  truep = NULL,
  weight_basis = TRUE,
  ct_cell_size = NULL,
  transform_bisque = FALSE,
  grid_search = FALSE,
  search_length = 0.05,
  names_sc_objects = NULL,
  qcthreshold = 0.7,
  verbose = FALSE,
  quality_control = FALSE
)

Arguments

bulk_gene_expression

A matrix of bulk data. Rows are genes, columns are samples. Row and column names need to be set.

single_cell_object

A matrix or data frame with the single-cell data. Rows are genes, columns are samples (cells). Row and column names need to be set. This can also be a list of objects if SCDC_ENSEMBLE should be used.

cell_type_annotations

A vector of the cell type annotations. It has to be in the same order as the samples in single_cell_object. This can also be a list of vectors if SCDC_ENSEMBLE should be used.

batch_ids

A vector of the IDs of the samples or individuals.

ct_varname

Character string specifying the variable name for 'cell types'.

sample

Character string specifying the variable name for subjects/samples.

ct_sub

Vector. A subset of cell types selected to construct the basis matrix. NULL means that all cell types are used.

iter_max

The maximum number of iterations in WNNLS. If the parameter is NULL, a default value of 1000 is chosen for a single single-cell object and 2000 for a list.

nu

A small constant to facilitate the calculation of variance.

epsilon

A small constant used as the convergence criterion. If the parameter is NULL, a default value of 0.01 is chosen for a single single-cell object and 0.001 for a list.

truep

True cell-type proportions for the bulk samples, if known.

weight_basis

Whether to use the basis matrix adjusted for maximal variance weight, created by the SCDC_basis function.

ct_cell_size

Default is NULL, which means the "library size" is calculated from the data. Users can specify a vector of cell size factors corresponding to ct_sub according to prior knowledge. The vector must be named: names(ct_cell_size) must not be NULL.

transform_bisque

Whether to apply the bulk sample transformation from bisqueRNA, which aims to reduce the systematic difference between single cells and bulk samples.

grid_search

Logical. Whether to use the grid search method to derive the ENSEMBLE weights.

search_length

A number between 0 and 0.5. The step length used if grid_search = TRUE. A smaller search_length gives more accurate optimization results.

names_sc_objects

A vector with the names of the single-cell objects. Only used if a list of single-cell objects is supplied. If it remains NULL, the objects are named by their index.

qcthreshold

The probability threshold used to filter out questionable cells. Only used if quality_control = TRUE.

verbose

Whether to print output to the console.

quality_control

Whether to perform the SCDC_qc quality control method.
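
The input requirements listed above can be checked before calling the function. The following is a minimal sketch with toy data, not a recommended workflow: all values are random placeholders, the cell type names and size factors are made up for illustration, and it assumes that batch_ids holds one ID per cell.

```r
# Toy data only: random counts, not real expression values.
set.seed(1)
genes <- paste0("gene", 1:5)

# Bulk matrix: genes in rows, bulk samples in columns, with dimnames set.
bulk <- matrix(
  rpois(5 * 3, lambda = 20), nrow = 5,
  dimnames = list(genes, paste0("bulk", 1:3))
)

# Single-cell matrix: genes in rows, one column per cell.
sc <- matrix(
  rpois(5 * 6, lambda = 5), nrow = 5,
  dimnames = list(genes, paste0("cell", 1:6))
)

# One annotation per column of `sc`, in the same order as the columns.
cell_types <- c("Tcell", "Tcell", "Bcell", "Bcell", "Tcell", "Bcell")

# Assumption: one subject ID per cell (illustrative values).
batch_ids <- c("p1", "p1", "p1", "p2", "p2", "p2")

# Optional named vector of cell size factors (placeholder numbers).
ct_cell_size <- c(Tcell = 1, Bcell = 1.2)

# Basic consistency checks derived from the argument descriptions.
stopifnot(
  length(cell_types) == ncol(sc),
  length(batch_ids) == ncol(sc),
  !is.null(rownames(bulk)), !is.null(colnames(bulk)),
  !is.null(rownames(sc)), !is.null(colnames(sc)),
  !is.null(names(ct_cell_size)),
  all(names(ct_cell_size) %in% cell_types)
)
```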

Value

The return value depends on whether one or multiple single-cell data sets are used.

One:

prop.est.mvw

A matrix of cell type proportion estimates with cell types as rows and individuals as columns.

basis.mvw

The signature matrix. Rows are genes, columns are cell types.

yhat

The predicted gene expression levels for the bulk samples.

yeval

The evaluation of the predicted gene expression levels.

peval

The evaluation of the deconvoluted proportions. Since we do not have a ground truth, this is always NULL.

ENSEMBLE:

w_table

A matrix with the suggested weights for each single cell dataset and some statistical evaluation.

prop.list

A list of the "One:" outputs as seen above for each single cell dataset.

prop.only

A list of the prop.est.mvw values for each single cell dataset.

gridres

A matrix with the results of the grid search. NULL if grid_search = FALSE.
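
Because the two result shapes carry different element names, downstream code may want to tell them apart. The sketch below uses a hypothetical helper, is_ensemble_result, which is not part of the package, and mock lists that contain only the element names documented above, with no real values.

```r
# Hypothetical helper: distinguishes the two documented result shapes by
# looking for the ENSEMBLE-only element `w_table`.
is_ensemble_result <- function(res) {
  "w_table" %in% names(res)
}

# Mock results carrying only the documented element names (values are NULL).
single_res <- list(
  prop.est.mvw = NULL, basis.mvw = NULL,
  yhat = NULL, yeval = NULL, peval = NULL
)
ensemble_res <- list(
  w_table = NULL, prop.list = NULL,
  prop.only = NULL, gridres = NULL
)

is_ensemble_result(single_res)   # FALSE
is_ensemble_result(ensemble_res) # TRUE
```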

Details

SCDC_ENSEMBLE can be used by supplying lists to the parameters single_cell_object and cell_type_annotations. To name the single-cell data sets, supply a vector with their corresponding names to names_sc_objects.
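
A minimal sketch of how the list inputs for the ENSEMBLE case could be assembled is shown here. The data are random toy values, and every object name (make_sc, sc_list, annotation_list, dataset_names, batch_id_input) is illustrative. The call itself is left commented out because it needs the package and realistically sized data, and because this page does not state how batch_ids should be structured for a list of data sets.

```r
# Toy inputs; all object names here are illustrative.
set.seed(1)
genes <- paste0("gene", 1:4)

# Hypothetical helper that builds a small genes-by-cells count matrix.
make_sc <- function(n_cells, prefix) {
  matrix(
    rpois(4 * n_cells, lambda = 5), nrow = 4,
    dimnames = list(genes, paste0(prefix, "_cell", seq_len(n_cells)))
  )
}

# One matrix and one annotation vector per single-cell data set.
sc_list <- list(make_sc(4, "a"), make_sc(6, "b"))
annotation_list <- list(
  c("Tcell", "Bcell", "Tcell", "Bcell"),
  c("Tcell", "Tcell", "Bcell", "Bcell", "Tcell", "Bcell")
)
dataset_names <- c("dataset_a", "dataset_b")

# Each annotation vector must match its matrix column for column.
stopifnot(
  length(sc_list) == length(annotation_list),
  length(sc_list) == length(dataset_names),
  all(lengths(annotation_list) == vapply(sc_list, ncol, integer(1)))
)

# Not run: requires the package, a bulk matrix, and suitable batch IDs.
# result <- deconvolute_scdc(
#   bulk_gene_expression = bulk,
#   single_cell_object = sc_list,
#   cell_type_annotations = annotation_list,
#   batch_ids = batch_id_input,  # placeholder, structure not specified here
#   names_sc_objects = dataset_names
# )
```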

Requires raw read counts. Works best with multiple cells per single-cell patient/subject.
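
Both requirements can be roughly checked in base R before running the method. This is only a heuristic sketch: looks_like_raw_counts is a hypothetical helper, not a package function, and it merely tests for finite, non-negative whole numbers, which normalized or log-transformed data usually fail.

```r
# Hypothetical helper: TRUE if every entry is a finite, non-negative
# whole number, as expected for raw read counts.
looks_like_raw_counts <- function(x) {
  x <- as.matrix(x)
  is.numeric(x) && all(is.finite(x)) && all(x >= 0) && all(x == round(x))
}

raw <- matrix(c(0, 3, 12, 7), nrow = 2)
normalized <- log2(raw + 1)

looks_like_raw_counts(raw)        # TRUE
looks_like_raw_counts(normalized) # FALSE

# Number of cells per subject (illustrative IDs); subjects with only one
# cell are the case the note above advises against.
batch_ids <- c("p1", "p1", "p1", "p2")
cells_per_subject <- table(batch_ids)
names(cells_per_subject)[cells_per_subject < 2]
```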