Calculates cell type proportions for bulk samples using SCDC deconvolution. IMPORTANT: No separate model needs to be built beforehand. Everything is done inside this function.
deconvolute_scdc(
  bulk_gene_expression,
  single_cell_object,
  cell_type_annotations,
  batch_ids,
  ct_varname = "cellType",
  sample = "batchId",
  ct_sub = NULL,
  iter_max = NULL,
  nu = 1e-04,
  epsilon = NULL,
  truep = NULL,
  weight_basis = TRUE,
  ct_cell_size = NULL,
  transform_bisque = FALSE,
  grid_search = FALSE,
  search_length = 0.05,
  names_sc_objects = NULL,
  qcthreshold = 0.7,
  verbose = FALSE,
  quality_control = FALSE
)
bulk_gene_expression: A matrix of bulk data. Rows are genes, columns are samples. Row and column names need to be set.
single_cell_object: A matrix or data frame with the single-cell data. Rows are genes, columns are samples (cells). Row and column names need to be set. This can also be a list of objects if SCDC_ENSEMBLE should be used.
cell_type_annotations: A vector of the cell type annotations. Has to be in the same order as the samples in single_cell_object. This can also be a list of vectors if SCDC_ENSEMBLE should be used.
batch_ids: A vector of the ids of the samples or individuals.
ct_varname: Character string specifying the variable name for 'cell types'.
sample: Character string specifying the variable name for subjects/samples.
ct_sub: Vector. A subset of cell types that are selected to construct the basis matrix. NULL means that all cell types are used.
iter_max: The maximum number of iterations in WNNLS. If the parameter is NULL, a default of 1000 is chosen for a single single-cell object and 2000 for a list.
nu: A small constant to facilitate the calculation of variance.
epsilon: A small constant used as the convergence criterion. If the parameter is NULL, a default of 0.01 is chosen for a single single-cell object and 0.001 for a list.
truep: True cell type proportions for the bulk samples, if known.
weight_basis: Whether to use the basis matrix adjusted for maximal variance weight, created by the SCDC_basis function.
ct_cell_size: Default is NULL, which means the "library size" is calculated from the data. Users can specify a vector of cell size factors corresponding to ct_sub according to prior knowledge. The vector must be named: names(ct_cell_size) must not be NULL.
transform_bisque: Whether to apply the bulk sample transformation from bisqueRNA, which aims to reduce the systematic difference between single cells and bulk samples.
grid_search: Logical. Whether to use the grid search method to derive the ENSEMBLE weights.
search_length: A number between 0 and 0.5. If grid search is used, this is the step length. A smaller search_length gives more accurate optimization results.
names_sc_objects: A vector with the names of the single-cell objects. Only used if a list of single-cell objects is supplied. If it remains NULL, the objects are named by their index.
qcthreshold: The probability threshold used to filter out questionable cells. Only used if quality_control = TRUE.
verbose: Whether to print output to the console.
quality_control: Whether to perform the SCDC_qc quality control method.
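The input requirements above can be illustrated with a small sketch in base R. The data are random toy counts, and check_scdc_inputs() is a hypothetical helper that is not part of the package. The sketch also assumes that batch_ids holds one entry per column of single_cell_object, which the argument description does not state explicitly. The call to deconvolute_scdc() is left commented out because it needs the package that provides the function.

```r
set.seed(1)  # toy data only; the values carry no biological meaning

genes <- paste0("gene", 1:50)

# Bulk matrix: genes x samples, with row and column names set
bulk_gene_expression <- matrix(
  rpois(50 * 4, lambda = 20), nrow = 50,
  dimnames = list(genes, paste0("bulk", 1:4))
)

# Single-cell matrix: genes x cells, with row and column names set
single_cell_object <- matrix(
  rpois(50 * 60, lambda = 5), nrow = 50,
  dimnames = list(genes, paste0("cell", 1:60))
)

# One annotation per single-cell column, in column order
cell_type_annotations <- rep(c("T cell", "B cell", "Monocyte"), times = 20)

# Assumption: one subject id per single-cell column
batch_ids <- rep(c("subject1", "subject2"), each = 30)

# Optional named cell size factors (made-up values)
ct_cell_size <- c("T cell" = 1, "B cell" = 1, "Monocyte" = 1.2)

# Hypothetical helper (not part of the package): checks the documented requirements
check_scdc_inputs <- function(bulk, sc, annotations, batches) {
  stopifnot(
    !is.null(rownames(bulk)), !is.null(colnames(bulk)),
    !is.null(rownames(sc)), !is.null(colnames(sc)),
    length(annotations) == ncol(sc),
    length(batches) == ncol(sc)
  )
  invisible(TRUE)
}

check_scdc_inputs(bulk_gene_expression, single_cell_object,
                  cell_type_annotations, batch_ids)

# result <- deconvolute_scdc(
#   bulk_gene_expression, single_cell_object,
#   cell_type_annotations, batch_ids,
#   ct_cell_size = ct_cell_size
# )
```

Running the check before the call surfaces missing dimnames or mismatched vector lengths early, instead of as an error from inside the deconvolution.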
The return value depends on whether one or multiple single-cell datasets are used.
One:
A matrix of cell type proportion estimates with cell types as rows and individuals as columns.
The signature matrix. Rows are genes, columns are cell types.
The predicted gene expression levels for the bulk samples.
The evaluation of the predicted gene expression levels.
The evaluation of the deconvoluted proportions. Since no ground truth is available, this is always NULL.
ENSEMBLE:
A matrix with the suggested weights for each single cell dataset and some statistical evaluation.
A list of the "One:" outputs as seen above for each single cell dataset.
A list of the prop.est.mvw values for each single cell dataset.
A matrix with the results of the grid search. NULL if grid_search = FALSE.
SCDC_ENSEMBLE can be used by supplying lists to the parameters single_cell_object and cell_type_annotations. To name the single cell data sets, supply a vector with their corresponding names to names_sc_objects.
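As a rough sketch of the list inputs described above, the following base R code builds two toy single-cell datasets with matching annotation vectors and names. make_toy_sc() is a hypothetical helper and the counts are random. How batch_ids should be supplied in the ENSEMBLE case is not stated here, so it is left out of the sketch.

```r
set.seed(2)  # toy data only

genes <- paste0("gene", 1:30)

# Hypothetical helper: a genes x cells count matrix with dimnames set
make_toy_sc <- function(n_cells, prefix) {
  matrix(
    rpois(30 * n_cells, lambda = 5), nrow = 30,
    dimnames = list(genes, paste0(prefix, "_cell", seq_len(n_cells)))
  )
}

# One list entry per single-cell dataset
single_cell_object <- list(make_toy_sc(20, "a"), make_toy_sc(30, "b"))

# One annotation vector per dataset, each in the column order of its matrix
cell_type_annotations <- list(
  rep(c("T cell", "B cell"), times = 10),
  rep(c("T cell", "B cell"), times = 15)
)

# Optional names for the datasets; if left NULL they are named by index
names_sc_objects <- c("dataset_a", "dataset_b")

# The lists must line up element by element
stopifnot(
  length(single_cell_object) == length(cell_type_annotations),
  length(names_sc_objects) == length(single_cell_object),
  all(mapply(function(sc, ct) ncol(sc) == length(ct),
             single_cell_object, cell_type_annotations))
)
```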
Requires raw read counts. Works best when the single-cell data contain multiple cells per patient/subject.
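Both requirements can be checked roughly before running the deconvolution. The sketch below is only a heuristic and not part of the package: is_raw_counts() is a hypothetical helper that treats a matrix as raw counts if all values are non-negative whole numbers, and the minimum of two cells per subject is an arbitrary illustrative cutoff.

```r
# Hypothetical helper: heuristic test for raw (non-negative, whole-number) counts
is_raw_counts <- function(x) {
  x <- as.matrix(x)
  is.numeric(x) && !anyNA(x) && all(x >= 0) && all(x == round(x))
}

counts <- matrix(c(0, 3, 12, 7, 0, 1), nrow = 2)
normalized <- log2(counts + 1)  # log-transformed values are no longer whole numbers

is_raw_counts(counts)      # TRUE
is_raw_counts(normalized)  # FALSE

# Count cells per subject and flag subjects with fewer than two cells
batch_ids <- c("s1", "s1", "s1", "s2", "s2", "s3")
cells_per_subject <- table(batch_ids)
sparse_subjects <- names(cells_per_subject)[as.vector(cells_per_subject) < 2]
sparse_subjects            # "s3"
```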