Bayesian deconvolution module using BayesPrism — deconvolute

IMPORTANT: no model is needed. Everything is done inside this method.

deconvolute_bayesprism(
  bulk_gene_expression,
  single_cell_object,
  cell_type_annotations,
  cell_subtype_labels = NULL,
  tum_key = NULL,
  apply_bayes_prism_filtering = FALSE,
  species = "hs",
  exp.cells = 1,
  pseudo_min = 1e-08,
  gene_group = c("other_Rb", "chrM", "chrX", "chrY", "Rb", "Mrp", "act", "hb", "MALAT1"),
  outlier_cut = 0.01,
  outlier_fraction = 0.1,
  update_gibbs = TRUE,
  gibbs_control = list(chain.length = 1000, burn.in = 500, thinning = 2),
  opt_control = list(trace = 0, maxit = 1e+05),
  n_cores = 1,
  which_theta = "final",
  state_or_type = "type"
)

Arguments

bulk_gene_expression: A matrix of bulk data. Rows are genes, columns are samples. Row and column names need to be set.
single_cell_object: A matrix with the single-cell data. Rows are genes, columns are cells (the single-cell samples). Row and column names need to be set.
cell_type_annotations: A vector of the cell type annotations. Has to be in the same order as the cells (columns) in single_cell_object.
cell_subtype_labels: A character or factor vector indicating the cell subtype of each row of the raw count matrix of scRNA-seq or gene expression profile (GEP). The length needs to be equal to nrow(ref.dat). Default is NULL, which uses the same value as cell.type.labels. Note that TED computes the posterior sum over the subtypes to get the total fraction / expression of each cell type. This allows a more fine-grained definition of cell types / cell states.
tum_key: The character string in cell.type.labels denoting the tumor cells, e.g. "tumor" or "malignant".
apply_bayes_prism_filtering: set to TRUE if you want to run the cleanup.genes function of BayesPrism; default is FALSE
species: A character variable to denote if genes are human ("hs") or mouse ("mm").
exp.cells: Genes expressed in fewer cells than this number will be excluded. Default=1. If the input is a GEP, exp.cells is automatically set to min(exp.cells, 1). As a result, genes expressed in at least 0 or 1 cell type will be retained. Only used when apply_bayes_prism_filtering is TRUE.
pseudo_min: A numeric value indicating the minimum (non-zero) value of phi. Default=1E-8.
gene_group: a character vector to input gene groups to be removed, must be one or more elements from c("other_Rb","chrM","chrX","chrY","Rb","Mrp","act","hb","MALAT1"). Only used when apply_bayes_prism_filtering is TRUE.
outlier_cut, outlier_fraction: Filter genes in X whose expression fraction is greater than outlier_cut (Default=0.01) in more than outlier_fraction (Default=0.1) of the bulk data. For datasets with reasonable quality control, typically very few genes will be filtered. Removing outlier genes ensures that the inference is not dominated by outliers, which may sometimes result from poor QC in mapping.
update_gibbs: A logical variable to denote whether to run the final Gibbs sampling to update theta. Default=TRUE.
gibbs_control: A list of parameters controlling the Gibbs sampling. Default chain.length=1000, burn.in=500, thinning=2. The previous version's default was chain.length=400, burn.in=200, thinning=2. The default chain length has been increased to accommodate spatial transcriptomic data, which usually has lower depth than conventional bulk data and hence may need a longer chain to reach the stationary distribution.
opt_control: A list of parameters controlling the optimization by Rcgmin. Default trace=0, maxit=100000.
n_cores: Number of CPU threads used for parallel computing. Default=1
which_theta: A character variable to denote whether to extract results from the first or the final Gibbs sampling. Default="final".
state_or_type: A character variable to denote whether to extract results for cell types or cell states. Default="type". We caution against over-interpreting cell state information when the transcription of the states is highly collinear.
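
The outlier_cut / outlier_fraction rule and the gibbs_control settings described above can be illustrated with a small base-R sketch. It is a reading of the argument descriptions, not BayesPrism's internal code: the toy matrix, the exaggerated thresholds and the variable names are made up for illustration, and the count of retained Gibbs draws assumes the common convention that burn-in iterations are discarded and every thinning-th remaining iteration is kept.

```r
# Toy bulk matrix: genes in rows, samples in columns (values are invented).
bulk <- matrix(
  c(900,  50,  50,   # sample s1
    880,  60,  60,   # sample s2
     10, 500, 490,   # sample s3
     20, 480, 500),  # sample s4
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:4))
)

# Thresholds exaggerated for a 3-gene toy; the documented defaults are 0.01 and 0.1.
outlier_cut <- 0.6
outlier_fraction <- 0.4

# Expression fraction of each gene within each sample.
frac <- sweep(bulk, 2, colSums(bulk), "/")

# Share of samples in which each gene exceeds outlier_cut.
share_exceeding <- rowMeans(frac > outlier_cut)

# Genes exceeding the cut in more than outlier_fraction of the samples.
outlier_genes <- names(share_exceeding)[share_exceeding > outlier_fraction]
print(outlier_genes)

# Assumed convention: drop the first burn.in iterations, then keep every
# thinning-th iteration of the remainder.
gibbs_control <- list(chain.length = 1000, burn.in = 500, thinning = 2)
kept_iterations <- seq(
  gibbs_control$burn.in + 1,
  gibbs_control$chain.length,
  by = gibbs_control$thinning
)
print(length(kept_iterations))
```

In this toy, only geneA dominates more than 40% of the samples, so it is the only gene flagged. Under the stated assumption, the default gibbs_control retains 250 draws per chain.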

Value

A list of results is returned including:

bp.res: The result of run.prism, a "BayesPrism" S4 object.
theta: The result of get.fraction, the extracted cell fraction results from the BayesPrism object.
bp.res$prism: An S4 object of the class "prism" to represent the input prism object.
bp.res$posterior.initial.cellState: An S4 object of the class "jointPost" to represent the posterior mean of cell state fraction and cell state expression output by the initial Gibbs sampling using cell state priors. Contains Z (inferred expression), theta (inferred fraction) and theta.cv (coefficient of variation of the posterior of theta).
bp.res$posterior.initial.cellType: An S4 object of the class "jointPost" to represent the posterior sum of cell states from each cell type (posterior.initial.cellState).
bp.res$reference.update: An S4 object of the class "reference" to represent the updated profile ψ.
bp.res$posterior.theta_f: An S4 object of the class "thetaPost" to represent the updated cell type fraction. Contains theta (inferred fraction) and theta.cv (coefficient of variation of the posterior of theta).
bp.res$control_param: A list storing the gibbs.control, opt.control and update.gibbs arguments.
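
As a rough sketch of how the theta element might be inspected, the snippet below works on a mock result rather than a real run. It assumes that theta is a numeric matrix with samples in rows and cell types in columns, which is not stated above; the values, sample names and cell type names are invented.

```r
# Mock of the returned list. A real call would put a "BayesPrism" S4 object
# in bp.res; it is left as NULL here because nothing is actually run.
res <- list(
  bp.res = NULL,
  theta = matrix(
    c(0.7, 0.2,   # column "T cell": samples s1, s2
      0.3, 0.8),  # column "B cell": samples s1, s2
    nrow = 2,
    dimnames = list(c("s1", "s2"), c("T cell", "B cell"))
  )
)

theta <- res$theta

# Assumed layout: each row holds the fractions of one sample, so rows sum to 1.
fractions_ok <- all(abs(rowSums(theta) - 1) < 1e-8)

# Most abundant cell type per sample.
dominant <- colnames(theta)[max.col(theta, ties.method = "first")]
names(dominant) <- rownames(theta)
print(dominant)
```

If the actual orientation of theta differs, transpose it with t() before applying the same checks.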

Details

Run Bayesian deconvolution to estimate cell type composition and gene expression.
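
A minimal sketch of preparing inputs that satisfy the requirements listed under Arguments is shown below. The toy data and the check_inputs helper are hypothetical and exist only for illustration; real data need far more genes and cells than this. The call itself is only attempted when a function named deconvolute_bayesprism is available in the session.

```r
set.seed(1)
genes <- paste0("gene", 1:5)

# Single-cell counts: genes in rows, cells in columns, both named.
sc <- matrix(
  rpois(5 * 6, lambda = 5), nrow = 5,
  dimnames = list(genes, paste0("cell", 1:6))
)

# One annotation per cell, in the same order as the columns of sc.
annot <- rep(c("T cell", "B cell"), each = 3)

# Bulk counts: genes in rows, samples in columns, both named.
bulk <- matrix(
  rpois(5 * 2, lambda = 50), nrow = 5,
  dimnames = list(genes, c("sample1", "sample2"))
)

# Hypothetical helper: checks the documented input requirements and
# returns the genes shared by both matrices.
check_inputs <- function(bulk, sc, annot) {
  stopifnot(
    is.matrix(bulk), is.matrix(sc),
    !is.null(rownames(bulk)), !is.null(colnames(bulk)),
    !is.null(rownames(sc)), !is.null(colnames(sc)),
    length(annot) == ncol(sc)
  )
  intersect(rownames(bulk), rownames(sc))
}

shared_genes <- check_inputs(bulk, sc, annot)

# Only run when the function is available; skipped otherwise.
if (exists("deconvolute_bayesprism", mode = "function")) {
  res <- deconvolute_bayesprism(
    bulk_gene_expression = bulk,
    single_cell_object = sc,
    cell_type_annotations = annot,
    n_cores = 1
  )
}
```

No model object is created beforehand, in line with the note at the top of this page: the single-cell reference and the bulk data are passed to the function together.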