Make a random bulk sample from a single-cell dataset — make_random

Draw random single cells from the input expression set in the proportions given by the cell_fractions vector, then combine them into one simulated bulk sample using the combine function (the mean by default).

make_random_bulk(eset, cell_fractions, n_cells = 500, combine = mean)

Arguments

eset: Biobase::ExpressionSet in which each sample is the gene expression of a single cell. The cell type of each cell must be given in the cell_type column of pData.
cell_fractions: named list or named vector (as in the example below) giving the fraction of each cell type in the simulated sample. The names must correspond to values in the cell_type column of the ExpressionSet.
n_cells: number of single cells to combine into one bulk sample. Defaults to 500.
combine: callback function used to aggregate the counts across the sampled cells. Defaults to mean.
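
Because the names in cell_fractions have to match the cell_type column, it can help to check them before simulating. The following is a minimal sketch in base R, not part of the package: check_cell_fractions is a hypothetical helper, and the expectation that the fractions add up to 1 is an assumption rather than something this page states. For an ExpressionSet, the cell_type vector would come from pData(eset)$cell_type.

```r
# Hypothetical helper (not part of the package): checks a fractions
# vector or list against the cell types present in the data.
check_cell_fractions <- function(cell_fractions, cell_type) {
  if (is.null(names(cell_fractions))) {
    stop("cell_fractions must be named")
  }
  # Every requested cell type has to occur in the cell_type annotation.
  missing_types <- setdiff(names(cell_fractions), unique(cell_type))
  if (length(missing_types) > 0) {
    stop("cell types not found in cell_type: ",
         paste(missing_types, collapse = ", "))
  }
  # Assumption: fractions are expected to add up to 1.
  total <- sum(unlist(cell_fractions))
  if (!isTRUE(all.equal(total, 1))) {
    warning("cell_fractions sum to ", total, ", not 1")
  }
  invisible(TRUE)
}

# Toy annotation: 300 cells of each of three types.
cell_type <- rep(c("T cell CD8+", "T cell CD4+", "B cell"), each = 300)
check_cell_fractions(
  c("T cell CD8+" = 0.3, "B cell" = 0.4, "T cell CD4+" = 0.3),
  cell_type
)
```

The helper stops on an unknown cell type and only warns when the fractions do not sum to 1, since the latter is an assumed convention.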

Value

A single-column tibble with the simulated expression of each gene. The column is scaled so that it sums to 1 million (TPM).
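
The three steps described on this page (sample cells per type, combine them, rescale to 1 million) can be sketched in base R on a plain matrix. This is an illustration under stated assumptions, not the package's implementation: sketch_random_bulk is a hypothetical name, and both sampling with replacement and rounding the per-type cell counts are assumptions.

```r
# Hypothetical sketch (not the package's code): genes in rows, cells in columns.
sketch_random_bulk <- function(expr, cell_type, cell_fractions,
                               n_cells = 500, combine = mean) {
  # Pick column indices per cell type, proportional to its fraction.
  idx <- unlist(lapply(names(cell_fractions), function(ct) {
    n <- round(cell_fractions[[ct]] * n_cells)
    pool <- which(cell_type == ct)
    # Assumption: sample with replacement, so n may exceed the pool size.
    pool[sample.int(length(pool), n, replace = TRUE)]
  }))
  # Aggregate each gene (row) across the sampled cells.
  combined <- apply(expr[, idx, drop = FALSE], 1, combine)
  # Rescale so that the values sum to one million.
  combined / sum(combined) * 1e6
}

# Toy data: three marker genes, 300 cells of each of three types.
expr <- matrix(
  c(rep(c(1, 0, 0), 300), rep(c(0, 1, 0), 300), rep(c(0, 0, 1), 300)),
  nrow = 3,
  dimnames = list(c("CD8A", "CD4", "CD19"), NULL)
)
cell_type <- rep(c("T cell CD8+", "T cell CD4+", "B cell"), each = 300)

bulk <- sketch_random_bulk(
  expr, cell_type,
  c("T cell CD8+" = 0.3, "B cell" = 0.4, "T cell CD4+" = 0.3),
  n_cells = 1000
)
```

Because every cell of a given type is identical in this toy matrix, each marker gene ends up proportional to the fraction of its cell type regardless of which cells are drawn.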

Examples

suppressPackageStartupMessages(library(Biobase))
suppressPackageStartupMessages(library(tibble))

# generate toy matrix with three marker genes and three cell types
expr <- matrix(c(
  rep(c(1, 0, 0), 300),
  rep(c(0, 1, 0), 300),
  rep(c(0, 0, 1), 300)
), nrow = 3)

# generate a featureData and phenoData data-frame.
# row names must be consistent between expr and featureData.
gene_names <- c("CD8A", "CD4", "CD19")
rownames(expr) <- gene_names
cell_types <- c(rep("T cell CD8+", 300), rep("T cell CD4+", 300), rep("B cell", 300))
pdata <- data.frame(cell_type = cell_types)
fdata <- data.frame(gene_symbol = gene_names)
rownames(fdata) <- gene_names

# tie expr, fdata and pdata together in expression set
eset <- ExpressionSet(expr,
  phenoData = as(pdata, "AnnotatedDataFrame"),
  featureData = as(fdata, "AnnotatedDataFrame")
)

# make a random bulk sample.
make_random_bulk(eset,
  c("T cell CD8+" = 0.3, "B cell" = 0.4, "T cell CD4+" = 0.3),
  n_cells = 1000
)
#> # A tibble: 3 × 1
#>    value
#>    <dbl>
#> 1 300000
#> 2 300000
#> 3 400000