Draw random single cells from the input expression set according to the cell_fractions vector and combine them with an aggregation function (by default, the mean).
make_random_bulk(eset, cell_fractions, n_cells = 500, combine = mean)
eset: ExpressionSet with a cell_type column in pData. Each sample is the gene expression of a single cell, and the cell type of each cell must be given in the cell_type column of the pData table.
cell_fractions: named list indicating the fraction of each cell type in the simulated sample. The names must correspond to the values in the cell_type column of the ExpressionSet.
n_cells: number of single cells to integrate into a sample.
combine: callback function used to aggregate the counts.
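Because the names of cell_fractions must match the cell_type column, a quick check before simulating can catch typos. The following is a minimal sketch in base R. The helper name `check_cell_fractions` is hypothetical and not part of the documented function, and the sum-to-one check is an assumption of this sketch, since the documentation above does not say whether the fractions must sum to 1.

```r
# Hypothetical helper, not part of the documented API.
# cell_fractions: named vector or list of fractions
# cell_types:     character vector, e.g. pData(eset)$cell_type
check_cell_fractions <- function(cell_fractions, cell_types) {
  # every requested cell type must occur in the cell_type column
  unknown <- setdiff(names(cell_fractions), unique(cell_types))
  if (length(unknown) > 0) {
    stop("cell types not found in cell_type column: ",
         paste(unknown, collapse = ", "))
  }
  # assumption of this sketch: fractions are expected to sum to 1
  total <- sum(unlist(cell_fractions))
  if (!isTRUE(all.equal(total, 1))) {
    warning("cell fractions sum to ", total, ", not 1")
  }
  invisible(TRUE)
}

cell_types <- c(rep("T cell CD8+", 3), rep("B cell", 2))
check_cell_fractions(c("T cell CD8+" = 0.6, "B cell" = 0.4), cell_types)
```

With an ExpressionSet, the second argument would typically be the cell_type column of its pData table.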
Returns a single-column tibble with the simulated expression of each gene. The column sum is scaled to 1 million (TPM).
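The procedure described above (draw cells per type, aggregate per gene, scale to 1 million) can be illustrated on a plain matrix. This is a sketch under stated assumptions, not the package's implementation: `sketch_random_bulk` is a hypothetical name, it takes a matrix and a cell type vector instead of an ExpressionSet, it returns a named vector rather than a tibble, and sampling with replacement is an assumption.

```r
# Sketch only: mimics the documented behaviour on a plain matrix.
# `sketch_random_bulk` and its arguments are hypothetical names.
sketch_random_bulk <- function(expr, cell_types, cell_fractions,
                               n_cells = 500, combine = mean) {
  # number of cells to draw for each cell type
  n_per_type <- round(unlist(cell_fractions) * n_cells)
  # draw column indices per type (assumes every requested type is present;
  # sampling with replacement is an assumption of this sketch)
  idx <- unlist(lapply(names(n_per_type), function(ct) {
    candidates <- which(cell_types == ct)
    candidates[sample.int(length(candidates), n_per_type[[ct]], replace = TRUE)]
  }))
  # aggregate the sampled cells per gene with the callback
  combined <- apply(expr[, idx, drop = FALSE], 1, combine)
  # scale so that the values sum to 1 million
  combined / sum(combined) * 1e6
}

# toy data: two marker genes, five cells of type "A" and five of type "B"
expr <- matrix(c(rep(c(1, 0), 5), rep(c(0, 1), 5)),
  nrow = 2, dimnames = list(c("geneA", "geneB"), NULL)
)
cell_types <- c(rep("A", 5), rep("B", 5))
res <- sketch_random_bulk(expr, cell_types, c(A = 0.25, B = 0.75), n_cells = 100)
```

Because each toy gene is expressed in exactly one cell type, the result does not depend on which cells are drawn: geneA comes out as 250000 and geneB as 750000, mirroring the requested fractions.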
# ExpressionSet() is provided by Biobase
library(Biobase)

# generate toy matrix with three marker genes and three cell types
expr <- matrix(c(
  rep(c(1, 0, 0), 300),
  rep(c(0, 1, 0), 300),
  rep(c(0, 0, 1), 300)
), nrow = 3)
# generate a featureData and phenoData data-frame.
# row names must be consistent between expr and featureData.
gene_names <- c("CD8A", "CD4", "CD19")
rownames(expr) <- gene_names
cell_types <- c(rep("T cell CD8+", 300), rep("T cell CD4+", 300), rep("B cell", 300))
pdata <- data.frame(cell_type = cell_types)
fdata <- data.frame(gene_symbol = gene_names)
rownames(fdata) <- gene_names
# tie expr, fdata and pdata together in expression set
eset <- ExpressionSet(expr,
  phenoData = as(pdata, "AnnotatedDataFrame"),
  featureData = as(fdata, "AnnotatedDataFrame")
)

# make a random bulk sample.
make_random_bulk(eset,
  c("T cell CD8+" = 0.3, "B cell" = 0.4, "T cell CD4+" = 0.3),
  n_cells = 1000
)
#> # A tibble: 3 × 1
#>    value
#>    <dbl>
#> 1 300000
#> 2 300000
#> 3 400000