R/single_cell_simulation.R
make_random_bulk.Rd
Draw random single cells from the input expression set according to the cell_fractions vector and combine them with the combine function (the mean by default).
make_random_bulk(eset, cell_fractions, n_cells = 500, combine = mean)
eset: Biobase::ExpressionSet with a cell_type column in pData. Each sample is the gene expression of a single cell. The cell type of each cell needs to be denoted in the cell_type column of the pData table.
cell_fractions: Named list indicating the fraction of each cell type in the sample. The names of the list need to correspond to the values in the cell_type column of the ExpressionSet.
n_cells: Number of single cells to integrate into a sample.
combine: Callback function used to aggregate the counts.
A single-column tibble with simulated expression for each gene. The column sum is scaled to 1 million (TPM).
suppressPackageStartupMessages(library(Biobase))
suppressPackageStartupMessages(library(tibble))
# generate toy matrix with three marker genes and three cell types
expr <- matrix(c(
  rep(c(1, 0, 0), 300),
  rep(c(0, 1, 0), 300),
  rep(c(0, 0, 1), 300)
), nrow = 3)
# generate a featureData and phenoData data-frame.
# row names must be consistent between expr and featureData.
gene_names <- c("CD8A", "CD4", "CD19")
rownames(expr) <- gene_names
cell_types <- c(rep("T cell CD8+", 300), rep("T cell CD4+", 300), rep("B cell", 300))
pdata <- data.frame(cell_type = cell_types)
fdata <- data.frame(gene_symbol = gene_names)
rownames(fdata) <- gene_names
# tie expr, fdata and pdata together in expression set
eset <- ExpressionSet(expr,
  phenoData = as(pdata, "AnnotatedDataFrame"),
  featureData = as(fdata, "AnnotatedDataFrame")
)
# make a random bulk sample.
make_random_bulk(eset,
  c("T cell CD8+" = 0.3, "B cell" = 0.4, "T cell CD4+" = 0.3),
  n_cells = 1000
)
#> # A tibble: 3 × 1
#>    value
#>    <dbl>
#> 1 300000
#> 2 300000
#> 3 400000
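
The procedure described above (draw cells per cell type, aggregate them, rescale to 1 million) can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: sketch_random_bulk is a hypothetical helper, it takes a plain genes x cells matrix plus a vector of cell-type labels instead of an ExpressionSet, it returns a named numeric vector rather than a tibble, and it assumes that cells are drawn with replacement and that the number of cells per type is round(fraction * n_cells).

```r
# Hypothetical helper for illustration; NOT the package's implementation.
sketch_random_bulk <- function(expr, cell_type, cell_fractions,
                               n_cells = 500, combine = mean) {
  # expr: genes x cells numeric matrix; cell_type: one label per column.
  stopifnot(ncol(expr) == length(cell_type))
  stopifnot(all(names(cell_fractions) %in% cell_type))

  # For each requested cell type, pick column indices of matching cells.
  # Assumption: sampling with replacement, count rounded to an integer.
  picked <- unlist(lapply(names(cell_fractions), function(ct) {
    candidates <- which(cell_type == ct)
    n_draw <- round(cell_fractions[[ct]] * n_cells)
    candidates[sample.int(length(candidates), n_draw, replace = TRUE)]
  }))

  # Aggregate the selected cells gene by gene with the callback,
  # then rescale so that the values sum to 1 million.
  combined <- apply(expr[, picked, drop = FALSE], 1, combine)
  combined / sum(combined) * 1e6
}

# Same toy data as in the example above, without the ExpressionSet wrapper.
expr <- matrix(c(
  rep(c(1, 0, 0), 300),
  rep(c(0, 1, 0), 300),
  rep(c(0, 0, 1), 300)
), nrow = 3)
rownames(expr) <- c("CD8A", "CD4", "CD19")
cell_type <- c(rep("T cell CD8+", 300), rep("T cell CD4+", 300), rep("B cell", 300))

res <- sketch_random_bulk(
  expr, cell_type,
  c("T cell CD8+" = 0.3, "B cell" = 0.4, "T cell CD4+" = 0.3),
  n_cells = 1000
)
```

Because every cell in the toy matrix expresses exactly one marker gene, the result does not depend on which cells are drawn: each gene's value equals the requested fraction of its cell type multiplied by 1 million, which agrees with the output shown above.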