Workflow Utilities
spacedeconv offers a variety of workflow helper
functions that streamline the overall analysis process. The
sections below give an overview of the available functions.
preprocess, normalize, print_info, available_results, aggregate_results, addCustomAnnotation, annotate_spots, scale_cell_counts, subsetSCE, subsetSPE
## → Setting up spacedeconv environment..
## → Using conda environment 'r-omnideconv'
1. preprocess
This function preprocesses single-cell or spatial
datasets. It cuts off observations with low or high UMI counts, removes noisy
expression and performs additional checks to streamline the
deconvolution analysis. The function takes a
SingleCellExperiment, AnnData or Seurat object and returns a processed
SingleCellExperiment. The min_umi and
max_umi parameters can be set to improve data
quality. The assay can be selected with the
assay parameter. Additionally, mitochondrial genes can
be removed by setting remove_mito=TRUE.
data("single_cell_data_3")
data("spatial_data_3")
sce <- spacedeconv::preprocess(single_cell_data_3, min_umi = 500, assay = "counts", remove_mito = TRUE)
## ── spacedeconv ─────────────────────────────────────────────────────────────────
## ℹ testing parameter
## ✔ parameter OK [84ms]
##
## ℹ Removing 8 observations with umi count below threshold
## ✔ Removed 8 observations with umi count below threshold [1.6s]
##
## ℹ Removing 5862 variables with all zero expression
## ✔ Removed 5862 variables with all zero expression [921ms]
##
## ℹ Removing 13 mitochondria genes
## ✔ Removed 13 mitochondria genes [843ms]
##
## ℹ Checking for ENSEMBL Identifiers
## ! Warning: ENSEMBL identifiers detected in gene names
## ℹ Consider using Gene Names for first-generation deconvolution tools
## ✔ Finished Preprocessing [10ms]
spe <- spacedeconv::preprocess(spatial_data_3, min_umi = 500, assay = "counts", remove_mito = TRUE)
## ── spacedeconv ─────────────────────────────────────────────────────────────────
## ℹ testing parameter
## ✔ parameter OK [35ms]
##
## ℹ Removing 137 observations with umi count below threshold
## ✔ Removed 137 observations with umi count below threshold [128ms]
##
## ℹ Removing 13049 variables with all zero expression
## ✔ Removed 13049 variables with all zero expression [147ms]
##
## ℹ Removing 13 mitochondria genes
## ✔ Removed 13 mitochondria genes [120ms]
##
## ℹ Checking for ENSEMBL Identifiers
## ! Warning: ENSEMBL identifiers detected in gene names
## ℹ Consider using Gene Names for first-generation deconvolution tools
## ✔ Finished Preprocessing [11ms]
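To make the filtering steps reported in the log more concrete, the following base R sketch applies the same kind of filtering to a toy count matrix. It is not spacedeconv's implementation: the toy data, the >= comparison against the threshold and the ^MT- gene-name pattern are assumptions made for illustration only.

```r
# Toy count matrix (hypothetical data): genes in rows, cells in columns
counts <- matrix(
  c(300, 2, 40, 60,
    400, 1, 20, 30,
      0, 0,  0,  0,
    100, 0,  5, 10),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneZero", "MT-CO1"),
                  c("cell1", "cell2", "cell3", "cell4"))
)

min_umi <- 50  # assumed threshold, analogous to the min_umi parameter

# 1. Drop observations whose total UMI count is below the threshold
keep_cells <- colSums(counts) >= min_umi
filtered <- counts[, keep_cells, drop = FALSE]

# 2. Drop variables (genes) with all-zero expression
filtered <- filtered[rowSums(filtered) > 0, , drop = FALSE]

# 3. Drop mitochondrial genes (assumed to be recognizable by an "MT-" prefix)
is_mito <- grepl("^MT-", rownames(filtered), ignore.case = TRUE)
filtered <- filtered[!is_mito, , drop = FALSE]

dim(filtered)
```

In this toy example cell2 is dropped for its low UMI count, then GeneZero and MT-CO1 are removed, leaving two genes and three cells.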
2. normalize
You can scale and normalize your single-cell or spatial data by
calling the normalize function. The function takes a
method parameter where cpm or
logcpm can be selected. The normalized data is stored
as an additional assay in the object.
sce <- spacedeconv::normalize(sce, method = "cpm", assay = "counts")
## ── spacedeconv ─────────────────────────────────────────────────────────────────
## ℹ testing parameter
## ✔ parameter OK [20ms]
##
## ℹ Normalizing using cpm
## Warning in asMethod(object): sparse->dense coercion: allocating vector of size
## 1.4 GiB
## ✔ Finished normalization using cpm [3.9s]
##
## ℹ Please note the normalization is stored in an additional assay
spe <- spacedeconv::normalize(spe, method = "cpm", assay = "counts")
## ── spacedeconv ─────────────────────────────────────────────────────────────────
## ℹ testing parameter
## ✔ parameter OK [17ms]
##
## ℹ Normalizing using cpm
## ✔ Finished normalization using cpm [395ms]
##
## ℹ Please note the normalization is stored in an additional assay
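As a reminder of what counts-per-million normalization computes, here is a minimal base R sketch on a toy matrix. It is an illustration under stated assumptions, not the package's code: the toy data is hypothetical, and defining logcpm as log2(cpm + 1) is an assumption, since the exact transformation is not stated above.

```r
# Toy count matrix (hypothetical data): genes in rows, spots in columns
counts <- matrix(
  c(10, 30, 60,   # spot1, library size 100
     0,  5,  5),  # spot2, library size 10
  nrow = 3,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("spot1", "spot2"))
)

# CPM: divide each column by its library size, then scale to one million
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6

# Assumed log variant: log2 with a pseudocount of 1
logcpm <- log2(cpm + 1)

colSums(cpm)
```

After CPM normalization every column sums to one million, which makes observations with different sequencing depth comparable.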
3. print_info
You can obtain additional info about your dataset by calling
print_info.
print_info(sce)
##
## ── Single Cell
## Assays: "counts" and "cpm"
## Genes: 23858
## → without expression: 0 (0%)
## Cells: 7978
## → without expression: 0 (0%)
## Umi count range: 447 - 74244
## ✔ Rownames set
## ✔ Colnames set
4. available_results
You can check which deconvolution results and additional annotations
are available in your data by calling
available_results. You can set the
method parameter to the name of a deconvolution tool
to further filter the results if many quantifications were
performed.
# "deconv" contains DWLS results
available_results(deconv)
## [1] "spatialdwls_B.cells" "spatialdwls_CAFs"
## [3] "spatialdwls_Cancer.Epithelial" "spatialdwls_Endothelial"
## [5] "spatialdwls_Myeloid" "spatialdwls_Normal.Epithelial"
## [7] "spatialdwls_Plasmablasts" "spatialdwls_PVL"
## [9] "spatialdwls_T.cells"
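The result names above share the prefix of the tool that produced them. Assuming that filtering by method amounts to matching this prefix (an assumption, not confirmed by the text), the idea can be sketched in base R. The names starting with othertool_ belong to a hypothetical second tool and are made up for illustration.

```r
# Hypothetical column names: results of two tools plus one unrelated column
result_names <- c("spatialdwls_B.cells", "spatialdwls_T.cells",
                  "othertool_B.cells", "in_tissue")

method <- "spatialdwls"

# Keep only names that start with "<method>_"
filtered <- result_names[startsWith(result_names, paste0(method, "_"))]
filtered
```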
5. aggregate_results
You can aggregate fine-grained deconvolution results into a single
value by passing a vector of deconvolution result names to the
cell_types parameter. You can additionally set a new
name, and with remove you have the option to
drop the original fine-grained columns and
keep only the aggregation.
aggregate_results(deconv, cell_types = c("spatialdwls_Cancer.Epithelial", "spatialdwls_Normal.Epithelial"), name = "spatialdwls_Epithelial", remove = TRUE)
## ── spacedeconv ─────────────────────────────────────────────────────────────────
## ℹ testing parameter
## ✔ parameter OK [9ms]
##
## ℹ Aggregating cell types
## ✔ Aggregated cell types [12ms]
##
## class: SpatialExperiment
## dim: 23529 1185
## metadata(0):
## assays(2): counts cpm
## rownames(23529): AL627309.1 AL627309.5 ... AC007325.4 AC007325.2
## rowData names(2): symbol ensembl
## colnames(1185): AAACAATCTACTAGCA-1 AAACACCAATAACTGC-1 ...
## TTGTTTCATTAGTCTA-1 TTGTTTGTGTAAATTC-1
## colData names(12): in_tissue array_row ... spatialdwls_Myeloid
## spatialdwls_Epithelial
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
## imgData names(4): sample_id image_id data scaleFactor
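Conceptually, the aggregation collapses several result columns into one. The base R sketch below shows this on a plain data frame standing in for the colData of the object. The values are hypothetical, and combining the columns by summation is an assumption about how the aggregate is computed.

```r
# Hypothetical per-spot deconvolution results
res <- data.frame(
  spatialdwls_Cancer.Epithelial = c(0.2, 0.5),
  spatialdwls_Normal.Epithelial = c(0.1, 0.0),
  spatialdwls_T.cells = c(0.7, 0.5),
  row.names = c("spot1", "spot2")
)

cell_types <- c("spatialdwls_Cancer.Epithelial", "spatialdwls_Normal.Epithelial")

# Aggregate by summing the selected columns per spot (assumed behaviour)
res$spatialdwls_Epithelial <- unname(rowSums(res[, cell_types]))

# Equivalent of remove = TRUE: drop the original fine-grained columns
res <- res[, !(colnames(res) %in% cell_types), drop = FALSE]
res
```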
6. addCustomAnnotation
This function adds a custom annotation vector to a SpatialExperiment object.
# new_annotation is a vector containing a custom annotation for each spot
spe <- addCustomAnnotation(spe, columnName = "ManualAnnotation", values = new_annotation)
7. annotate_spots
This function annotates spots with TRUE / FALSE, which is useful for flagging a specific subgroup of spots. It takes the names of the spots that should receive the positive value (value_pos, here TRUE); all other spots receive the negative value (value_neg, here FALSE).
# spots is a list of spot names.
spe <- annotate_spots(spe, spots, value_pos = TRUE, value_neg = FALSE, name = "customAnnotation")
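The resulting annotation is a membership test of every spot name against the supplied spots. A minimal base R sketch of that logic, using hypothetical spot names and not the package's implementation, could look like this:

```r
# Hypothetical spot names, standing in for the column names of spe
spot_names <- c("AAAC-1", "AAAG-1", "AACT-1", "AAGT-1")

# Spots that should be flagged
spots <- c("AAAG-1", "AAGT-1")

value_pos <- TRUE
value_neg <- FALSE

# One value per spot: value_pos if the spot is in `spots`, value_neg otherwise
annotation <- ifelse(spot_names %in% spots, value_pos, value_neg)
names(annotation) <- spot_names
annotation
```

A vector like this could also be attached with addCustomAnnotation, provided it contains one value per spot.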
8. scale_cell_counts
Most deconvolution tools compute relative cell fractions for
spots. If you have cell counts for each spot, you can scale the
relative values to absolute cell counts using this function. The
input parameters are the name of the column that should be scaled,
value, and a vector of absolute cell counts for each
spot, cell_counts. You can also set a new result name with
resName.
# cell_counts_per_spot contains spot level absolute cell counts
spe_absolute <- scale_cell_counts(spe, value = "spatialdwls_B.cells", cell_counts = cell_counts_per_spot, resName = "BCellsAbsolute")
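The scaling itself is a per-spot multiplication of the relative fraction by the total number of cells in that spot. The sketch below illustrates this arithmetic with hypothetical values. It assumes a plain element-wise product and that both vectors are in the same spot order.

```r
# Hypothetical relative B cell fractions per spot
fractions <- c(spot1 = 0.10, spot2 = 0.25, spot3 = 0.00)

# Hypothetical total cell counts per spot, in the same spot order
cell_counts_per_spot <- c(spot1 = 20, spot2 = 8, spot3 = 12)

# Absolute B cell estimate per spot: fraction times total cell count
absolute <- fractions * cell_counts_per_spot
absolute
```

Here spot1 holds 10% of 20 cells and spot2 holds 25% of 8 cells, so both yield an estimate of 2 B cells.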
9. subsetSCE
To reduce the resource requirements of the deconvolution computation, you
can shrink your scRNA-seq reference by subsetting. The
function requires your input sce object and the name of the column
containing the cell-type annotation,
cell_type_col. You can specify the subsetting
scenario, scenario, as one of “mirror” or “even”. The
mirror scenario keeps the same cell-type proportions as in the
input data but reduces the overall cell number. The even scenario
selects the same number of cells for each cell type. Specify the
number of cells you want after subsetting using the
ncells parameter. If not enough cells
are available for a cell type to match the number required
by the scenario, you can set the notEnough parameter to “asis” to
keep all remaining cells or to “remove” to drop the cell type completely.
subset <- subsetSCE(sce, cell_type_col = "celltype_major", scenario = "mirror", ncells = 500)
## ── spacedeconv ─────────────────────────────────────────────────────────────────
## ℹ testing parameter
## ℹ Set seed to 12345
## ✔ parameter OK [16ms]
##
## ℹ extracting up to 500 cells
## ✔ extracting up to 500 cells [183ms]
##
## ℹ extracted 501 cells
## ✔ extracted 501 cells [20ms]
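The difference between the two scenarios can be illustrated with a small base R sketch that samples cell indices from a vector of cell-type labels. This is a simplified illustration, not the package's implementation: the labels, the helper sample_per_type and the rounding of per-type targets are all assumptions. Because targets are rounded per cell type, the total can differ slightly from ncells in general.

```r
set.seed(12345)

# Hypothetical cell-type labels for 1000 cells
cell_types <- rep(c("T.cells", "B.cells", "Myeloid"), times = c(600, 300, 100))
ncells <- 90

type_counts <- table(cell_types)
props <- setNames(as.numeric(type_counts), names(type_counts))

# "mirror": per-type targets follow the original proportions
mirror_targets <- round(props / sum(props) * ncells)

# "even": the same target for every cell type
even_targets <- setNames(rep(ncells %/% length(props), length(props)), names(props))

# Hypothetical helper: sample up to the target number of cells per type.
# If fewer cells exist than requested, all of them are kept ("asis"-like).
sample_per_type <- function(cell_types, targets) {
  idx <- lapply(names(targets), function(ct) {
    candidates <- which(cell_types == ct)
    n <- min(targets[[ct]], length(candidates))
    candidates[sample.int(length(candidates), n)]
  })
  sort(unlist(idx))
}

mirror_idx <- sample_per_type(cell_types, mirror_targets)
even_idx <- sample_per_type(cell_types, even_targets)

table(cell_types[mirror_idx])  # proportions preserved
table(cell_types[even_idx])    # equal numbers per type
```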