Run multiple deconvolution methods and create aggregated results

This function runs multiple cell-type deconvolution methods on the same methylation data and creates both individual method results and an aggregated (averaged) estimate. The aggregation approach can help reduce method-specific biases and provide more robust cell-type proportion estimates.

Usage

deconvolute_combined(
  methyl_set,
  array = c("450k", "EPIC"),
  methods,
  scale_results = FALSE,
  ...
)

Arguments

methyl_set: A minfi MethylSet
array: Type of methylation array that was used. Possible options are '450k' and 'EPIC'.
methods: Vector of method names (at least two) that will be applied to the methyl set, for example c('epidish', 'houseman').
scale_results: Whether the deconvolution results should be rescaled. Negative values will be set to 0, and the estimates will be normalized to sum to 1 per sample. Defaults to FALSE.
...: Additional parameters, passed to the algorithm used. See the documentation of the individual methods for details.
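
The rescaling described for scale_results can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's internal code: the helper name rescale_estimates() and the example matrix are hypothetical, and estimates are assumed to be arranged with samples in rows and cell types in columns.

```r
# Hypothetical helper illustrating the rescaling described for scale_results.
rescale_estimates <- function(est) {
  # est: numeric matrix, rows = samples, columns = cell types (assumed layout)
  est[est < 0] <- 0                 # set negative estimates to 0
  totals <- rowSums(est)
  totals[totals == 0] <- NA_real_   # an all-zero row cannot be normalized
  est / totals                      # divide each row by its own total
}

# Invented estimates for two samples
est <- matrix(
  c(0.5, -0.1, 0.7,
    0.2,  0.2, 0.4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("s1", "s2"), c("B cell", "NK cell", "Monocyte"))
)

scaled <- rescale_estimates(est)
rowSums(scaled)  # each sample now sums to 1
```

With scale_results = FALSE, the estimates are returned as the individual methods produced them, so per-sample sums may differ from 1.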

Value

A data frame with columns: sample, method, celltype, value. Contains results from all individual methods plus an 'aggregated' method that averages the estimates across methods.
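
As a rough sketch of how the long-format result might be used, the snippet below builds a mock data frame with the documented columns, keeps only the 'aggregated' rows, and reshapes them into a sample-by-cell-type table. The values are placeholders invented for illustration and are not output of deconvolute_combined().

```r
# Mock result with the documented columns (sample, method, celltype, value).
res <- expand.grid(
  sample   = c("s1", "s2"),
  method   = c("epidish", "houseman", "aggregated"),
  celltype = c("B cell", "NK cell"),
  stringsAsFactors = FALSE
)
res$value <- seq_len(nrow(res)) / 100  # placeholder numbers only

# Keep only the averaged estimates
agg <- res[res$method == "aggregated", ]

# Reshape to wide form: one row per sample, one column per cell type.
# Each sample/celltype pair occurs once, so sum() just returns that value.
wide <- tapply(agg$value, list(agg$sample, agg$celltype), sum)
wide
```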

Details

How it works:

Runs each specified method independently on the methylation data
Standardizes cell-type names across methods using rename_cell_types()
For MethylResolver specifically, combines Tnaive and Tmem into "T cell CD4+" to match other methods
Calculates the mean estimate for each cell type across all methods (aggregated results)
Returns both individual method results and aggregated results in a long-format data frame
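
The averaging and combining steps above can be sketched in base R. This is an illustration under assumptions rather than the function's actual code: the per-method estimates are invented, and they are assumed to be already standardized and stored in the long format described under Value.

```r
# Invented per-method results, already in the standardized long format.
per_method <- data.frame(
  sample   = c("s1", "s1", "s1", "s1"),
  method   = c("epidish", "epidish", "houseman", "houseman"),
  celltype = c("B cell", "NK cell", "B cell", "NK cell"),
  value    = c(0.30, 0.10, 0.20, 0.20)
)

# Mean estimate per sample and cell type across methods
agg <- aggregate(value ~ sample + celltype, data = per_method, FUN = mean)
agg$method <- "aggregated"

# Individual and aggregated results together, same column order
combined <- rbind(per_method, agg[, names(per_method)])
combined
```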

Cell-type standardization: The function uses rename_cell_types() to standardize cell-type names across different methods. This mapping includes:

CD8T/CD8/CD8T-cells_EPIC → "T cell CD8+"
CD4T/CD4T-cells_EPIC → "T cell CD4+"
B/Bcell/B-cells_EPIC → "B cell"
NK/NK-cells_EPIC → "NK cell"
Mono/Mon/Monocytes_EPIC → "Monocyte"
Neu/Neutro/Neutrophils/Neutrophils_EPIC → "Neutrophil"
Any unrecognized cell types → "other"
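
A lookup of this kind could be written as a named character vector. The sketch below only mirrors the mapping listed above; standardize_names() is a hypothetical helper and is not the implementation of rename_cell_types().

```r
# Lookup built from the mapping listed above.
celltype_map <- c(
  CD8T = "T cell CD8+", CD8 = "T cell CD8+", "CD8T-cells_EPIC" = "T cell CD8+",
  CD4T = "T cell CD4+", "CD4T-cells_EPIC" = "T cell CD4+",
  B = "B cell", Bcell = "B cell", "B-cells_EPIC" = "B cell",
  NK = "NK cell", "NK-cells_EPIC" = "NK cell",
  Mono = "Monocyte", Mon = "Monocyte", Monocytes_EPIC = "Monocyte",
  Neu = "Neutrophil", Neutro = "Neutrophil",
  Neutrophils = "Neutrophil", Neutrophils_EPIC = "Neutrophil"
)

# Hypothetical helper: translate raw labels, falling back to "other"
standardize_names <- function(x) {
  out <- unname(celltype_map[x])  # NA for labels missing from the lookup
  out[is.na(out)] <- "other"
  out
}

standardize_names(c("CD8T", "Bcell", "Eosino"))
#> [1] "T cell CD8+" "B cell"      "other"
```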

Meaning of 'other' cell types: The "other" category includes:

Cell types that are not in the standardized mapping above
Method-specific cell types that don't have direct equivalents in other methods
Rare or specialized cell populations that are only detected by certain methods

Limitations of the aggregation approach:

Method heterogeneity: Different methods use different algorithms, reference datasets, and cell-type definitions, which may not be directly comparable
Missing data handling: If a method fails to estimate a particular cell type, it may be excluded from the aggregation, potentially biasing results
Equal weighting: The current implementation gives equal weight to all methods, regardless of their individual performance or reliability
Cell-type coverage: Methods may detect different sets of cell types, leading to incomplete coverage in aggregated results
No confidence intervals: The aggregation provides point estimates without uncertainty quantification
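
To make the missing-data and equal-weighting points concrete, the sketch below averages invented estimates for a single sample from three hypothetical methods, one of which reports no value for a cell type. Whether the function itself drops missing values in this way is not stated here; na.rm = TRUE is an assumption made for illustration.

```r
# Invented estimates for one sample; NA marks a cell type a method
# does not report. Method names are hypothetical.
est <- matrix(
  c(0.10, 0.60,
    0.12, 0.58,
    0.11, NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("method_a", "method_b", "method_c"),
                  c("B cell", "Neutrophil"))
)

# Equal-weight mean over the methods that report a value
agg <- colMeans(est, na.rm = TRUE)

# Number of methods contributing to each mean
n_methods <- colSums(!is.na(est))

agg
n_methods
```

Here the "Neutrophil" mean rests on two methods while the "B cell" mean rests on three, so the aggregated values for different cell types are not based on the same set of methods.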

Examples


ex_data <- minfiData::MsetEx

result <- deconvolute_combined(ex_data, methods = c("epidish", "houseman"))
#> Warning: 12 NA values detected in your beta matrix. Replacing them with 0.5.
#> RPC was chosen as default for "mode"
#> blood was chosen as default for "reference"
#> Starting EpiDISH deconvolution with mode RPC ...
#> 450k was chosen as default for "array"
#> Blood was chosen as default for "compositeCellType"
#> IlluminaHumanMethylationEPIC was chosen as default for "referencePlatform"
#> IDOL was chosen as default for "probeSelect"
#> [estimateCellCounts2] The function will assume that no preprocessing has been performed. Using 'preprocessQuantile' in prenormalized data is experimental and it should only be run under the user responsibility
#> Loading required package: FlowSorted.Blood.EPIC
#> Loading required package: ExperimentHub
#> Loading required package: AnnotationHub
#> Loading required package: BiocFileCache
#> Loading required package: dbplyr
#> 
#> Attaching package: ‘AnnotationHub’
#> The following object is masked from ‘package:Biobase’:
#> 
#>     cache
#> see ?FlowSorted.Blood.EPIC and browseVignettes('FlowSorted.Blood.EPIC') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
#> Loading required package: IlluminaHumanMethylationEPICmanifest
#> [convertArray] Casting as IlluminaHumanMethylationEPIC
#> [estimateCellCounts2] Combining user data with reference (flow sorted) data.
#> Warning: NAs introduced by coercion
#> [estimateCellCounts2] Processing user and reference data together.
#> [preprocessQuantile] Mapping to genome.
#> Loading required package: IlluminaHumanMethylationEPICanno.ilm10b4.hg19
#> [preprocessQuantile] Fixing outliers.
#> [preprocessQuantile] Quantile normalizing.
#> [estimateCellCounts2] Using IDOL L-DMR probes for composition estimation.
#> [estimateCellCounts2] Estimating proportion composition (prop), if you provide cellcounts those will be provided as counts in the composition estimation.