Run multiple deconvolution methods and create aggregated results
Source: R/main.R
deconvolute_combined.Rd
This function runs multiple cell-type deconvolution methods on the same methylation data and creates both individual method results and an aggregated (averaged) estimate. The aggregation approach can help reduce method-specific biases and provide more robust cell-type proportion estimates.
Usage
deconvolute_combined(
methyl_set,
array = c("450k", "EPIC"),
methods,
scale_results = FALSE,
...
)
Arguments
- methyl_set
A minfi MethylSet
- array
Type of methylation array that was used. Possible options are '450k' and 'EPIC'.
- methods
Names of the methods to apply to the methyl set. More than one method is required, for example c('epidish', 'houseman').
- scale_results
Whether the deconvolution results should be rescaled. Negative values will be set to 0, and the estimates will be normalized to sum to 1 per sample. Defaults to FALSE.
- ...
Additional parameters, passed to the algorithm used. See the documentation of the individual methods for details.
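The rescaling described for scale_results can be illustrated with a minimal sketch. The helper rescale_estimates() below is hypothetical and not part of the package; it assumes the estimates are held in a samples-by-cell-types matrix, which may differ from how the package stores them internally.

```r
# Hypothetical helper (not part of the package): clip negative estimates
# to zero, then normalize each sample (row) so its estimates sum to 1.
rescale_estimates <- function(est) {
  est[est < 0] <- 0
  totals <- rowSums(est)
  totals[totals == 0] <- 1  # leave all-zero rows unchanged instead of dividing by 0
  est / totals              # row i is divided by totals[i]
}

# Made-up estimates for two samples and three cell types.
est <- matrix(
  c(0.50, -0.10, 0.70,
    0.20,  0.30, 0.30),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("s1", "s2"), c("B cell", "NK cell", "Monocyte"))
)

scaled <- rescale_estimates(est)
rowSums(scaled)  # every sample sums to 1 after rescaling
```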
Value
A data frame with columns: sample, method, celltype, value. Contains results from all individual methods plus an 'aggregated' method that averages the estimates across methods.
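As a rough illustration of working with this return value, the sketch below builds a toy data frame with the documented columns (all values are made up, not real output), keeps only the 'aggregated' rows, and reshapes them into a samples-by-cell-types table using base R.

```r
# Toy stand-in for the returned data frame; the values are invented.
toy <- expand.grid(
  sample   = c("s1", "s2"),
  method   = c("epidish", "houseman", "aggregated"),
  celltype = c("B cell", "NK cell"),
  stringsAsFactors = FALSE
)
toy$value <- c(0.10, 0.20, 0.14, 0.16, 0.12, 0.18,
               0.06, 0.04, 0.02, 0.08, 0.04, 0.06)

# Keep only the averaged estimates.
agg <- toy[toy$method == "aggregated", ]

# Reshape to wide form: one row per sample, one column per cell type.
wide <- with(agg, tapply(value, list(sample, celltype), mean))
wide
```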
Details
How it works:
Runs each specified method independently on the methylation data
Standardizes cell-type names across methods using rename_cell_types()
For MethylResolver specifically, combines Tnaive and Tmem into "T cell CD4+" to match other methods
Calculates the mean estimate for each cell type across all methods (aggregated results)
Returns both individual method results and aggregated results in a long-format data frame
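The averaging and stacking steps above can be sketched in base R. This is an illustration under assumptions, not the package's implementation: it starts from made-up per-method estimates whose cell-type names are already standardized.

```r
# Made-up per-method estimates in long format (not real output).
per_method <- data.frame(
  sample   = c("s1", "s1", "s1", "s1"),
  method   = c("epidish", "houseman", "epidish", "houseman"),
  celltype = c("B cell", "B cell", "NK cell", "NK cell"),
  value    = c(0.10, 0.14, 0.06, 0.02),
  stringsAsFactors = FALSE
)

# Mean estimate per sample and cell type across all methods.
agg <- aggregate(value ~ sample + celltype, data = per_method, FUN = mean)
agg$method <- "aggregated"

# Stack individual and aggregated rows into one long-format data frame.
combined <- rbind(per_method, agg[, names(per_method)])
combined
```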
Cell-type standardization:
The function uses rename_cell_types() to standardize cell-type names across different methods. This mapping includes:
CD8T/CD8/CD8T-cells_EPIC → "T cell CD8+"
CD4T/CD4T-cells_EPIC → "T cell CD4+"
B/Bcell/B-cells_EPIC → "B cell"
NK/NK-cells_EPIC → "NK cell"
Mono/Mon/Monocytes_EPIC → "Monocyte"
Neu/Neutro/Neutrophils/Neutrophils_EPIC → "Neutrophil"
Any unrecognized cell types → "other"
Meaning of 'other' cell types: The "other" category includes:
Cell types that are not in the standardized mapping above
Method-specific cell types that don't have direct equivalents in other methods
Rare or specialized cell populations that are only detected by certain methods
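The mapping above, including the fallback to "other", can be approximated with a named lookup vector. The function standardize_names() below is hypothetical and only mirrors the documented mapping; the actual rename_cell_types() may be implemented differently.

```r
# Hypothetical lookup built from the mapping listed above.
name_map <- c(
  "CD8T" = "T cell CD8+", "CD8" = "T cell CD8+",
  "CD8T-cells_EPIC" = "T cell CD8+",
  "CD4T" = "T cell CD4+", "CD4T-cells_EPIC" = "T cell CD4+",
  "B" = "B cell", "Bcell" = "B cell", "B-cells_EPIC" = "B cell",
  "NK" = "NK cell", "NK-cells_EPIC" = "NK cell",
  "Mono" = "Monocyte", "Mon" = "Monocyte", "Monocytes_EPIC" = "Monocyte",
  "Neu" = "Neutrophil", "Neutro" = "Neutrophil",
  "Neutrophils" = "Neutrophil", "Neutrophils_EPIC" = "Neutrophil"
)

# Look up each label; anything without an entry falls back to "other".
standardize_names <- function(x) {
  out <- unname(name_map[x])
  out[is.na(out)] <- "other"
  out
}

# "Eosino" is an arbitrary example of an unmapped label.
standardize_names(c("CD8T", "Bcell", "Eosino"))
```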
Limitations of the aggregation approach:
Method heterogeneity: Different methods use different algorithms, reference datasets, and cell-type definitions, which may not be directly comparable
Missing data handling: If a method fails to estimate a particular cell type, that method's estimate may be excluded from the aggregation for that cell type, potentially biasing results
Equal weighting: The current implementation gives equal weight to all methods, regardless of their individual performance or reliability
Cell-type coverage: Methods may detect different sets of cell types, leading to incomplete coverage in aggregated results
No confidence intervals: The aggregation provides point estimates without uncertainty quantification
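One way to gauge some of these limitations is to check, per cell type, how many methods contributed and how much their estimates disagree. The sketch below is not part of the package and uses made-up values with placeholder method labels; the standard deviation across methods is only a rough indicator of disagreement, not a confidence interval.

```r
# Made-up per-method estimates; method_c has no Neutrophil estimate.
per_method <- data.frame(
  sample   = "s1",
  method   = c("method_a", "method_b", "method_c", "method_a", "method_b"),
  celltype = c("B cell", "B cell", "B cell", "Neutrophil", "Neutrophil"),
  value    = c(0.10, 0.14, 0.12, 0.50, 0.60),
  stringsAsFactors = FALSE
)

# Mean, spread, and number of contributing methods per sample and cell type.
stats <- aggregate(
  value ~ sample + celltype, data = per_method,
  FUN = function(v) c(mean = mean(v), sd = sd(v), n_methods = length(v))
)

# aggregate() returns a matrix column; flatten it into ordinary columns
# named value.mean, value.sd, and value.n_methods.
flat <- do.call(data.frame, stats)
flat
```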
Examples
ex_data <- minfiData::MsetEx
result <- deconvolute_combined(ex_data, methods=c('epidish','houseman'))
#> Warning: 12 NA values detected in your beta matrix. Replacing them with 0.5.
#> RPC was chosen as default for "mode"
#> blood was chosen as default for "reference"
#> Starting EpiDISH deconvolution with mode RPC ...
#> 450k was chosen as default for "array"
#> Blood was chosen as default for "compositeCellType"
#> IlluminaHumanMethylationEPIC was chosen as default for "referencePlatform"
#> IDOL was chosen as default for "probeSelect"
#> [estimateCellCounts2] The function will assume that no preprocessing has been performed. Using 'preprocessQuantile' in prenormalized data is experimental and it should only be run under the user responsibility
#> Loading required package: FlowSorted.Blood.EPIC
#> Loading required package: ExperimentHub
#> Loading required package: AnnotationHub
#> Loading required package: BiocFileCache
#> Loading required package: dbplyr
#>
#> Attaching package: ‘AnnotationHub’
#> The following object is masked from ‘package:Biobase’:
#>
#> cache
#> see ?FlowSorted.Blood.EPIC and browseVignettes('FlowSorted.Blood.EPIC') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
#> Loading required package: IlluminaHumanMethylationEPICmanifest
#> [convertArray] Casting as IlluminaHumanMethylationEPIC
#> [estimateCellCounts2] Combining user data with reference (flow sorted) data.
#> Warning: NAs introduced by coercion
#> [estimateCellCounts2] Processing user and reference data together.
#> [preprocessQuantile] Mapping to genome.
#> Loading required package: IlluminaHumanMethylationEPICanno.ilm10b4.hg19
#> [preprocessQuantile] Fixing outliers.
#> [preprocessQuantile] Quantile normalizing.
#> [estimateCellCounts2] Using IDOL L-DMR probes for composition estimation.
#> [estimateCellCounts2] Estimating proportion composition (prop), if you provide cellcounts those will be provided as counts in the composition estimation.