Skip to contents

Background: What Are Signature Matrices?

Cell-type deconvolution methods rely on signature matrices (also called reference matrices or basis matrices) to estimate the proportions of different cell types in a heterogeneous DNA methylation sample. A signature matrix is a table where each row corresponds to a CpG site (or region), and each column corresponds to a cell type. The entries represent the expected methylation value (often as beta values) for each CpG in each pure cell type.

Signature matrices are typically constructed from reference datasets where DNA methylation has been measured in sorted or purified cell populations. The choice of CpGs and the quality of the reference data are critical for accurate deconvolution. Different methods use different strategies for selecting CpGs and constructing their signature matrices:

  • EpiDISH uses reference matrices for blood, breast, or epithelial cell types, based on published datasets.
  • Houseman uses optimized sets of CpGs (e.g., IDOL-optimized) for blood and other tissues.
  • MethylCC uses differentially methylated regions (DMRs) identified from reference data.
  • MethylResolver uses a signature matrix derived from large-scale reference data, optimized for robust deconvolution.
  • MethAtlas uses a comprehensive reference atlas, with options for tissue-wide or immune-specific signatures.

The accuracy of cell-type deconvolution depends heavily on how well the signature matrix represents the true methylation profiles of the cell types present in your samples. In some cases, you may want to use a custom set of CpGs (for example, to focus on a subset relevant to your study or to match the coverage of your data).

Accessing Signature Matrices

methyldeconv provides functions to access the signature matrices used by each deconvolution method. These matrices define the reference methylation profiles for each cell type.

EpiDISH

# Get the blood signature matrix used by EpiDISH
sig_epidish <- get_epidish_signature_matrix(reference = "blood")
head(sig_epidish)
##         CpGs     B    NK  CD4T  CD8T  Mono Neutro Eosino
## 1 cg01024458 0.034 0.962 0.969 0.958 0.987  0.992  0.991
## 2 cg11661493 0.037 0.954 0.971 0.964 0.958  0.962  0.962
## 3 cg21596498 0.060 0.980 0.989 0.989 0.980  0.976  0.984
## 4 cg05205074 0.049 0.954 0.967 0.961 0.967  0.979  0.986
## 5 cg17232476 0.045 0.962 0.969 0.958 0.932  0.980  0.976
## 6 cg14936008 0.065 0.976 0.980 0.983 0.977  0.983  0.986

Houseman

# Get the Houseman signature matrix
sig_houseman <- get_houseman_signature_matrix()
head(sig_houseman)
##         CpGs      CD8T      CD4T         NK     Bcell       Mono        Neu
## 1 cg08769189 0.1970004 0.6744559 0.52214953 0.9006926 0.82187345 0.69553933
## 2 cg07661835 0.1047903 0.4526285 0.88845613 0.8908425 0.90251916 0.88942404
## 3 cg00219921 0.1349703 0.8157856 0.92581678 0.9540993 0.94637918 0.94059532
## 4 cg13468685 0.6535697 0.9146197 0.07561249 0.1733022 0.03540601 0.03407628
## 5 cg04329870 0.2462235 0.8529208 0.86967475 0.8999110 0.90301117 0.90902010
## 6 cg14085952 0.7537942 0.2262851 0.90332344 0.7452843 0.80620727 0.90870910

MethylCC

# Get the DMRs (signature matrix) used by MethylCC
sig_methylcc <- get_methylcc_signature_matrix()
head(sig_methylcc)
##   seqnames     start       end width strand indexStart indexEnd L      p.value
## 1     chr8 145008110 145008397   288      *     238113   238115 3 5.846134e-13
## 2    chr13  99223336  99223562   227      *     334524   334525 2 4.997645e-13
## 3    chr21  45565328  45565564   237      *     464088   464089 2 9.415751e-12
## 4     chr1 226036279 226036279     1      *      42539    42539 1 7.167314e-19
## 5     chr7 106511741 106511741     1      *     208376   208376 1 6.686998e-15
## 6    chr19  10519375  10519375     1      *     432047   432047 1 2.052548e-14
##           dm dmr_max_diff dmr_status status cellType
## 1 -0.4454907    0.4858436        DMR     Up     Gran
## 2 -0.5149938    0.4649006        DMR     Up     Gran
## 3 -0.4231612    0.4044535        DMR     Up     Gran
## 4 -0.6644724    0.2957124        DMR     Up     Gran
## 5 -0.6039376    0.3929102        DMR     Up     Gran
## 6 -0.5303630    0.4404731        DMR     Up     Gran

MethylResolver

# Get the signature matrix used by MethylResolver
sig_methylresolver <- get_methylresolver_signature_matrix()
head(sig_methylresolver)
##         CpGs       Mon Dendritic     Macro       Neu       Eos      Treg
## 1 cg03179542 0.7079238 0.5662990 0.5724228 0.3422393 0.3529072 0.1905375
## 2 cg02138358 0.7304353 0.6057991 0.5702307 0.6094019 0.3597952 0.1725188
## 3 cg12449049 0.7187701 0.5779714 0.5987460 0.3633360 0.2347597 0.1245479
## 4 cg01902758 0.7521254 0.6425354 0.6293023 0.4794861 0.3840054 0.1669161
## 5 cg16303353 0.8379053 0.6608612 0.6633956 0.7286518 0.5551461 0.1602591
## 6 cg14786790 0.7026424 0.5863443 0.5973881 0.3865554 0.1404375 0.1064646
##       Tnaive      Tmem        CD8         NK      Bcell
## 1 0.10841718 0.2411970 0.12299279 0.20552450 0.12358545
## 2 0.13332736 0.1832339 0.14501405 0.21984128 0.15810912
## 3 0.08579205 0.1471621 0.05390728 0.15745052 0.09672281
## 4 0.12559347 0.1708202 0.13631469 0.27673297 0.29411627
## 5 0.14479402 0.1868098 0.14689298 0.27842752 0.26920249
## 6 0.10706541 0.1269536 0.00000000 0.05241609 0.13360442

MethAtlas

# Get the default reference matrix used by MethAtlas
sig_methatlas <- get_methatlas_signature_matrix()
head(sig_methatlas)
##         CpGs Monocytes_EPIC B-cells_EPIC CD4T-cells_EPIC NK-cells_EPIC
## 1 cg08169020         0.8866       0.2615          0.0149        0.0777
## 2 cg25913761         0.8363       0.2210          0.2816        0.4705
## 3 cg26955540         0.7658       0.0222          0.1492        0.4005
## 4 cg25170017         0.8861       0.5116          0.1021        0.4363
## 5 cg12827637         0.5212       0.3614          0.0227        0.2120
## 6 cg19442545         0.2013       0.1137          0.0608        0.0410
##   CD8T-cells_EPIC Neutrophils_EPIC Erythrocyte_progenitors Adipocytes
## 1          0.0164           0.8680                  0.9509     0.0336
## 2          0.3961           0.8293                  0.2385     0.3578
## 3          0.3474           0.7915                  0.1374     0.1965
## 4          0.0875           0.7042                  0.9447     0.0842
## 5          0.0225           0.5368                  0.4667     0.0287
## 6          0.0668           0.1952                  0.1601     0.0364
##   Cortical_neurons Hepatocytes Lung_cells Pancreatic_beta_cells
## 1           0.0168      0.0340     0.0416              0.038875
## 2           0.3104      0.2389     0.2250              0.132000
## 3           0.0978      0.0338     0.0768              0.041725
## 4           0.2832      0.2259     0.0544              0.111750
## 5           0.1368      0.0307     0.1607              0.065975
## 6           0.0222      0.1574     0.0122              0.003825
##   Pancreatic_acinar_cells Pancreatic_duct_cells Vascular_endothelial_cells
## 1                  0.0209                0.0130                     0.0323
## 2                  0.2249                0.1996                     0.3654
## 3                  0.0314                0.0139                     0.2382
## 4                  0.0309                0.0217                     0.0972
## 5                  0.0370                0.0230                     0.0798
## 6                  0.0378                0.0347                     0.0470
##   Colon_epithelial_cells Left_atrium Bladder Breast Head_and_neck_larynx Kidney
## 1                 0.0163      0.0386  0.0462 0.0264               0.0470 0.0269
## 2                 0.2037      0.2446  0.2054 0.1922               0.2045 0.1596
## 3                 0.0193      0.1134  0.1269 0.1651               0.1523 0.1034
## 4                 0.0187      0.0674  0.0769 0.0691               0.0704 0.0604
## 5                 0.0193      0.0432  0.0459 0.0228               0.0687 0.0234
## 6                 0.0193      0.0287  0.0246 0.0081               0.0098 0.0309
##   Prostate Thyroid Upper_GI Uterus_cervix
## 1   0.0353  0.0553   0.0701        0.0344
## 2   0.1557  0.1848   0.1680        0.2026
## 3   0.0686  0.0943   0.1298        0.1075
## 4   0.0369  0.0412   0.0924        0.0697
## 5   0.0508  0.0726   0.0759        0.0196
## 6   0.0055  0.0188   0.0090        0.0166

Prepare Example Data

library(minfiData)
methyl_set <- minfiData::MsetEx
beta_matrix <- minfi::getBeta(minfi::ratioConvert(methyl_set))

Example: Using Custom CpGs from EpiDISH and Houseman signatures

When using custom CpGs for deconvolution, it is important to ensure that the CpGs you select are present in the signature matrix for the method you are using. Here, we demonstrate this feature for EpiDISH and Houseman, as their signature matrices may have some overlap. For other methods, custom CpG support is available, but the overlap with EpiDISH/Houseman signatures may be zero, so we do not show those examples here.

# Get CpGs from each method's signature matrix
cpgs_epidish <- sig_epidish$CpGs
cpgs_houseman <- sig_houseman$CpGs

# Example: intersection of CpGs between EpiDISH and Houseman
custom_cpgs <- intersect(cpgs_epidish, cpgs_houseman)
length(custom_cpgs)
## [1] 13

Example: Running EpiDISH with Custom CpGs

# Use the method-specific subset
result_custom_epidish <- run_epidish(beta_matrix, cpg_subset = custom_cpgs)

Comparing Results: Custom CpGs vs Full Signature

You can compare the deconvolution results obtained using the custom CpGs to those obtained using the full EpiDISH signature matrix:

# Run EpiDISH with the full signature
result_full_epidish <- run_epidish(beta_matrix)


# Optionally, visualize the differences (e.g., for the first sample) using tidyverse
if (requireNamespace("tidyr", quietly = TRUE) && requireNamespace("dplyr", quietly = TRUE) && requireNamespace("ggplot2", quietly = TRUE)) {
  library(dplyr)
  library(tidyr)
  library(ggplot2)
  df_compare <- tibble(
    CellType = colnames(result_full_epidish$estF),
    Full = as.numeric(result_full_epidish$estF[1,]),
    Custom = as.numeric(result_custom_epidish$estF[1, colnames(result_custom_epidish$estF)])
  ) %>%
    pivot_longer(cols = c(Full, Custom), names_to = "Signature", values_to = "Fraction")
  
  ggplot(df_compare, aes(x = CellType, y = Fraction, fill = Signature)) +
    geom_bar(stat = "identity", position = "dodge") +
    labs(title = "EpiDISH: Full vs Custom CpGs (Sample 1)", y = "Estimated Fraction") +
    scale_fill_manual(values = c("Full" = "bisque", "Custom" = "darkgreen"))+
    theme_minimal()
}


For more details on the available arguments and customization options, see the function documentation or the source code in the R/ directory.