Build a SummarizedExperiment using an h5ad file for the counts
Usage
dataset_h5ad(
h5ad_file_counts,
h5ad_file_tpm = NULL,
cell_id_col = "ID",
cell_type_col = "cell_type",
cells_in_obs = TRUE,
name = "SimBu_dataset",
spike_in_col = NULL,
additional_cols = NULL,
filter_genes = TRUE,
variance_cutoff = 0,
type_abundance_cutoff = 0,
scale_tpm = TRUE
)
Arguments
- h5ad_file_counts
(mandatory) h5ad file with raw count data
- h5ad_file_tpm
h5ad file with TPM count data
- cell_id_col
(mandatory) name of the column in the cell annotation of the AnnData object (obs or var, see cells_in_obs) with unique cell IDs; 0 for rownames
- cell_type_col
(mandatory) name of the column in the cell annotation of the AnnData object (obs or var, see cells_in_obs) with the cell type name
- cells_in_obs
boolean; if TRUE, cell identifiers are taken from the obs annotation of the AnnData object; if FALSE, they are taken from var
- name
name of the dataset; will be used for new unique IDs of cells
- spike_in_col
name of the column in the annotation that contains spike-in counts, which can be used to re-scale counts; mandatory for the spike_in scaling factor in the simulation
- additional_cols
list of column names in the annotation that should also be stored in the dataset object
- filter_genes
boolean; if TRUE, removes all genes with zero expression across all samples, as well as genes with variance below variance_cutoff
- variance_cutoff
numeric; only applied if filter_genes is TRUE: removes all genes with variance below the chosen cutoff
- type_abundance_cutoff
numeric; removes all cells whose cell type appears fewer times than the given value. This removes low-abundance cell types
- scale_tpm
boolean; if TRUE (default), each cell in tpm_matrix is scaled so that its values sum to 1e6
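The filtering and scaling rules described for filter_genes, variance_cutoff, type_abundance_cutoff and scale_tpm can be sketched in base R on a plain matrix. This is a minimal illustration under stated assumptions, not SimBu's internal code: the helpers filter_sketch() and scale_tpm_sketch() are hypothetical, genes are assumed to be rows and cells columns, variance is taken as the sample variance of each gene across cells, and cells are assumed to be filtered by type abundance before genes are filtered.

```r
# Hypothetical helpers: a sketch of the documented filtering/scaling rules,
# NOT SimBu's internal implementation.
# Assumes a numeric matrix with genes in rows and cells in columns.

filter_sketch <- function(counts, cell_types,
                          filter_genes = TRUE,
                          variance_cutoff = 0,
                          type_abundance_cutoff = 0) {
  # Assumed order: drop cells of rare cell types first.
  # A cell is kept if its cell type occurs at least type_abundance_cutoff times.
  type_counts <- table(cell_types)
  frequent_types <- names(type_counts)[type_counts >= type_abundance_cutoff]
  keep_cells <- cell_types %in% frequent_types
  counts <- counts[, keep_cells, drop = FALSE]
  cell_types <- cell_types[keep_cells]

  if (filter_genes) {
    # Remove genes with zero expression across all remaining cells
    counts <- counts[rowSums(counts) > 0, , drop = FALSE]
    # Sample variance of each gene across cells (same as var() per row)
    gene_var <- rowSums((counts - rowMeans(counts))^2) / (ncol(counts) - 1)
    # Remove genes whose variance is below the cutoff
    counts <- counts[which(gene_var >= variance_cutoff), , drop = FALSE]
  }
  list(counts = counts, cell_types = cell_types)
}

scale_tpm_sketch <- function(tpm) {
  # Rescale every cell (column) so that its values sum to 1e6
  sweep(tpm, 2, colSums(tpm), "/") * 1e6
}

# Usage on a toy matrix: 4 genes x 3 cells
counts <- matrix(c(0, 0, 0,
                   1, 5, 9,
                   2, 2, 2,
                   3, 0, 4),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("g", 1:4), paste0("c", 1:3)))
cell_types <- c("T", "T", "B")

res <- filter_sketch(counts, cell_types,
                     variance_cutoff = 1, type_abundance_cutoff = 2)
res$counts  # cell c3 (the only "B" cell) is dropped; genes g2 and g4 remain
colSums(scale_tpm_sketch(matrix(c(1, 3, 2, 2), nrow = 2)))  # both columns sum to 1e6
```

With the default cutoffs of 0, this sketch removes only genes with zero total expression, since no variance can be negative and every cell type occurs at least zero times.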
Value
Returns a SummarizedExperiment object
Examples
# h5 <- system.file("extdata", "anndata.h5ad", package = "SimBu")
# ds_h5ad <- SimBu::dataset_h5ad(
#   h5ad_file_counts = h5,
#   name = "h5ad_dataset",
#   cell_id_col = "id",       # use the 'id' column of the metadata as cell identifiers
#   cell_type_col = "group",  # use the 'group' column of the metadata as cell type info
#   cells_in_obs = TRUE       # switch to FALSE if your cell information is stored in the var annotation
# )