Skip to contents

Build SummarizedExperiment using a h5ad file for the counts

Usage

dataset_h5ad(
  h5ad_file_counts,
  h5ad_file_tpm = NULL,
  cell_id_col = "ID",
  cell_type_col = "cell_type",
  cells_in_obs = TRUE,
  name = "SimBu_dataset",
  spike_in_col = NULL,
  additional_cols = NULL,
  filter_genes = TRUE,
  variance_cutoff = 0,
  type_abundance_cutoff = 0,
  scale_tpm = TRUE
)

Arguments

h5ad_file_counts

(mandatory) h5ad file with raw count data

h5ad_file_tpm

h5ad file with TPM count data

cell_id_col

(mandatory) name of column in Seurat meta.data with unique cell ids; 0 for rownames

cell_type_col

(mandatory) name of column in Seurat meta.data with cell type name

cells_in_obs

boolean, if TRUE, cell identifiers are taken from obs layer in anndata object; if FALSE, they are taken from var

name

name of the dataset; will be used for new unique IDs of cells#' @param spike_in_col which column in annotation contains information on spike_in counts, which can be used to re-scale counts; mandatory for spike_in scaling factor in simulation

spike_in_col

which column in annotation contains information on spike_in counts, which can be used to re-scale counts; mandatory for spike_in scaling factor in simulation

additional_cols

list of column names in annotation, that should be stored as well in dataset object

filter_genes

boolean, if TRUE, removes all genes with 0 expression over all samples & genes with variance below variance_cutoff

variance_cutoff

numeric, is only applied if filter_genes is TRUE: removes all genes with variance below the chosen cutoff

type_abundance_cutoff

numeric, remove all cells, whose cell-type appears less then the given value. This removes low abundant cell-types

scale_tpm

boolean, if TRUE (default) the cells in tpm_matrix will be scaled to sum up to 1e6

Value

Return a SummarizedExperiment object

Examples

# h5 <- system.file("extdata", "anndata.h5ad", package = "SimBu")
# ds_h5ad <- SimBu::dataset_h5ad(
#  h5ad_file_counts = h5,
#  name = "h5ad_dataset",
#  cell_id_col = "id", # this will use the 'id' column of the metadata as cell identifiers
#  cell_type_col = "group", # this will use the 'group' column of the metadata as cell type info
#  cells_in_obs = TRUE
# ) # in case your cell information is stored in the var layer, switch to FALSE