Build SummarizedExperiment using a h5ad file for the counts

Usage

dataset_h5ad(
  h5ad_file_counts,
  h5ad_file_tpm = NULL,
  cell_id_col = "ID",
  cell_type_col = "cell_type",
  cells_in_obs = TRUE,
  name = "SimBu_dataset",
  spike_in_col = NULL,
  additional_cols = NULL,
  filter_genes = TRUE,
  variance_cutoff = 0,
  type_abundance_cutoff = 0,
  scale_tpm = TRUE
)

Arguments

h5ad_file_counts: (mandatory) h5ad file with raw count data
h5ad_file_tpm: h5ad file with TPM count data
cell_id_col: (mandatory) name of column in Seurat meta.data with unique cell ids; 0 for rownames
cell_type_col: (mandatory) name of column in Seurat meta.data with cell type name
cells_in_obs: boolean, if TRUE, cell identifiers are taken from obs layer in anndata object; if FALSE, they are taken from var
name: name of the dataset; will be used for new unique IDs of cells#' @param spike_in_col which column in annotation contains information on spike_in counts, which can be used to re-scale counts; mandatory for spike_in scaling factor in simulation
spike_in_col: which column in annotation contains information on spike_in counts, which can be used to re-scale counts; mandatory for spike_in scaling factor in simulation
additional_cols: list of column names in annotation, that should be stored as well in dataset object
filter_genes: boolean, if TRUE, removes all genes with 0 expression over all samples & genes with variance below variance_cutoff
variance_cutoff: numeric, is only applied if filter_genes is TRUE: removes all genes with variance below the chosen cutoff
type_abundance_cutoff: numeric, remove all cells, whose cell-type appears less then the given value. This removes low abundant cell-types
scale_tpm: boolean, if TRUE (default) the cells in tpm_matrix will be scaled to sum up to 1e6

Value

Return a SummarizedExperiment object

Examples

# h5 <- system.file("extdata", "anndata.h5ad", package = "SimBu")
# ds_h5ad <- SimBu::dataset_h5ad(
#  h5ad_file_counts = h5,
#  name = "h5ad_dataset",
#  cell_id_col = "id", # this will use the 'id' column of the metadata as cell identifiers
#  cell_type_col = "group", # this will use the 'group' column of the metadata as cell type info
#  cells_in_obs = TRUE
# ) # in case your cell information is stored in the var layer, switch to FALSE