sclkme.tl.sketch

sclkme.tl.sketch(adata, n_sketch, use_rep=None, n_pcs=None, random_state=0, method='gs', replace=False, inplace=False, sketch_kwds={}, key_added=None, copy=False)

Cell sketching [Hie et al., 2019] [Baskaran et al., 2022]

sketch summarizes the landscape of a large single-cell dataset by selecting a compact subset of cells. It supports three sketching methods: geometric sketching [Hie et al., 2019], kernel herding [Baskaran et al., 2022], and simple random sampling without replacement.
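Simple random sampling is the easiest of the three methods to illustrate. The NumPy sketch below is not the sclkme implementation; it only shows, under the assumption that a NumPy Generator seeded with random_state is used, how n_sketch distinct cells can be drawn and encoded as a boolean mask with one entry per cell, which is the format in which sketch records its selection. The helper name random_sketch_mask is hypothetical.

```python
import numpy as np


def random_sketch_mask(n_obs: int, n_sketch: int, random_state: int = 0) -> np.ndarray:
    """Boolean mask selecting n_sketch of n_obs cells uniformly at random.

    Hypothetical helper for illustration; not the sclkme implementation.
    """
    if not 0 < n_sketch <= n_obs:
        raise ValueError("n_sketch must satisfy 0 < n_sketch <= n_obs")
    rng = np.random.default_rng(random_state)
    # Draw distinct row indices (sampling without replacement) ...
    idx = rng.choice(n_obs, size=n_sketch, replace=False)
    # ... and mark them in a boolean vector, one entry per cell.
    mask = np.zeros(n_obs, dtype=bool)
    mask[idx] = True
    return mask


mask = random_sketch_mask(n_obs=1000, n_sketch=128)
print(mask.sum())  # 128
```

Because every cell is equally likely to be drawn, rare populations tend to be under-represented in the sketch, which is the shortcoming the other two methods address.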

Note

More information about the geometric sketching method, and a place to report bugs in it, can be found here.
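Geometric sketching [Hie et al., 2019] samples evenly across the space the cells occupy rather than in proportion to cell density. The following is a deliberately simplified, grid-based sketch of that idea, not the geosketch or sclkme code: it covers the data with equal-size hypercubes, binary-searches the box side so that at least n_sketch boxes are occupied, samples boxes uniformly, and keeps one random cell per chosen box. The function name grid_sketch_mask and the n_iter parameter are hypothetical.

```python
import numpy as np


def grid_sketch_mask(X, n_sketch: int, random_state: int = 0, n_iter: int = 50) -> np.ndarray:
    """Simplified grid-based sketch in the spirit of geometric sketching.

    Hypothetical illustration; not the geosketch or sclkme implementation.
    """
    X = np.asarray(X, dtype=float)
    n_obs = X.shape[0]
    if not 0 < n_sketch <= n_obs:
        raise ValueError("n_sketch must satisfy 0 < n_sketch <= n_obs")
    rng = np.random.default_rng(random_state)
    X = X - X.min(axis=0)          # shift so every coordinate is >= 0
    span = float(X.max()) or 1.0   # largest extent over all dimensions

    def box_labels(side: float) -> np.ndarray:
        # Integer box coordinates -> one label per distinct occupied box.
        coords = np.floor(X / side).astype(np.int64)
        _, labels = np.unique(coords, axis=0, return_inverse=True)
        return labels.ravel()

    # Binary search for the largest box side with >= n_sketch occupied boxes.
    best, lo, hi = None, 0.0, 2.0 * span
    for _ in range(n_iter):
        side = (lo + hi) / 2
        labels = box_labels(side)
        if labels.max() + 1 >= n_sketch:
            best, lo = labels, side   # enough boxes: try larger boxes
        else:
            hi = side                 # too few boxes: shrink them
    if best is None:
        raise ValueError("too few distinct cells for the requested n_sketch")

    # Sample boxes uniformly (not cells), so dense regions are not favored.
    boxes = rng.choice(best.max() + 1, size=n_sketch, replace=False)
    mask = np.zeros(n_obs, dtype=bool)
    for b in boxes:
        mask[rng.choice(np.flatnonzero(best == b))] = True
    return mask


rng = np.random.default_rng(0)
# 950 cells in a dense population plus 50 cells in a sparse, distant one.
X = np.vstack([rng.normal(0.0, 0.1, (950, 2)), rng.normal(5.0, 1.0, (50, 2))])
mask = grid_sketch_mask(X, n_sketch=100)
print(mask.sum())        # 100
print(mask[950:].sum())  # number of sketched cells from the sparse population
```

Since boxes rather than cells are sampled uniformly, the sparse population receives a much larger share of the sketch than its 5% share of the cells would give it under random sampling.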

Parameters
  • adata (AnnData) – Annotated data matrix.

  • n_sketch (int) – Number of cells to subsample.

  • use_rep (Optional[str]) – Use the indicated representation. ‘X’ or any key for .obsm is valid. If None, the representation is chosen automatically: For .n_vars < 50, .X is used, otherwise ‘X_pca’ is used. If ‘X_pca’ is not present, it’s computed with default parameters.

  • n_pcs (Optional[int]) – Number of PCs to use. If n_pcs == 0 and use_rep is None, .X is used.

  • random_state (Union[None, int, RandomState]) – A numpy random seed.

  • method (Literal['gs', 'kernel_herding', 'random']) – Sketching method: ‘gs’ for geometric sketching [Hie et al., 2019], ‘kernel_herding’ for kernel herding [Baskaran et al., 2022], or ‘random’ for simple random sampling.

  • replace (bool) – Whether to sample with replacement (default: False).

  • inplace (bool) – If True, subsample the cells of adata in place.

  • sketch_kwds (Dict[str, Any]) – Additional method-specific parameters used by the sketching methods, e.g. the scaling parameter gamma and the dimensionality D of the random Fourier frequency features in the kernel herding method.

  • key_added (Optional[str]) – If not specified, the boolean mask of subsampled cells is stored in .obs[‘sketch’] and the sketching parameters are stored in .uns[‘sketch’]. If specified, they are stored in .obs[key_added + ‘_sketch’] and .uns[key_added + ‘_sketch’], respectively.

  • copy (bool) – Return a copy instead of writing to adata.
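The gamma and D entries mentioned for sketch_kwds refer to a random Fourier feature approximation of a kernel used by kernel herding. The sketch below assumes an RBF kernel k(x, y) = exp(-gamma * ||x - y||^2); how sclkme actually scales gamma, and how it implements herding, are not confirmed here. It maps cells to D random Fourier features, computes their mean embedding, and greedily picks cells whose features best track that mean, using the standard herding update. The names rff_features and kernel_herding_mask are hypothetical.

```python
import numpy as np


def rff_features(X, gamma: float = 1.0, D: int = 256, random_state: int = 0) -> np.ndarray:
    """Random Fourier features z(x) with z(x) @ z(y) ~= exp(-gamma * ||x - y||**2)."""
    rng = np.random.default_rng(random_state)
    # The spectral density of this RBF kernel is N(0, 2 * gamma * I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)


def kernel_herding_mask(X, n_sketch: int, gamma: float = 1.0, D: int = 256,
                        random_state: int = 0) -> np.ndarray:
    """Greedy kernel herding on random Fourier features (illustration only)."""
    X = np.asarray(X, dtype=float)
    n_obs = X.shape[0]
    if not 0 < n_sketch <= n_obs:
        raise ValueError("n_sketch must satisfy 0 < n_sketch <= n_obs")
    Z = rff_features(X, gamma, D, random_state)
    mu = Z.mean(axis=0)          # approximate kernel mean embedding of all cells
    w = mu.copy()                # herding weight vector, initialized at mu
    selected = np.zeros(n_obs, dtype=bool)
    for _ in range(n_sketch):
        scores = Z @ w
        scores[selected] = -np.inf   # never pick the same cell twice
        i = int(np.argmax(scores))
        selected[i] = True
        w += mu - Z[i]               # standard herding update
    return selected


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
mask = kernel_herding_mask(X, n_sketch=50, gamma=0.1, D=512)
print(mask.sum())  # 50
```

Larger D gives a more accurate kernel approximation at a higher cost, while gamma sets the length scale over which cells are considered similar.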

Return type

Optional[AnnData]

Returns

  • Depending on copy, updates adata or returns an updated copy of it.

  • See the key_added parameter description for where the boolean mask of subsampled cells and the sketching parameters are stored.
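The examples on this page read adata.obs['gs_sketch'] after calling sketch with key_added="gs", which suggests that key_added and "sketch" are joined with an underscore. The snippet below encodes that inferred naming rule in a hypothetical helper, sketch_key, and uses a small pandas DataFrame as a stand-in for adata.obs to show how the stored boolean mask selects the sketched cells.

```python
from typing import Optional

import numpy as np
import pandas as pd


def sketch_key(key_added: Optional[str] = None) -> str:
    """Key for .obs and .uns, inferred from the examples (hypothetical helper)."""
    return "sketch" if key_added is None else f"{key_added}_sketch"


# Stand-in for adata.obs: one row per cell, plus a boolean sketch column.
obs = pd.DataFrame(index=[f"cell_{i}" for i in range(6)])
obs[sketch_key("gs")] = np.array([True, False, True, False, False, True])

# Boolean indexing keeps only the sketched cells, like adata[adata.obs["gs_sketch"]].
print(obs[obs["gs_sketch"]].index.tolist())  # ['cell_0', 'cell_2', 'cell_5']
```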

Example

>>> import scanpy as sc
>>> import sclkme

Load an annotated dataset:

>>> adata = sc.datasets.pbmc3k_processed()

Sketch 128 cells using geometric sketching:

>>> sclkme.tl.sketch(adata, n_sketch=128, use_rep="X_pca", method="gs", key_added="gs")

Sketch 128 cells using kernel herding:

>>> sclkme.tl.sketch(adata, n_sketch=128, use_rep="X_pca", method="kernel_herding", key_added="kh")

Visualize the sketched dataset:

>>> adata_sketch_gs = adata[adata.obs['gs_sketch']]
>>> sc.pl.umap(adata_sketch_gs, color="louvain", size=100)
>>> adata_sketch_kh = adata[adata.obs['kh_sketch']]
>>> sc.pl.umap(adata_sketch_kh, color="louvain", size=100)