sclkme.tl.sketch
- sclkme.tl.sketch(adata, n_sketch, use_rep=None, n_pcs=None, random_state=0, method='gs', replace=False, inplace=False, sketch_kwds={}, key_added=None, copy=False)
Cell sketching [Hie et al., 2019] [Baskaran et al., 2022]
sketch summarizes the landscape of a large single-cell dataset by selecting a compact subset of cells. It supports three sketching methods: geometric sketching [Hie et al., 2019], kernel herding [Baskaran et al., 2022], and simple random sampling without replacement.
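As a rough illustration of the simplest of the three methods, the following sketch draws a uniform random subset without replacement and encodes it as a boolean mask over cells. The helper name `random_sketch_mask` and its arguments are hypothetical and not part of sclkme; this is a minimal sketch of the idea, not the library's implementation.

```python
import numpy as np


def random_sketch_mask(n_obs, n_sketch, seed=0):
    """Return a boolean mask that selects n_sketch of n_obs cells uniformly at random.

    Hypothetical helper for illustration only (not an sclkme function).
    """
    if n_sketch > n_obs:
        raise ValueError("n_sketch cannot exceed n_obs when sampling without replacement")
    rng = np.random.default_rng(seed)
    # Draw distinct cell positions, then mark them in a boolean vector.
    chosen = rng.choice(n_obs, size=n_sketch, replace=False)
    mask = np.zeros(n_obs, dtype=bool)
    mask[chosen] = True
    return mask


mask = random_sketch_mask(n_obs=1000, n_sketch=128)
print(mask.sum())  # number of selected cells
```

A mask of this form can be used to index rows directly, which is how the example at the end of this page subsets adata.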
Note
More information and bug reports about the geometric sketching method here.
- Parameters
adata (AnnData) – Annotated data matrix.
n_sketch (int) – Number of cells to subsample.
use_rep (Optional[str]) – Representation to use. 'X' or any key of .obsm is valid. If None, the representation is chosen automatically: for .n_vars < 50, .X is used; otherwise 'X_pca' is used. If 'X_pca' is not present, it is computed with default parameters.
n_pcs (Optional[int]) – Number of PCs to use. If n_pcs == 0 and use_rep is None, .X is used.
random_state (Union[None, int, RandomState]) – A numpy random seed.
method (Literal['gs', 'kernel_herding', 'random']) – Sketching method used to summarize the dataset: 'gs' (geometric sketching [Hie et al., 2019]), 'kernel_herding' [Baskaran et al., 2022], or 'random'.
replace (bool) – Whether to sample with replacement (default: False).
inplace (bool) – If True, subsample the cells in adata itself.
sketch_kwds (Dict[str, Any]) – Additional method-specific parameters passed to the sketching method, for example the scaling parameter gamma and the dimensionality D of the random Fourier frequency features in the kernel herding method.
key_added (Optional[str]) – If not specified, the subsampled index is stored as a boolean column in .obs['sketch'], and the sketching parameters are stored in .uns['sketch']. If specified, the boolean index is stored in .obs[key_added + '_sketch'] and the parameters in .uns[key_added + '_sketch'] (for example, key_added='gs' gives .obs['gs_sketch']).
copy (bool) – Return a copy instead of writing to adata.
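The selection rules described for use_rep, n_pcs and key_added can be summarized in a few lines of plain Python. The function names `resolve_representation` and `sketch_key` are hypothetical and only restate the documented behavior. The `_sketch` suffix is inferred from the example on this page, where key_added="gs" is read back as 'gs_sketch'. The actual implementation may differ.

```python
from typing import Optional


def resolve_representation(
    n_vars: int,
    use_rep: Optional[str] = None,
    n_pcs: Optional[int] = None,
) -> str:
    """Name the representation that would be sketched (illustrative only)."""
    if use_rep is not None:
        # An explicit choice ('X' or a key of .obsm) always wins.
        return use_rep
    if n_pcs == 0 or n_vars < 50:
        # Few variables, or PCs explicitly disabled: use the data matrix itself.
        return "X"
    # Otherwise fall back to PCA (computed with defaults if missing).
    return "X_pca"


def sketch_key(key_added: Optional[str] = None) -> str:
    """Name of the .obs / .uns entry holding the sketch (suffix is an assumption)."""
    return "sketch" if key_added is None else f"{key_added}_sketch"


print(resolve_representation(n_vars=1838))           # X_pca
print(resolve_representation(n_vars=1838, n_pcs=0))  # X
print(sketch_key("gs"))                              # gs_sketch
```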
- Return type
Optional[AnnData]
- Returns
Depending on copy, updates adata or returns a copy with the sketch results added. See the key_added parameter description for the storage path of the subsampled indices.
Example
>>> import scanpy as sc
>>> import sclkme
Load annotated dataset:
>>> adata = sc.datasets.pbmc3k_processed()
Run the cell sketching using geometric sketching:
>>> sclkme.tl.sketch(adata, n_sketch=128, use_rep="X_pca", method="gs", key_added="gs")
Run the cell sketching using kernel herding:
>>> sclkme.tl.sketch(adata, n_sketch=128, use_rep="X_pca", method="kernel_herding", key_added="kh")
Visualize the sketched dataset:
>>> adata_sketch_gs = adata[adata.obs['gs_sketch']]
>>> sc.pl.umap(adata_sketch_gs, color="louvain", size=100)
>>> adata_sketch_kh = adata[adata.obs['kh_sketch']]
>>> sc.pl.umap(adata_sketch_kh, color="louvain", size=100)
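For intuition about what a kernel herding sketch selects, the following NumPy code implements a generic greedy kernel herding loop on random Fourier features for a Gaussian kernel. It is a minimal sketch under stated assumptions, not the sclkme implementation: the function name `kernel_herding_rff`, the default values, and the without-replacement masking are all hypothetical. `gamma` and `D` mirror the parameter names mentioned for sketch_kwds, but how sclkme uses them internally is not documented here.

```python
import numpy as np


def kernel_herding_rff(X, n_sketch, gamma=1.0, D=500, seed=0):
    """Select n_sketch rows of X by kernel herding (illustrative, not sclkme's code).

    Approximates k(x, y) = exp(-gamma * ||x - y||^2) with D random Fourier features.
    """
    n, d = X.shape
    if n_sketch > n:
        raise ValueError("n_sketch cannot exceed the number of rows")
    rng = np.random.default_rng(seed)
    # Frequencies for the Gaussian kernel are drawn from N(0, 2 * gamma * I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    phi = np.sqrt(2.0 / D) * np.cos(X @ W + b)  # feature map, shape (n, D)
    mu = phi.mean(axis=0)                       # empirical kernel mean embedding
    w = mu.copy()
    available = np.ones(n, dtype=bool)
    selected = []
    for _ in range(n_sketch):
        scores = phi @ w
        scores[~available] = -np.inf            # never pick the same cell twice
        i = int(np.argmax(scores))
        selected.append(i)
        available[i] = False
        w += mu - phi[i]                        # standard herding weight update
    return np.array(selected)


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))                   # stand-in for a PCA representation
idx = kernel_herding_rff(X, n_sketch=20)
print(len(idx), len(set(idx.tolist())))
```

Each step picks the point whose features best align with the running residual between the dataset's mean embedding and the points chosen so far, so the selected subset tends to track the overall density of the data.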