General Notes

The count matrix view shows general statistics about the loaded AnnData file. The numbers of cells and genes are computed from the unique indices of adata.obs (adata.obs_names) and adata.var (adata.var_names), respectively. The number of samples is computed from the samples column of adata.obs (adata.obs.samples), if the file contains one.

The AnnData file structure is broken down into several major sections (X, obs, var, obsm, varm, obsp, varp, uns, and layers), as explained in detail in the AnnData documentation.

At the moment, the visualizer is not fully up to spec with the AnnData file structure and supports direct interaction with only a limited number of these sections:

  • The Observations tab maps to adata.obs, and the annotations on the sidebar are loaded from adata.obs as well. Any annotations that are added or edited are written back to adata.obs. Cell IDs are loaded from the index of adata.obs.

  • The annotations on the sidebar can be used to color the UMAP, tSNE, or PCA embeddings.

  • The Variables tab maps to adata.var, whose index (adata.var_names) supplies the gene names used by Genes of Interest and Differential Expression.

  • The gene names under Genes of Interest are extracted from the gene_id column of the Variables tab.

  • The level of gene expression can be highlighted on the embedding by selecting one of the genes of interest.

  • Embeddings are stored in adata.obsm (scanpy writes them under keys such as X_pca, X_tsne, and X_umap) and displayed in the visualizer.

  • All mutations are performed directly on the default .X matrix of the AnnData file, to stay in spec with Latch’s Single Cell Pipeline. Editing or replacing layers is not currently supported in the visualizer. Instead, each AnnData file is treated as immutable: any operation that mutates the underlying counts (the .X matrix) creates a new node (.h5ad file).

Mutations

The Single Cell Visualizer contains a series of mutations that can be run on each AnnData file. The frontend passes the selected parameters to a scanpy function on the backend, which subsequently runs the mutation.

For example, running PCA with the parameters selected in the visualizer is translated on the backend to:

scanpy.tl.pca(adata, n_comps=50, svd_solver="arpack")

Here, the two exposed parameters are Number of PCs to compute and SVD solver to use, which map to the n_comps and svd_solver parameters of scanpy.tl.pca. If a mutation on Pollock exposes no parameters, scanpy's default parameters are used. For an exhaustive list of default values for scanpy functions, see the Scanpy API reference.

A list of mutation names on Pollock and underlying Scanpy functions is provided below.

| Mutation Type on Pollock | Mutation Name on Pollock | Underlying Scanpy Function |
| --- | --- | --- |
| Cell QC/Filtering | Counts | scanpy.pp.filter_cells |
| Cell QC/Filtering | Detected Genes | scanpy.pp.filter_genes |
| Cell QC/Filtering | Mitochondrial Counts | scanpy.pp.calculate_qc_metrics (for genes with the prefix MT-) |
| Cell QC/Filtering | % Ribosomal Counts | scanpy.pp.calculate_qc_metrics (for genes with the prefix RPS or RPL) |
| Normalization | CPM Normalization | scanpy.pp.normalize_total |
| Log Transform | Log Transform | scanpy.pp.log1p |
| Batch Correction | Scanpy | scanpy.pp.combat |
| Batch Correction | Harmony | scanpy.external.pp.harmony_integrate |
| PCA (Inplace) | PCA | scanpy.tl.pca |
| TSNE (Inplace) | TSNE | scanpy.tl.tsne |
| UMAP (Inplace) | UMAP | scanpy.tl.umap |
| Neighbors (Inplace) | Neighbors | scanpy.pp.neighbors |
| Differential Expression (Inplace) | Differential Expression Report | scanpy.tl.rank_genes_groups |
| Subclustering | Subclustering | AnnData subsetting, e.g. adata = adata[adata.obs["cell_type"] == "t-cell"] |
| Clustering | Leiden | scanpy.tl.leiden |
| Clustering | Louvain | scanpy.tl.louvain |

There are a few exceptions to the format above. Notably, scanpy.pp.filter_cells and scanpy.pp.filter_genes accept only one threshold per call, so a lower and an upper bound can't be applied in a single call. In these cases, each function is run first with its minimum threshold (for example, min_genes for filter_cells or min_cells for filter_genes) and then again with the corresponding maximum threshold (max_genes or max_cells), based on the range selected on the plot.
