Configuration Reference
Latch Curate uses two YAML configuration files to customize cell typing and metadata harmonization workflows. These files allow you to define custom vocabularies, ontologies, and validation rules for your datasets.
cell_typing_schema.yaml
The cell typing configuration defines the vocabulary and marker genes used for automated cell type annotation.
Location: ~/.latch/latch-curate/cell_typing_schema.yaml
Configuration Fields
cell_type_column
- Type: string
- Required: Yes
- Description: Column name where cell type annotations will be stored in AnnData.obs
- Default: "latch_cell_type_lvl_1"
cluster_column
- Type: string
- Required: Yes
- Description: Name of the clustering column in AnnData.obs to use for cell typing
- Default: "leiden_res_0.50"
- Validation: Must exist in AnnData.obs
vocabulary
- Type: list[object]
- Required: Yes
- Description: List of allowed cell types with Cell Ontology (CL) identifiers
  - name (string): Human-readable cell type name
  - ontology_id (string): Cell Ontology ID in format "CL:XXXXXXX"
marker_genes
- Type: dict[string, list[string]]
- Required: Yes
- Description: Mapping of cell type groups to lists of marker gene symbols
- Keys: Cell type group names (can differ from vocabulary names)
- Values: Lists of gene symbols
- Validation: Warnings if genes are not found in AnnData.var['gene_symbols']
Example Configuration
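The fields described above could be combined as in the following sketch. It is an illustration only, not a shipped default: the choice of cell types and marker genes is hypothetical, and the top-level layout is assumed from the field names in this reference. The two column names use the documented defaults.

```yaml
# Illustrative cell_typing_schema.yaml (assumed layout; adapt to your dataset).

# Column in AnnData.obs that will receive the annotations (documented default).
cell_type_column: "latch_cell_type_lvl_1"

# Clustering column in AnnData.obs to annotate; must already exist.
cluster_column: "leiden_res_0.50"

# Allowed cell types, each paired with a Cell Ontology ID (CL:XXXXXXX).
vocabulary:
  - name: "T cell"
    ontology_id: "CL:0000084"
  - name: "B cell"
    ontology_id: "CL:0000236"
  - name: "monocyte"
    ontology_id: "CL:0000576"

# Marker gene symbols per cell type group. Group names may differ from
# the vocabulary names; the genes listed here are example choices.
marker_genes:
  T cell:
    - CD3D
    - CD3E
  B cell:
    - MS4A1
    - CD79A
  monocyte:
    - CD14
    - LYZ
```

Gene symbols that are absent from AnnData.var['gene_symbols'] would be expected to produce warnings rather than errors, per the marker_genes validation note.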
Validation Rules
- The cluster_column must exist in the AnnData object
- Cell types in the data should match vocabulary names or use format "name/ontology_id"
- Ontology IDs must follow Cell Ontology format (CL:XXXXXXX)
- Missing marker genes generate warnings but don’t fail validation
Usage in Pipeline
The cell typing schema is used by:
- latch-curate type-cells: Main cell typing workflow
- latch-curate publish build: Validation during publication
metadata_schema.yaml
The metadata schema defines harmonized metadata variables that should be extracted and validated against controlled vocabularies or ontologies.
Location: ~/.latch/latch-curate/metadata_schema.yaml
Configuration Fields
variables
- Type: list[object]
- Required: Yes
- Description: List of metadata variable definitions
name
- Type: string
- Required: Yes
- Description: Column name to create in AnnData.obs
- Convention: Prefix with latch_ (e.g., "latch_disease", "latch_tissue")
description
- Type: string
- Required: Yes
- Description: Natural language description of what the variable represents
- Usage: Read by the LLM to decide what metadata to extract for this variable
vocab
- Type: object
- Required: Yes
- Description: Vocabulary specification defining allowed values
The vocab object contains the following fields:
vocab.type
- Type: string
- Required: Yes
- Allowed Values:
  - "uncontrolled": Free text, no validation
  - "ontology": Must match terms from a specific ontology
  - "custom": Must match predefined list of values
vocab.name
- Type: string
- Required: Required when type: "ontology"
- Allowed Values:
  - "mondo": Disease ontology
  - "uberon": Tissue/anatomy ontology
  - "cl": Cell type ontology
  - "efo": Experimental Factor Ontology (sequencing platforms)
vocab.values
- Type: list[string]
- Required: Required when type: "custom"
- Description: List of allowed values for custom vocabularies
Example Configuration
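As a sketch of how the fields above fit together, the following schema declares one variable for each vocab.type. It is an illustration under stated assumptions rather than a shipped default: latch_disease and latch_tissue come from the naming convention in this reference, while latch_sample_site, latch_subject_id, all description strings, and the custom values are hypothetical.

```yaml
# Illustrative metadata_schema.yaml (assumed layout; adapt to your study).
variables:
  # Ontology-backed variable: values take the form "name/ONTOLOGY_ID",
  # e.g. "systemic sclerosis/MONDO:0005100".
  - name: "latch_disease"
    description: "Disease or condition of the donor the sample was taken from"
    vocab:
      type: "ontology"
      name: "mondo"

  - name: "latch_tissue"
    description: "Tissue or anatomical site the sample was collected from"
    vocab:
      type: "ontology"
      name: "uberon"

  # Custom vocabulary (hypothetical variable): values must match one of
  # the listed strings exactly, including case.
  - name: "latch_sample_site"
    description: "Whether the sample was taken from lesional or non-lesional tissue"
    vocab:
      type: "custom"
      values:
        - "lesional"
        - "non-lesional"

  # Uncontrolled variable (hypothetical): free text, but must not be empty.
  - name: "latch_subject_id"
    description: "Donor identifier as reported by the study"
    vocab:
      type: "uncontrolled"
```

Note that vocab.name appears only on the ontology variables and vocab.values only on the custom one, matching the conditional requirements listed above.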
Validation Rules
- Ontology terms must be in format: "name/ONTOLOGY_ID" (e.g., "systemic sclerosis/MONDO:0005100")
- Custom vocabulary values must exactly match one of the allowed values (case-sensitive)
- Uncontrolled fields cannot be empty
- All variables defined in the schema will be created as columns in the AnnData object
Output Format
The harmonization process creates:
File: harmonize_metadata/harmonize_metadata_metadata.yaml
latch-curate harmonize-metadata run --use-metadata.
Usage in Pipeline
The metadata schema is used by:
- latch-curate harmonize-metadata run: LLM-based metadata extraction
- latch-curate publish build: Tag extraction and validation
- latch-curate lint: Metadata validation
Using with External Data
The harmonize-metadata command can work with any AnnData file using the --adata-path flag, subject to the following requirements:
- AnnData object must have obs['latch_sample_id'] column with sample identifiers
- The download/ folder must exist with study_metadata.txt and paper_text.txt files
- Metadata schema must be configured at ~/.latch/latch-curate/metadata_schema.yaml