Publishing Datasets
After you complete the curation pipeline, use the publish commands to build metadata, upload the dataset to Latch Data, and notify the paper's authors.
Prerequisites
Before publishing, ensure you have:
- Completed the curation pipeline through harmonize-metadata
- Configuration files in ~/.latch/latch-curate/:
  - metadata_schema.yaml - Metadata harmonization schema
  - cell_typing_schema.yaml - Cell typing vocabulary
- Latch credentials in ~/.latch/:
  - token - Your Latch SDK token
  - workspace - Workspace ID (JSON or plaintext)
Setting Up Credentials
mkdir -p ~/.latch
# Token is automatically created when you run `latch login`
# Or manually create it:
echo "your-sdk-token" > ~/.latch/token
# Workspace ID (get from Latch Console settings)
echo "your-workspace-id" > ~/.latch/workspace
Setting Up Configuration Files
mkdir -p ~/.latch/latch-curate
# Copy the cell typing schema from the repo
cp cell_typing_schema.yaml ~/.latch/latch-curate/
# Create metadata schema (see Configuration Reference for format)
cat > ~/.latch/latch-curate/metadata_schema.yaml << 'EOF'
variables:
  - name: "disease"
    description: "disease or condition studied"
    vocab:
      type: "ontology"
      name: "mondo"
  - name: "tissue"
    description: "tissue or anatomical site"
    vocab:
      type: "ontology"
      name: "uberon"
  - name: "assay"
    description: "sequencing assay used"
    vocab:
      type: "ontology"
      name: "efo"
  - name: "sample_site"
    description: "sample collection site"
    vocab:
      type: "custom"
      values: ["tumor", "normal", "metastasis", "blood"]
EOF
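Before running any publish command, it can help to confirm that the files listed under Prerequisites are in place. The following is a minimal sketch: `check_publish_prereqs` is a hypothetical helper, not part of latch-curate, and it only checks that each file exists and is non-empty, not that its contents are valid.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of latch-curate): verify that the credential
# and configuration files listed under Prerequisites exist and are non-empty.
check_publish_prereqs() {
  # Optional first argument overrides the Latch directory (default: ~/.latch).
  local latch_dir="${1:-$HOME/.latch}"
  local missing=0
  local f
  for f in \
    "$latch_dir/token" \
    "$latch_dir/workspace" \
    "$latch_dir/latch-curate/metadata_schema.yaml" \
    "$latch_dir/latch-curate/cell_typing_schema.yaml"
  do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      missing=$((missing + 1))
    fi
  done
  # Succeed only when every file was found.
  [ "$missing" -eq 0 ]
}
```

For example, `check_publish_prereqs && latch-curate publish build` runs the build only when all four files are present.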
Publish Workflow
Step 1: Build
Generate metadata and validate the curated dataset.
latch-curate publish build
This command:
- Extracts paper title and abstract via API
- Retrieves corresponding author contact information
- Validates harmonized metadata against your schema
- Validates cell typing against configured vocabulary
- Extracts ontology tags (disease, tissue, assay, cell types)
- Generates publish/build.yaml with all metadata
Required files:
download/paper_text.txt - Paper text or abstract
download/paper_url.txt - URL to the paper
download/external_id.txt - GEO accession ID
harmonize_metadata/harmonize_metadata.h5ad - Curated AnnData
Outputs:
publish/build.yaml - Build metadata file
publish/publish.h5ad - Final curated object
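The required inputs above can be checked from the dataset's working directory before invoking the build. This is a small sketch that assumes the relative paths listed under Required files; `missing_build_inputs` is a hypothetical name, not a latch-curate command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each required build input that is absent from
# the given dataset directory (default: current directory).
missing_build_inputs() {
  local root="${1:-.}"
  local rel
  local rc=0
  for rel in \
    download/paper_text.txt \
    download/paper_url.txt \
    download/external_id.txt \
    harmonize_metadata/harmonize_metadata.h5ad
  do
    if [ ! -f "$root/$rel" ]; then
      echo "$rel"
      rc=1
    fi
  done
  # Non-zero exit status when at least one input is missing.
  return "$rc"
}
```

Running `missing_build_inputs` with no argument inspects the current directory and prints one missing path per line.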
Example output:
Build complete! Please verify the following information:
============================================================
Paper Title: Single-cell analysis of human tissues
Paper Abstract: We performed single-cell RNA sequencing...
Cell Count: 45,231
Authors: Smith J, Jones A
Email Contacts: smith@university.edu
Metadata Validation Status: passed
Metadata Tags Extracted: 4
Cell Typing Validation Status: passed
Cell Typing Tags Extracted: 8
All Tags:
- disease: Alzheimer's disease
- tissue: brain
- assay: 10x 3' v3
- cell_type: neuron
- cell_type: astrocyte
... and 6 more
============================================================
Step 2: Upload
Upload the dataset to Latch Data and register it in the data portal.
latch-curate publish upload
You will be prompted for:
- Destination path: Where to store the dataset in Latch Data (e.g., latch:///datasets/)
- Curator organization ID: Your organization’s ID in the system
- Dataset version: Version string (e.g., v1.0.0)
- Curator dataset ID: Unique identifier for this dataset (defaults to GEO ID)
Or provide options directly:
latch-curate publish upload \
  --latch-dest "latch:///curated-datasets/" \
  --curator-id 123 \
  --version "v1.0.0" \
  --curator-dataset-id "GSE252545"
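Since the curator dataset ID defaults to the GEO ID, a non-interactive wrapper can read it from download/external_id.txt and pass the remaining options through. This is a sketch only: `publish_upload` and the `DRY_RUN` variable are hypothetical, while the flags are the ones shown above.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around `latch-curate publish upload`.
# Usage: publish_upload <latch-dest> <curator-id> <version>
publish_upload() {
  local dest="$1" curator_id="$2" version="$3"
  local dataset_id
  # Strip the trailing newline and any stray whitespace from the accession.
  dataset_id=$(tr -d '[:space:]' < download/external_id.txt) || return 1
  [ -n "$dataset_id" ] || { echo "download/external_id.txt is empty" >&2; return 1; }
  # Reuse the positional parameters to hold the full command line.
  set -- latch-curate publish upload \
    --latch-dest "$dest" \
    --curator-id "$curator_id" \
    --version "$version" \
    --curator-dataset-id "$dataset_id"
  if [ -n "${DRY_RUN:-}" ]; then
    # Print the command instead of running it.
    echo "$*"
  else
    "$@"
  fi
}
```

Setting `DRY_RUN=1` before calling `publish_upload "latch:///curated-datasets/" 123 v1.0.0` prints the command so the values can be reviewed first.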
What happens:
- Uploads the publish/ directory to Latch Data
- Retrieves the ldata node ID for the uploaded files
- Registers the dataset with the data portal API
- Returns family ID and dataset ID on success
Example output:
Dataset Upload
Paper Title: Single-cell analysis of human tissues
Cell Count: 45,231
Validation Status: passed
Tags: 12 extracted
Uploading dataset...
Curator ID: 123
Version: v1.0.0
Dataset ID: GSE252545
Retrieved node ID 456789 for latch:///curated-datasets/GSE252545
Uploading dataset to active workspace 456789
Upload complete!
Family ID: 100
Dataset ID: 200
Step 3: Email (Optional)
Send notification emails to paper authors about the curated dataset.
latch-curate publish email
Prerequisites:
- Email configuration at ~/.latch/latch-curate/email-info.json:
{
  "smtp_host": "smtp.example.com",
  "smtp_port": 587,
  "smtp_user": "your-email@example.com",
  "smtp_password": "your-password",
  "sender_addr": "curation@latch.bio",
  "starttls": true,
  "timeout": 30
}
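An incomplete email-info.json is easy to catch before sending anything. The sketch below checks that each key from the example is present. `check_email_config` is a hypothetical helper, and treating all seven keys as required is an assumption, since this page does not say which are optional. It is a text-level check, not a JSON parser.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm the email config file exists and mentions
# every key shown in the example above.
check_email_config() {
  local cfg="${1:-$HOME/.latch/latch-curate/email-info.json}"
  local key
  [ -f "$cfg" ] || { echo "not found: $cfg" >&2; return 1; }
  for key in smtp_host smtp_port smtp_user smtp_password sender_addr starttls timeout; do
    # Look for the quoted key followed by a colon, e.g. "smtp_host":
    if ! grep -q "\"$key\"[[:space:]]*:" "$cfg"; then
      echo "missing key: $key" >&2
      return 1
    fi
  done
}
```

Because the file holds an SMTP password, restricting it to your user with `chmod 600` is a reasonable precaution.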
Troubleshooting
Missing configuration files
AssertionError (metadata_schema_path or cell_typing_config_path)
Ensure configuration files exist at ~/.latch/latch-curate/. See Configuration Reference for schema formats.
Missing pipeline files
AssertionError (paper_url_file, paper_text_file, etc.)
Run the full curation pipeline first, or create the required files manually:
mkdir -p download harmonize_metadata
echo "https://example.com/paper" > download/paper_url.txt
echo "Paper text here..." > download/paper_text.txt
echo "GSE12345" > download/external_id.txt
Cell typing validation failed
Cell typing validation failed: ["Cell type 'unknown' not in configured vocabulary"]
Add the missing cell type to ~/.latch/latch-curate/cell_typing_schema.yaml:
vocabulary:
  - name: "unknown"
    ontology_id: ""
  # ... other entries
Token not found
ValueError: SDK token does not exist
Run latch login or manually create the token file:
echo "your-token" > ~/.latch/token
Workspace file not found
AssertionError (workspace_data_path)
Create the workspace file:
echo "your-workspace-id" > ~/.latch/workspace
build.yaml Reference
The build file contains all metadata for the dataset:
info:
  description: "Paper abstract text..."
  paper_title: "Single-cell analysis..."
  cell_count: 45231
  paper_url: "https://doi.org/..."
  data_url: "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE252545"
  data_external_id: "GSE252545"
  corresponding_author_names:
    - "John Smith"
    - "Jane Doe"
  corresponding_author_emails:
    - "smith@university.edu"
    - "doe@institute.org"
validation:
  metadata_validation_status: "passed"
  metadata_schema_used: "/root/.latch/latch-curate/metadata_schema.yaml"
  metadata_tags_extracted: 4
  cell_typing_validation_status: "passed"
  cell_typing_config_used: "/root/.latch/latch-curate/cell_typing_schema.yaml"
  cell_typing_tags_extracted: 8
tags:
  - metadata_type: "disease"
    value: "Alzheimer's disease"
    ontology_id: "MONDO:0004975"
  - metadata_type: "tissue"
    value: "brain"
    ontology_id: "UBERON:0000955"
  - metadata_type: "cell_type"
    value: "neuron"
    ontology_id: "CL:0000540"
curator:
  curator_id: 123
  version: "v1.0.0"
  curator_dataset_id: "GSE252545"
  upload_timestamp: "2024-01-15T10:30:00"
  ldata_node_id: 456789
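The validation statuses recorded in build.yaml can be checked from a script before uploading. This is a rough sketch that uses sed rather than a YAML parser. `build_passed` is a hypothetical helper, and it assumes each status key appears once, on its own line, as in the example above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if both validation statuses in the
# build file (default: publish/build.yaml) are "passed".
build_passed() {
  local build_file="${1:-publish/build.yaml}"
  local key value
  for key in metadata_validation_status cell_typing_validation_status; do
    # Take the text after "key:", keep the first match, drop quotes and spaces.
    value=$(sed -n "s/^[[:space:]]*${key}:[[:space:]]*//p" "$build_file" 2>/dev/null \
      | head -n 1 | tr -d "\"' ")
    if [ "$value" != "passed" ]; then
      echo "$key is '${value:-missing}', expected 'passed'" >&2
      return 1
    fi
  done
}
```

A typical use would be `build_passed && latch-curate publish upload`, which skips the upload when either check did not pass or the file is absent.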