Publishing Datasets
After completing the curation pipeline, the publish commands help you build metadata, upload datasets to Latch Data, and notify paper authors.
Prerequisites
Before publishing, ensure you have:
- Completed the curation pipeline through harmonize-metadata
- Configuration files in ~/.latch/latch-curate/:
  - metadata_schema.yaml - Metadata harmonization schema
  - cell_typing_schema.yaml - Cell typing vocabulary
- Latch credentials in ~/.latch/:
  - token - Your Latch SDK token
  - workspace - Workspace ID (JSON or plaintext)
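The prerequisites above can be checked before publishing. The following is a minimal sketch, not part of latch-curate: the helper name missing_prerequisites is hypothetical, and the only facts it relies on are the file locations listed above.

```python
from pathlib import Path

# File locations taken from the prerequisites list, relative to the home directory.
REQUIRED_FILES = [
    ".latch/latch-curate/metadata_schema.yaml",
    ".latch/latch-curate/cell_typing_schema.yaml",
    ".latch/token",
    ".latch/workspace",
]


def missing_prerequisites(home: Path) -> list[str]:
    """Return the required files that do not exist under `home` (hypothetical helper)."""
    return [rel for rel in REQUIRED_FILES if not (home / rel).is_file()]


if __name__ == "__main__":
    missing = missing_prerequisites(Path.home())
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  ~/{rel}")
    else:
        print("All prerequisite files found.")
```

The check only confirms that the files exist. It does not validate their contents or confirm that the token is still valid.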
Setting Up Credentials
Setting Up Configuration Files
Publish Workflow
Step 1: Build
Generate metadata and validate the curated dataset. The build step:
- Extracts paper title and abstract via API
- Retrieves corresponding author contact information
- Validates harmonized metadata against your schema
- Validates cell typing against configured vocabulary
- Extracts ontology tags (disease, tissue, assay, cell types)
- Generates publish/build.yaml with all metadata
Required files from earlier pipeline steps:
- download/paper_text.txt - Paper text or abstract
- download/paper_url.txt - URL to the paper
- download/external_id.txt - GEO accession ID
- harmonize_metadata/harmonize_metadata.h5ad - Curated AnnData
Output files:
- publish/build.yaml - Build metadata file
- publish/publish.h5ad - Final curated object
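A quick pre-build check can show which earlier pipeline stage still needs to run. This sketch assumes that the download/ and harmonize_metadata/ files listed above are inputs the build step expects, and that they live under a single curation working directory. The function name missing_build_inputs is hypothetical and not part of latch-curate.

```python
from collections import defaultdict
from pathlib import Path

# Files produced by earlier pipeline stages, relative to the working directory.
BUILD_INPUTS = [
    "download/paper_text.txt",
    "download/paper_url.txt",
    "download/external_id.txt",
    "harmonize_metadata/harmonize_metadata.h5ad",
]


def missing_build_inputs(workdir: Path) -> dict[str, list[str]]:
    """Group missing input files by their top-level pipeline directory."""
    missing: dict[str, list[str]] = defaultdict(list)
    for rel in BUILD_INPUTS:
        if not (workdir / rel).is_file():
            stage_dir = rel.split("/", 1)[0]  # e.g. "download"
            missing[stage_dir].append(rel)
    return dict(missing)


if __name__ == "__main__":
    for stage_dir, files in missing_build_inputs(Path.cwd()).items():
        print(f"{stage_dir}/ is incomplete: {', '.join(files)}")
```

Grouping by directory makes it easier to see whether the download outputs or the harmonized AnnData object are absent.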
Step 2: Upload
Upload the dataset to Latch Data and register it in the data portal. The upload takes the following parameters:
- Destination path: Where to store the dataset in Latch Data (e.g., latch:///datasets/)
- Curator organization ID: Your organization’s ID in the system
- Dataset version: Version string (e.g., v1.0.0)
- Curator dataset ID: Unique identifier for this dataset (defaults to GEO ID)
The upload step:
- Uploads the publish/ directory to Latch Data
- Retrieves the ldata node ID for the uploaded files
- Registers the dataset with the data portal API
- Returns family ID and dataset ID on success
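The "defaults to GEO ID" rule for the curator dataset ID can be illustrated with a small sketch. It is not the latch-curate implementation: UploadParams and resolve_upload_params are hypothetical names, and reading the GEO accession from download/external_id.txt is an assumption based on the file list in Step 1.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class UploadParams:
    """Hypothetical container for the four upload parameters."""
    destination_path: str    # e.g. "latch:///datasets/"
    curator_org_id: str      # your organization's ID
    dataset_version: str     # e.g. "v1.0.0"
    curator_dataset_id: str  # falls back to the GEO ID


def resolve_upload_params(
    workdir: Path,
    destination_path: str,
    curator_org_id: str,
    dataset_version: str,
    curator_dataset_id: Optional[str] = None,
) -> UploadParams:
    """Fill in the curator dataset ID from the GEO accession when omitted."""
    if curator_dataset_id is None:
        # Assumption: the GEO accession is stored in download/external_id.txt.
        id_file = workdir / "download" / "external_id.txt"
        curator_dataset_id = id_file.read_text().strip()
    return UploadParams(
        destination_path, curator_org_id, dataset_version, curator_dataset_id
    )
```

Passing an explicit curator_dataset_id overrides the fallback, which matters when several datasets come from the same GEO series.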
Step 3: Email (Optional)
Send notification emails to paper authors about the curated dataset.
This step uses the email configuration at ~/.latch/latch-curate/email-info.json.
Troubleshooting
Missing configuration files
Place the required schema files in ~/.latch/latch-curate/. See Configuration Reference for schema formats.
Missing pipeline files
Cell typing validation failed
Check the cell type labels in your dataset against the vocabulary in ~/.latch/latch-curate/cell_typing_schema.yaml.
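To find which labels are causing the failure, compare the dataset's labels with the vocabulary. The sketch below assumes exact, case-sensitive string matching, which the actual validator may not use. The function name unknown_cell_types and the example labels are hypothetical. Loading the labels from the AnnData object and the vocabulary from the YAML file is left out because the schema format is not described here.

```python
from typing import Iterable


def unknown_cell_types(labels: Iterable[str], vocabulary: Iterable[str]) -> list[str]:
    """Return the distinct labels that are absent from the vocabulary, sorted."""
    allowed = set(vocabulary)
    return sorted({label for label in labels if label not in allowed})


if __name__ == "__main__":
    # Hypothetical example values, not taken from any real schema.
    vocabulary = ["B cell", "T cell"]
    labels = ["T cell", "B cell", "t-cell", "T cell"]
    for label in unknown_cell_types(labels, vocabulary):
        print(f"Not in vocabulary: {label}")
```

Each reported label needs to be either corrected in the dataset or added to the vocabulary, depending on which side is wrong.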
Token not found
Run latch login, or manually create the token file at ~/.latch/token.
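If you create the token file by hand, a short script can write it with restrictive permissions. This is a sketch under assumptions: it treats the token file as plain text containing only the token, and write_token_file is a hypothetical helper, not part of the Latch SDK.

```python
from pathlib import Path


def write_token_file(token: str, latch_dir: Path) -> Path:
    """Write `token` to <latch_dir>/token and restrict access to the owner."""
    latch_dir.mkdir(parents=True, exist_ok=True)
    token_path = latch_dir / "token"
    # Assumption: the file holds the bare token with no surrounding whitespace.
    token_path.write_text(token.strip())
    token_path.chmod(0o600)  # owner read/write only (limited effect on Windows)
    return token_path
```

For example, write_token_file(my_token, Path.home() / ".latch") writes to ~/.latch/token, where my_token is a placeholder for your Latch SDK token.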