Engineering Principles
Latch Curate is built on carefully designed engineering principles that enable effective collaboration between language models and human curators. These principles were developed through manually curating ten million cells spanning roughly 200 datasets and covering more than 80 autoimmune indications.
LLM Engineering Principles
1. End-to-End Reasoning
As the performance of frontier models continues to improve, we hypothesize that curation systems built around end-to-end reasoning will scale more effectively than architectures that rigidly partition function and order among multiple sub-agents. Whenever possible, latch-curate embeds task context, control-flow decisions, and tool selection within a single model call rather than orchestrating an array of specialised models with fixed interaction patterns.
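To make this concrete, here is a minimal, hypothetical sketch of a single model call that carries the task context, the available tools, and the control-flow decision together; call_model, the tool names, and the accession are placeholders rather than the actual latch-curate interface.

```python
import json

# Hypothetical tool catalog exposed to the model within a single call
TOOLS = ["inspect_supplementary_files", "build_count_matrix", "harmonize_metadata"]

prompt = f"""
You are curating a single-cell dataset from GEO accession GSEXXXXXX.
Task context: the supplement contains per-sample MTX files and a metadata CSV.
Available tools: {json.dumps(TOOLS)}
Decide the next action yourself and reply as JSON:
  {{"next_tool": "<one of the tools>", "arguments": {{}}, "reasoning": "<why>"}}
"""

# response = call_model(prompt)   # hypothetical LLM client call
# action = json.loads(response)   # the model, not a fixed pipeline, picks the next step
```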
2. Precise Validation Criteria
We define precise validation criteria to capture edge cases, especially in agentic loops where test results provide the only feedback signal. Each criterion is split into two parts, illustrated in the sketch after this list:
- A natural-language description, which guides the agent
- A code assertion, which formally verifies the output and provides clear error logs
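A minimal sketch of such a pairing; the criterion, function name, and AnnData layout are illustrative rather than actual latch-curate validation code.

```python
import anndata as ad
import numpy as np
import scipy.sparse as sp

# Natural-language description (guides the agent):
#   "The count matrix must contain raw, non-negative integer counts;
#    normalized or log-transformed values are not acceptable."

def assert_raw_integer_counts(adata: ad.AnnData) -> None:
    """Code assertion: formally verifies the output and yields a clear error message."""
    X = adata.X.toarray() if sp.issparse(adata.X) else np.asarray(adata.X)
    assert np.all(X >= 0), "count matrix contains negative values"
    assert np.allclose(X, np.round(X)), (
        "count matrix contains non-integer values; "
        "it may have been normalized or log-transformed"
    )

# Toy check
adata = ad.AnnData(X=sp.csr_matrix(np.array([[0.0, 3.0], [5.0, 1.0]])))
assert_raw_integer_counts(adata)
```

When the assertion fails inside an agentic loop, its error message becomes the feedback signal the agent uses to correct course.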
3. Domain Knowledge as Prompts and Tools
To minimise novel reasoning per task, domain knowledge is pulled into prompts and reusable tool libraries. This focuses the model on genuine task variation, boosting accuracy while reducing runtime and cost. Tools are developed in two ways (one such utility is sketched after this list):
- Hand-coding utilities during manual cleaning
- Mining logs from earlier agentic runs to find recurring operations
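For instance, a recurring scRNA-seq cleanup such as stripping version suffixes from Ensembl gene IDs could live in such a library; the utility below is an illustrative sketch, not part of the actual library.

```python
import numpy as np
import anndata as ad

def strip_ensembl_versions(adata: ad.AnnData) -> ad.AnnData:
    """Drop version suffixes from Ensembl gene IDs
    (e.g. 'ENSG00000141510.17' -> 'ENSG00000141510') and de-duplicate var_names."""
    adata.var_names = [v.split(".")[0] if v.startswith("ENSG") else v
                       for v in adata.var_names]
    adata.var_names_make_unique()
    return adata

adata = ad.AnnData(X=np.zeros((2, 2)))
adata.var_names = ["ENSG00000141510.17", "ENSG00000141510.4"]
adata = strip_ensembl_versions(adata)
print(adata.var_names.tolist())  # ['ENSG00000141510', 'ENSG00000141510-1']
```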
4. Output Integration
To integrate model outputs with conventional software (see the sketch after this list), the model:
- Writes driver scripts to canonical paths
- Emits JSON that conforms to fixed schemas
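A minimal sketch of this pattern; the schema fields, values, and output path are illustrative rather than the fixed schemas and canonical paths latch-curate actually uses.

```python
import json
from pathlib import Path
from jsonschema import validate  # third-party JSON Schema validator

# Illustrative fixed schema for one task's JSON output
OUTPUT_SCHEMA = {
    "type": "object",
    "required": ["latch_disease", "latch_tissue"],
    "properties": {
        "latch_disease": {"type": "string"},
        "latch_tissue": {"type": "string"},
    },
}

payload = {
    "latch_disease": "systemic lupus erythematosus",
    "latch_tissue": "blood",
}

validate(instance=payload, schema=OUTPUT_SCHEMA)  # fails loudly if the output drifts from the schema

# Write to a canonical path so downstream software always knows where to look
out_path = Path("outputs/metadata/sample_metadata.json")
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_text(json.dumps(payload, indent=2))
```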
5. Chain-of-Thought Traces
Requesting explicit chain-of-thought traces consistently improves reasoning accuracy and provides curators with an introspectable record of the model’s logic. These traces are embedded in the output JSON and surfaced in validation reports.
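As a sketch of what this can look like, a task's output JSON might carry a reasoning field next to the decision it justifies, which the validation report then quotes; the field names and report format below are illustrative.

```python
import json
from pathlib import Path

Path("outputs").mkdir(exist_ok=True)

# Hypothetical task output: the trace sits alongside the decision it justifies
task_output = {
    "decision": {"latch_tissue": "blood"},
    "reasoning": (
        "The methods describe Ficoll separation of PBMCs, so the tissue of "
        "origin is peripheral blood rather than bone marrow."
    ),
}
Path("outputs/tissue_assignment.json").write_text(json.dumps(task_output, indent=2))

# Surface the trace in the human-readable validation report
record = json.loads(Path("outputs/tissue_assignment.json").read_text())
report = (
    "Tissue assignment\n"
    f"- Decision: {record['decision']['latch_tissue']}\n"
    f"- Model reasoning: {record['reasoning']}\n"
)
Path("outputs/validation_report.md").write_text(report)
```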
Curation Principles
1. Understanding the Assignment
Most of the engineering effort for this system went into deeply understanding the curation task and encoding that domain knowledge into prompts, tool libraries, and tests, rather than into traditional software development. We manually curated ten million cells spanning roughly 200 datasets and covering more than 80 autoimmune indications to learn which parts of the problem were conserved and which truly varied. For several months, we delivered data weekly to a biotech company developing autoimmune therapies, incorporating rapid feedback from domain experts to refine the process. As the curated volume grew, our prompts, tools, and tests became more robust with exposure to diverse:
- Sequencing technologies
- File formats
- Supplemental structures
- Study designs
- Downstream analytical needs
2. Ontology-Driven Variables
Where possible, we relied on well-maintained ontologies with strong scientific backing to populate key variables, as illustrated in the sketch after this list:
- MONDO for latch_disease
- CL for latch_cell_type_lvl_1
- UBERON for latch_tissue
- EFO for latch_sequencing_platform
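A sketch of how these variables might be populated on a curated object; the ontology term IDs below are examples drawn from the public ontologies and are illustrative, not a statement of latch-curate's controlled vocabulary.

```python
import numpy as np
import anndata as ad

adata = ad.AnnData(X=np.zeros((3, 2)))

# Ontology term IDs (human-readable labels in comments) for the key curated variables
adata.obs["latch_disease"] = "MONDO:0007915"            # systemic lupus erythematosus
adata.obs["latch_cell_type_lvl_1"] = "CL:0000084"       # T cell
adata.obs["latch_tissue"] = "UBERON:0000178"            # blood
adata.obs["latch_sequencing_platform"] = "EFO:0009922"  # 10x 3' v3
```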
3. Validation Artifacts
Creating concise validation artifacts—reports with before-and-after plots that give curators just enough information to make decisions—proved challenging. Running large, diverse datasets through the system and iterating with domain experts revealed which plots and metrics mattered most.
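As one example of such an artifact, the sketch below renders a before-and-after histogram of per-cell counts around a filtering step; the dataset, threshold, and file name are placeholders.

```python
import scanpy as sc
import matplotlib.pyplot as plt

adata = sc.datasets.pbmc3k()  # public dataset standing in for a curated object
sc.pp.calculate_qc_metrics(adata, percent_top=None, inplace=True)

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].hist(adata.obs["total_counts"], bins=50)
axes[0].set_title("Before filtering")

sc.pp.filter_cells(adata, min_counts=500)  # illustrative threshold
axes[1].hist(adata.obs["total_counts"], bins=50)
axes[1].set_title("After filtering")

fig.tight_layout()
fig.savefig("qc_before_after.png", dpi=150)
```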
4. Parallel Agentic Workflows
Human-in-the-loop efficiency scales when curators can juggle many agentic workflows simultaneously. A single task, such as count-matrix construction, may take 5–30 minutes before it needs human validation. Throughput peaks when enough concurrent runs keep the validation queue full. Ongoing work aims to streamline curator triage of agentic runs and to boost throughput by dispatching containerised tasks to workflow-orchestration software.
Technical Implementation
Storage Standard
We adopted the Scanpy ecosystem and AnnData objects as our storage standard. Their Python-native design and widespread community support let us reuse tool libraries across agentic tasks and kept model-generated code readable.
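In practice this keeps each task's I/O to a few readable lines of Scanpy; the paths and metadata value in this sketch are illustrative.

```python
from pathlib import Path
import scanpy as sc

# Read raw 10x output, attach curated metadata, and persist the object as .h5ad
adata = sc.read_10x_mtx("inputs/filtered_feature_bc_matrix/")
adata.obs["latch_disease"] = "systemic lupus erythematosus"

Path("outputs").mkdir(exist_ok=True)
adata.write_h5ad("outputs/curated.h5ad")
```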
Version Control
Each task outputs assets - driver scripts, JSON files, agent logs, and reports - into directories that can be uploaded to version-controlled blob stores. Because the agentic workflow runs inside a versioned container with input data mounted to a sandboxed file system at well-defined locations, rerunning these workflows with modified inputs or parameters is straightforward.
Reproducibility
Curated datasets are living assets, and new computational tools or updated scientific knowledge often require re-processing previously curated objects. The framework maintains complete reproducibility through the mechanisms below (a parameter-file sketch follows the list):
- Versioned containers
- Fixed input/output paths
- Comprehensive logging
- Parameter files for each processing step
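A sketch of the kind of per-step parameter file this implies; the step name, container tag, paths, and thresholds are illustrative.

```python
import json
from pathlib import Path

# Record the exact settings for one processing step so a rerun inside the same
# versioned container, reading the same fixed paths, reproduces the output.
params = {
    "step": "cell_filtering",
    "container": "latch-curate:1.4.2",   # illustrative image tag
    "input": "/inputs/raw.h5ad",         # fixed, sandboxed input path
    "output": "/outputs/filtered.h5ad",  # fixed output path
    "min_counts": 500,
    "min_genes": 200,
}

params_path = Path("outputs/params/cell_filtering.json")
params_path.parent.mkdir(parents=True, exist_ok=True)
params_path.write_text(json.dumps(params, indent=2))
```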