Tutorial

This tutorial extends the Quickstart to help you understand the structure of a workflow and write your own. Prerequisite:

Complete the Quickstart guide.

What you will learn:

How to write a task and create a workflow

How to define compute and storage requirements

How to manage third-party dependencies

How to customize a user-friendly interface

How to test your workflow from the Latch Console

1. Initialize Workflow Directory

Bootstrap a new workflow directory by running latch init from the command line. In this tutorial, we create a workflow called covid-wf from the subprocess template.

$ latch init covid-wf --template subprocess
Created a latch workflow in `covid-wf`
Run
        $ latch register covid-wf
To register the workflow with console.latch.bio.

File Tree:

covid-wf
├── LICENSE
├── README.md
├── bowtie2
│   ├── bowtie2
│   └── ...
├── reference
│   ├── wuhan.1.bt2
│   └── ...
├── system-requirements.txt
├── version
└── wf
    ├── __init__.py
    ├── assemble.py
    └── sort.py

Once your boilerplate workflow has been created successfully, you should see a folder called covid-wf.

2. Build your Workflow

Define Individual Tasks

A task is a Python function that:

Takes typed inputs (e.g., LatchFile, LatchDir)
Runs code inside the workflow container
Returns outputs to the Latch platform or to another task

Example from covid-wf/wf/assemble.py: This task ingests two sequencing reads and outputs an assembled SAM file.

from pathlib import Path
import subprocess

from latch import small_task
from latch.types import LatchFile

@small_task
def assembly_task(
    read1: LatchFile,  # LatchFile refers to a remote file stored on Latch Data
    read2: LatchFile,
) -> LatchFile:

    # Build the bowtie2 command with local file paths
    bowtie2_cmd = [
        "bowtie2/bowtie2",
        "--local",
        "--very-sensitive-local",
        "-x", "wuhan",
        "-1", read1.local_path,  # .local_path downloads the file and returns its local path
        "-2", read2.local_path,
        "-S", "covid_assembly.sam",
    ]

    # Execute the bowtie2 command; check=True raises an error if bowtie2 fails
    subprocess.run(bowtie2_cmd, check=True)

    local_file = Path("covid_assembly.sam")

    # LatchFile(local_path, remote_path) uploads the local file to Latch Data
    return LatchFile(str(local_file), "latch:///covid_assembly.sam")
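Factoring the command construction out of the task into a plain function lets you check the arguments without running bowtie2 or downloading any files. The sketch below is an illustration, not part of the template: the helper name `build_bowtie2_cmd` is hypothetical, and it takes paths that are already local, so it runs with the standard library alone.

```python
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def build_bowtie2_cmd(
    read1: PathLike,
    read2: PathLike,
    index_prefix: str = "wuhan",
    output_sam: PathLike = "covid_assembly.sam",
    bowtie2_bin: str = "bowtie2/bowtie2",
) -> List[str]:
    """Return the bowtie2 argument list used by the assembly task (hypothetical helper)."""
    return [
        bowtie2_bin,
        "--local",
        "--very-sensitive-local",
        "-x", index_prefix,      # prefix of the .bt2 index files
        "-1", str(read1),        # first mate of each read pair
        "-2", str(read2),        # second mate of each read pair
        "-S", str(output_sam),   # SAM output path
    ]


if __name__ == "__main__":
    print(build_bowtie2_cmd("r1.fastq", "r2.fastq"))
```

Inside the task, you would pass `read1.local_path` and `read2.local_path` to such a helper and hand the result to `subprocess.run(..., check=True)`.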

Considerations when splitting up tasks

When building a workflow with multiple tasks, it can be difficult to decide when to split a larger task into smaller ones. The tradeoffs below can guide this decision.

Benefits of splitting a task into multiple smaller tasks:

It is easier to manage the dependencies and environments of tasks with less code
Tasks can be reused between different workflows.
Each task can be assigned different computing resources.
Task functions define clear boundaries between steps in a workflow, allowing for quicker isolation of problems, especially if the tasks are smaller.
Retries are cheaper with small tasks. A workflow can be retried from the last failed task, and with smaller tasks the last successful task is "further along" in the workflow.
Splitting up tasks creates new nodes in the graph representation of the workflow. If each node performs one function, the graph may be easier for biologists to interpret.

Downsides of splitting a task into multiple smaller tasks:

File I/O overhead - files passed between tasks are uploaded to S3 by the first task and then downloaded by the second task, and these transfers take time.
Scheduling overhead - each task in a workflow waits for an available machine with the appropriate resources to be ready before it begins executing. While this process usually takes under a minute, it can be a significant fraction of the total runtime for fast-running workflows.
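To weigh these overheads against the benefits, a back-of-envelope model can help. The sketch below is a hypothetical helper, not part of the Latch SDK; it assumes a linear chain of tasks, and the per-task scheduling and transfer times are placeholders you would replace with numbers observed in your own executions.

```python
from typing import Sequence


def estimated_runtime_min(
    compute_min: Sequence[float],
    scheduling_min: float,
    transfer_min: float,
) -> float:
    """Back-of-envelope wall-clock time for tasks that run one after another.

    compute_min:    minutes of real work done by each task
    scheduling_min: assumed wait for a machine, paid once per task
    transfer_min:   assumed upload + download time, paid once per hand-off
    """
    n_tasks = len(compute_min)
    handoffs = max(n_tasks - 1, 0)
    return sum(compute_min) + n_tasks * scheduling_min + handoffs * transfer_min


# Same 10 minutes of work, as one task or split into two (assumed overheads).
one_task = estimated_runtime_min([10], scheduling_min=1, transfer_min=2)     # 11
two_tasks = estimated_runtime_min([4, 6], scheduling_min=1, transfer_min=2)  # 14
```

The model deliberately ignores parallel branches, per-task resource sizing, and retries, which are where splitting tends to pay off; it only makes the fixed costs of each extra task visible.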

Chain Tasks Together into a Workflow

Once tasks are defined, chain them in a workflow function. You can do this by:

Calling each task in sequence
Passing task outputs as inputs to downstream tasks

Example: The workflow calls assembly_task first, then passes its output to sort_bam_task.

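from latch import workflow
from latch.types import LatchFile

from wf.assemble import assembly_task
from wf.sort import sort_bam_task

@workflow
def assemble_and_sort(read1: LatchFile, read2: LatchFile) -> LatchFile:
    # The output of assembly_task becomes the input of sort_bam_task
    sam = assembly_task(read1=read1, read2=read2)
    return sort_bam_task(sam=sam)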

3. Customize compute and storage requirements for each task

A task decorator can be used to specify compute and storage requirements.

from latch import large_gpu_task, small_task

@small_task  # 2 CPUs, 4 GB of memory, 0 GPUs
def my_task(
    ...
):
    ...

@large_gpu_task  # 31 CPUs, 120 GB of memory, 1 GPU
def inference(
    ...
):
    ...

See the task resources reference for an exhaustive list of larger CPU and GPU tasks.
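When choosing a decorator, it can help to write the resource request down explicitly. The sketch below is hypothetical: it encodes only the two tiers shown above (values copied from the code comments) and picks the smallest one that covers a request. It does not query Latch, and the real set of tiers and their exact values come from the reference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    decorator: str
    cpus: int
    memory_gb: int
    gpus: int


# Only the two tiers mentioned in this tutorial, ordered smallest first.
TIERS = [
    Tier("small_task", cpus=2, memory_gb=4, gpus=0),
    Tier("large_gpu_task", cpus=31, memory_gb=120, gpus=1),
]


def pick_tier(cpus: int, memory_gb: int, gpus: int = 0) -> Optional[str]:
    """Return the first (smallest) listed decorator covering the request, or None."""
    for tier in TIERS:
        if tier.cpus >= cpus and tier.memory_gb >= memory_gb and tier.gpus >= gpus:
            return tier.decorator
    return None
```

With only these two tiers, a CPU-only job that outgrows `small_task` falls through to the GPU tier, which is a signal to look up the larger CPU tiers in the reference rather than pay for an unused GPU.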

4. Define dependencies

Latch uses Dockerfiles for dependency management. You can automatically generate Dockerfiles from existing environment files or create them manually.

Automatic Generation

Use the latch dockerfile command to generate a Dockerfile from your existing environment files:

latch dockerfile [OPTIONS] OUTPUT_DIRECTORY

Python Dependencies

# From requirements.txt
latch dockerfile -p requirements.txt .

# From pyproject.toml
latch dockerfile -i pyproject.toml .

System Dependencies

# From apt-requirements.txt
latch dockerfile -a apt-requirements.txt .

R Dependencies

# From environment.R
latch dockerfile -r environment.R .

Conda Dependencies

# From environment.yml
latch dockerfile -c environment.yml .

Injecting Environment Variables

# From .env file
latch dockerfile -d .env .
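If a workflow directory holds several of these environment files, the matching flags can be collected automatically. The sketch below uses only the flags listed above; the helper name is hypothetical, and it assumes `latch dockerfile` accepts several of these flags in one call. If your SDK version does not, run one flag at a time.

```python
import tempfile
from pathlib import Path
from typing import List

# Environment files and the `latch dockerfile` flags listed in this tutorial.
ENV_FILE_FLAGS = [
    ("requirements.txt", "-p"),
    ("pyproject.toml", "-i"),
    ("apt-requirements.txt", "-a"),
    ("environment.R", "-r"),
    ("environment.yml", "-c"),
    (".env", "-d"),
]


def dockerfile_command(workflow_dir: Path) -> List[str]:
    """Build a `latch dockerfile` argument list for the env files in workflow_dir.

    File names are relative, so run the command with cwd=workflow_dir.
    """
    cmd = ["latch", "dockerfile"]
    for filename, flag in ENV_FILE_FLAGS:
        if (workflow_dir / filename).is_file():
            cmd += [flag, filename]
    cmd.append(".")  # output directory, as in the examples above
    return cmd


if __name__ == "__main__":
    # Demo on a throwaway directory that holds a requirements.txt only.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "requirements.txt").write_text("numpy\n")
        print(dockerfile_command(Path(tmp)))
```

To execute it, something like `subprocess.run(dockerfile_command(Path("covid-wf")), cwd="covid-wf", check=True)` would run the generated command from inside the workflow directory.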

Manual Creation

You can create Dockerfiles manually following standard Docker best practices. However, certain Latch-specific elements are required for your workflow to function properly. These elements include:

The Latch base image
Command to install the Latch SDK
Latch internal tagging system and expected root directory

Below is a template you can use to get started. Note that the commands that install your dependencies belong in the middle of the Dockerfile, between the Latch base image setup at the top and the Latch tagging block at the bottom.

# latch base image + dependencies for latch SDK --- removing these will break the workflow
from 812206152185.dkr.ecr.us-west-2.amazonaws.com/latch-base:ace9-main
run pip install latch==2.12.1 # or any other version of the Latch SDK
run mkdir /opt/latch

# install your requirements here

# copy all code from package (use .dockerignore to skip files)
copy . /root/

# set environment variables

# latch internal tagging system + expected root directory --- changing these lines will break the workflow
arg tag
env FLYTE_INTERNAL_IMAGE $tag
workdir /root
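The ordering rule can be made concrete by filling the template programmatically. The sketch below is a hypothetical helper, not a Latch tool: it copies the required header and footer from the template above and inserts `apt-get` and `pip` install lines only in the middle section. The example package names are placeholders, and it assumes an apt-based base image.

```python
from typing import Sequence

LATCH_HEADER = """\
# latch base image + dependencies for latch SDK --- removing these will break the workflow
from 812206152185.dkr.ecr.us-west-2.amazonaws.com/latch-base:ace9-main
run pip install latch==2.12.1
run mkdir /opt/latch
"""

LATCH_FOOTER = """\
# latch internal tagging system + expected root directory --- changing these lines will break the workflow
arg tag
env FLYTE_INTERNAL_IMAGE $tag
workdir /root
"""


def render_dockerfile(
    apt_packages: Sequence[str] = (), pip_packages: Sequence[str] = ()
) -> str:
    """Fill the template's middle section with install commands (hypothetical helper)."""
    lines = [LATCH_HEADER, "# install your requirements here"]
    if apt_packages:
        lines.append(
            "run apt-get update && apt-get install -y --no-install-recommends "
            + " ".join(apt_packages)
        )
    if pip_packages:
        lines.append("run pip install " + " ".join(pip_packages))
    lines += [
        "",
        "# copy all code from package (use .dockerignore to skip files)",
        "copy . /root/",
        "",
        LATCH_FOOTER,
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_dockerfile(["samtools"], ["pysam"]))
```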

5. Customize user interface

There are two pages that you can customize: the About page for your workflow and a Parameters page for workflow input parameters. To modify the About page, simply write your description in Markdown in the docstring of the workflow function.

Latch provides a suite of front-end components out of the box, configured with the Python objects LatchMetadata and LatchParameter:

from latch.types import LatchAuthor, LatchDir, LatchFile, LatchMetadata, LatchParameter

...
# The metadata included here will be injected into your interface.
metadata = LatchMetadata(
    display_name="Assemble and Sort FastQ Files",
    documentation="your-docs.dev",
    author=LatchAuthor(
        name="John von Neumann",
        email="your-email@example.com",
        github="github.com/fluid-dynamix",
    ),
    repository="https://github.com/your-repo",
    license="MIT",
    parameters={
        "read1": LatchParameter(
            display_name="Read 1",
            description="Paired-end read 1 file to be assembled.",
            batch_table_column=True,  # Show this parameter in batched mode.
        ),
        "read2": LatchParameter(
            display_name="Read 2",
            description="Paired-end read 2 file to be assembled.",
            batch_table_column=True,  # Show this parameter in batched mode.
        ),
    },
)


...

The metadata variable then needs to be passed into the @workflow decorator to apply the interface to the workflow.

@workflow(metadata)
def assemble_and_sort(read1: LatchFile, read2: LatchFile) -> LatchFile:
    ...

See the API documentation for all options to customize the workflow interface.
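The keys of the `parameters` dictionary are expected to match the workflow's argument names, so a typo silently leaves a parameter without its display name. The sketch below is a hypothetical, standard-library-only check; run it against an undecorated function with the same signature, since a decorated workflow object may not expose the original signature to `inspect`.

```python
import inspect
from typing import Callable, Iterable, Set, Tuple


def compare_parameter_keys(
    workflow_fn: Callable, documented: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (arguments missing from metadata, metadata keys matching no argument)."""
    arg_names = set(inspect.signature(workflow_fn).parameters)
    keys = set(documented)
    return arg_names - keys, keys - arg_names


# Undecorated stand-in with the same arguments as assemble_and_sort.
def assemble_and_sort_stub(read1, read2):
    ...


missing, unknown = compare_parameter_keys(assemble_and_sort_stub, ["read1", "raed2"])
# missing == {"read2"}, unknown == {"raed2"}
```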

6. Add test data for your workflow

Use Latch LaunchPlan to add test data to your workflow.

from latch.resources.launch_plan import LaunchPlan
from latch.types import LatchFile

# Add launch plans at the end of your wf/__init__.py
LaunchPlan(
    assemble_and_sort,
    "Protocol Template 1",
    {
        "read1": LatchFile("s3://latch-public/init/r1.fastq"),
        "read2": LatchFile("s3://latch-public/init/r2.fastq"),
    },
)

LaunchPlan(
    assemble_and_sort,
    "Protocol Template 2",
    {
        "read1": LatchFile("s3://latch-public/init/r1.fastq"),
        "read2": LatchFile("s3://latch-public/init/r2.fastq"),
    },
)

These default values will be available under the ‘Test Data’ dropdown in the Latch Console.

7. Register your workflow to Latch

You can release a live version of your workflow by registering it on Latch:

latch register --remote <path_to_workflow_dir>

The registration process will:

Build a Docker image containing your workflow code
Serialize your code and register it with your LatchBio account
Push your docker image to a managed container registry

When registration has completed, you should see your new workflow in your account on the Latch Console.

8. Test your workflow

To test your first workflow in the Latch Console, select the test data and click Launch. You can monitor the status of each run under the Executions tab.

9. Iterative Development: Local Testing before Registration

You can test workflows locally during development to catch errors before registering. Use latch develop to build your workflow’s Docker image and start an interactive shell in the same container environment it will run in on Latch. Inside the shell, you can write mock test code and run tasks to verify workflow behavior in an environment as close to production as possible. See the Development and Debugging guide to learn more.
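Inside the latch develop shell, a mock test can be as simple as running one step and checking that it produced its output. The helper below is a hypothetical, standard-library-only sketch and does not use any Latch API; the demo uses a stand-in command so it runs anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Sequence


def run_and_check_output(cmd: Sequence[str], expected_output: Path) -> Path:
    """Run a command and fail loudly if it does not create a non-empty output file."""
    subprocess.run(list(cmd), check=True)  # raises CalledProcessError on non-zero exit
    if not expected_output.is_file() or expected_output.stat().st_size == 0:
        raise AssertionError(f"{expected_output} was not created or is empty")
    return expected_output


if __name__ == "__main__":
    # Stand-in command; inside `latch develop` you would pass the real bowtie2
    # argument list and Path("covid_assembly.sam") instead.
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "demo.sam"
        cmd = [sys.executable, "-c", f"open({str(out)!r}, 'w').write('@HD\\tVN:1.6\\n')"]
        print(run_and_check_output(cmd, out))
```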

What You’ve Learned

Core Concepts:

Tasks are Python functions that process inputs and return outputs
Workflows chain multiple tasks together to create complex pipelines
LatchFile/LatchDir types handle remote file operations automatically

Development Workflow:

Initialize with latch init to create boilerplate code
Build individual tasks, then chain them into workflows
Configure compute resources using task decorators
Manage dependencies with automatic or manual Dockerfile creation
Customize the user interface with metadata and parameters
Test locally with latch develop before registration
Register with latch register --remote to deploy

Next Steps

Customize your workflow interface
Learn about testing and debugging.
Explore advanced workflow features such as caching, retries, and parallelization
