Tutorial
In this guide, we will walk through how to upload a simple Snakemake workflow to Latch.
The example used here comes from the short tutorial in Snakemake’s documentation.
Before starting, please complete the Prerequisites.
Step 1: Clone the example Snakemake workflow
First, clone the example Snakemake workflow:
git clone git@github.com:latchbio/snakemake-tutorial.git
cd snakemake-tutorial
The cloned repository contains the files typically found in a Snakemake workflow, such as a Snakefile and Python scripts.
snakemake-tutorial
├── Snakefile
├── config.yaml
├── data
│ └── ...
├── environment.yaml
├── scripts
│ └── plot-quals.py
├── .dockerignore
Step 2: Add Latch Metadata
All Latch workflows require a metadata file specifying the input parameters and metadata the Snakemake workflow needs to run in the Latch Console.
You can automatically generate the required metadata files from an existing config.yaml by running:
latch generate-metadata config.yaml
This command will create a latch_metadata folder in your workflow directory:
snakemake-tutorial
├── Snakefile
├── config.yaml
├── data
│ └── ...
├── environment.yaml
├── latch_metadata
│ ├── __init__.py
│ └── parameters.py
├── scripts
│ └── plot-quals.py
├── .dockerignore
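As a quick sanity check after running the command, the following minimal sketch confirms that both generated files are present. The helper name `missing_metadata_files` is hypothetical and not part of the Latch SDK; the file names are taken from the directory tree above.

```python
from pathlib import Path

# File names taken from the directory tree shown above.
EXPECTED_METADATA_FILES = ("__init__.py", "parameters.py")


def missing_metadata_files(workflow_dir: str) -> list[str]:
    """Return the expected latch_metadata files that are absent.

    Hypothetical helper for illustration; not part of the Latch SDK.
    """
    metadata_dir = Path(workflow_dir) / "latch_metadata"
    return [
        name
        for name in EXPECTED_METADATA_FILES
        if not (metadata_dir / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_metadata_files(".")
    if missing:
        print("Missing from latch_metadata/:", ", ".join(missing))
    else:
        print("latch_metadata/ looks complete.")
```

Run it from the workflow directory; an empty result means both files were generated.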
Let’s inspect the generated files:
# latch_metadata/__init__.py
from latch.types.metadata import SnakemakeMetadata, LatchAuthor
from latch.types.directory import LatchDir
from .parameters import generated_parameters, file_metadata
SnakemakeMetadata(
output_dir=LatchDir("latch:///your_output_directory"),
display_name="Your Workflow Name",
author=LatchAuthor(
name="Your Name",
),
# Add more parameters
parameters=generated_parameters,
file_metadata=file_metadata,
)
The latch_metadata/__init__.py file instantiates a SnakemakeMetadata object, which contains the Latch-specific metadata displayed on the Latch Console when executing a workflow. Feel free to update the output_dir, display_name, or author fields.
The SnakemakeMetadata object also contains parameters and file_metadata fields specifying the workflow’s input parameters.
# latch_metadata/parameters.py
from dataclasses import dataclass
import typing
from latch.types.metadata import SnakemakeParameter, SnakemakeFileParameter, SnakemakeFileMetadata
from latch.types.file import LatchFile
from latch.types.directory import LatchDir
generated_parameters = {
'samples': SnakemakeParameter(
display_name='Samples',
type=LatchDir,
),
'ref_genome': SnakemakeParameter(
display_name='Ref Genome',
type=LatchDir,
),
}
file_metadata = {
'samples': SnakemakeFileMetadata(
path='data/samples/',
config=True,
),
'ref_genome': SnakemakeFileMetadata(
path='genome/',
config=True,
),
}
This file contains two file parameters of type LatchDir (a pointer to a directory hosted on Latch Data). When we register this workflow, these parameters will be exposed to the user in the Latch UI. Upon execution, the workflow orchestrator will download these directories to the local machine before executing the task.
How does the orchestrator know which local path to download the remote files to? For each SnakemakeParameter of type LatchFile or LatchDir, we use a SnakemakeFileMetadata object to specify the local path to copy files to before the Snakemake job runs.
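To make the pairing between parameters and local paths concrete, here is a minimal sketch that uses only the standard library. The `FileMetadata` class is a hypothetical stand-in for SnakemakeFileMetadata, not the SDK class, and the helper names and the `/root` working directory are assumptions made for illustration. It does not reproduce how the orchestrator actually resolves paths.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class FileMetadata:
    """Hypothetical stand-in for SnakemakeFileMetadata (not the SDK class)."""

    path: str
    config: bool = False


def local_destination(workdir: str, meta: FileMetadata) -> PurePosixPath:
    """Join an assumed working directory with the declared local path."""
    return PurePosixPath(workdir) / meta.path


def params_without_paths(param_names, file_metadata) -> list[str]:
    """List file parameters that have no local path declared."""
    return sorted(set(param_names) - set(file_metadata))


# Same keys and paths as the generated parameters.py shown above.
file_metadata = {
    "samples": FileMetadata(path="data/samples/", config=True),
    "ref_genome": FileMetadata(path="genome/", config=True),
}

if __name__ == "__main__":
    # "/root" is an assumed working directory, used only for illustration.
    for name, meta in file_metadata.items():
        print(name, "->", local_destination("/root", meta))
    # Every file parameter should have a matching file_metadata entry.
    print("unmapped:", params_without_paths(["samples", "ref_genome"], file_metadata))
```

A check like `params_without_paths` can catch a LatchFile or LatchDir parameter that was added to generated_parameters without a matching file_metadata entry.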
Step 3: Define Workflow Environment
To execute Snakemake workflows in a cloud environment, we must define a single Docker container to run each task in. This container must contain both the runtime dependencies for the Snakemake tasks and Latch-specific dependencies (such as the Latch SDK).
Fortunately, the Latch SDK provides a convenient command to generate a Dockerfile with the required Latch dependencies. Run the following in your workflow directory:
latch dockerfile . --snakemake
After running the above command, you should see the following Dockerfile in your root directory. Let’s analyze each relevant section of the generated Dockerfile:
### SECTION 1 ###
from 812206152185.dkr.ecr.us-west-2.amazonaws.com/latch-base:fe0b-main
...
run pip install "latch[snakemake]"==<version>
run mkdir /opt/latch
### SECTION 2 ###
run apt-get update --yes && \
apt-get install --yes curl && \
curl \
--location \
--fail \
--remote-name \
https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh && \
bash Mambaforge-Linux-x86_64.sh -b -p /opt/conda -u && \
rm Mambaforge-Linux-x86_64.sh
env PATH=/opt/conda/bin:$PATH
copy environment.yaml /opt/latch/environment.yaml
run mamba env create \
--file /opt/latch/environment.yaml \
--name workflow
env PATH=/opt/conda/envs/workflow/bin:$PATH
### SECTION 3 ###
copy . /root/
### SECTION 4 ###
copy .latch/snakemake_jit_entrypoint.py /root/snakemake_jit_entrypoint.py
- Section 1: The Dockerfile uses the Latch base image and installs the Latch SDK with Snakemake support. These steps are required to execute workflows on the Latch cloud.
- Section 2: This section installs the runtime dependencies (bwa, samtools, etc.) required for the workflow to execute. The latch dockerfile command will detect the existence of an environment.yaml file in the root directory and create a conda environment from that file. If your workflow doesn’t have an environment.yaml file, you must manually install packages in the Dockerfile.
- Section 3: This section copies the source code into the container. Use .dockerignore to avoid copying any large data files that you do not want in your container.
- Section 4: This section copies the auto-generated Snakemake entry point file into the container. This Python file will be executed when the workflow runs. For now, you don’t need to be familiar with the contents of this file.
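Because Section 3 copies the whole workflow directory into the image, it can help to find large files before building. The sketch below lists files above a size threshold so you can decide what to add to .dockerignore. The helper name `large_files` and the 50 MB threshold are arbitrary choices for illustration, and the script does not read or apply .dockerignore rules itself.

```python
import os


def large_files(root: str, min_bytes: int) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for files of at least min_bytes.

    Hypothetical helper; it walks the directory tree and ignores symlinks.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            if os.path.islink(full_path):
                continue
            size = os.path.getsize(full_path)
            if size >= min_bytes:
                found.append((os.path.relpath(full_path, root), size))
    # Largest files first, so the best .dockerignore candidates come on top.
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # The 50 MB threshold is an arbitrary example value.
    for path, size in large_files(".", 50 * 1024 * 1024):
        print(f"{size / 1e6:10.1f} MB  {path}")
```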
Step 4: Upload the Workflow to Latch
Finally, type the following commands to log in to your account and register the workflow to Latch:
latch login
latch register . --snakefile Snakefile
During registration, a workflow image is built and uploaded, and the snakemake_jit_entrypoint.py file is generated. Once the registration finishes, stdout provides a link to your workflow on Latch.
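If you prefer to drive these two commands from a script, a minimal sketch could look like the following. The function names are hypothetical, and the command lines are exactly the two shown above. The sketch defaults to a dry run that only prints the commands, since latch login may require interactive authentication.

```python
import shlex
import subprocess


def registration_commands(workflow_dir: str = ".", snakefile: str = "Snakefile"):
    """Build the two commands from this step as argument lists."""
    return [
        ["latch", "login"],
        ["latch", "register", workflow_dir, "--snakefile", snakefile],
    ]


def run_commands(commands, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for argv in commands:
        print("$", shlex.join(argv))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run by default; pass dry_run=False to actually invoke the CLI.
    run_commands(registration_commands(), dry_run=True)
```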
Step 5: Upload Data and Run the Workflow
Before running the workflow, we must upload our input data to Latch Data. The example repository contains some sample data under the data/ directory, which you can use for testing.
Once you have uploaded the data and selected the appropriate input parameters, click Launch Workflow. You should now see the workflow task executing.
Snakemake support currently uses JIT (Just-In-Time) registration. This means that once the single-task workflow above is complete, it will produce a second workflow, which runs the actual Snakemake jobs. To learn more, refer to the documentation on the lifecycle of a Snakemake workflow on Latch.
Once the workflow finishes running, results will be deposited under the output_dir folder, as defined in your Latch Metadata.
Next Steps
- Learn how to modify Snakemake workflows to be cloud-compatible.
- Visit troubleshooting to diagnose and find solutions to common issues.
- Visit the repository of public examples of Snakemake workflows on Latch.