Registering an Example Workflow
This tutorial walks through the steps required to get an existing Snakemake workflow running on Latch.
To get started, first clone the starter repository:
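(The URL below is a placeholder; use the starter repository linked from this tutorial.)

```bash
git clone https://github.com/latchbio/snakemake-tutorial  # placeholder URL
cd snakemake-tutorial
```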
and ensure that you have `latch` installed:
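The Latch SDK is distributed on PyPI:

```bash
pip install latch
```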
1. Updating the Pipeline to use Latch Storage
For now we will not need to make any edits to the Snakefile to make it work with Latch Storage; we will revisit this later in the tutorial, however.
2. Adding Resources + Containers to each Rule
The first real change is to specify how large each job’s machine needs to be. Since this is a relatively low-footprint pipeline, we can make each machine small and provide 1 core and 2 GiB of RAM.
Because every rule has the same resource requirements, we can use a profile to specify them all at once, instead of having to update every rule individually.
Create a directory called `profiles/default` and in it touch a file called `config.yaml`:
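```bash
mkdir -p profiles/default
touch profiles/default/config.yaml
```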
Then, add the following YAML content to the `config.yaml`:
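A minimal sketch (the resource key names below, and the exact YAML shape of `default-resources`, are assumptions and vary between Snakemake and SDK versions; check the Latch docs for the keys your version expects):

```yaml
# Default resources applied to every rule (key names assumed)
default-resources:
  cpus: 1        # 1 core per job
  mem_mib: 2048  # 2 GiB of RAM per job
```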
This will set the resources for every rule. Note that you can override these for any rule by updating the resources of that rule directly.
We will skip creating containers for each rule; since all rules use the same conda environment, we can simply install that environment in the Docker image we build during `latch register` and have every rule run in that container instead.
3. Writing Metadata
Now, we need to write a metadata file that our workflow will use to generate its parameter interface.
First, make a directory called `latch_metadata` and in it touch a file called `__init__.py`:
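```bash
mkdir latch_metadata
touch latch_metadata/__init__.py
```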
In `latch_metadata/__init__.py`, create a `SnakemakeV2Metadata` object as below:
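A minimal sketch, assuming `SnakemakeV2Metadata` and `LatchAuthor` are importable from `latch.types.metadata` (verify the import path against your SDK version):

```python
from latch.types.metadata import LatchAuthor, SnakemakeV2Metadata

SnakemakeV2Metadata(
    display_name="Snakemake Tutorial Workflow",
    author=LatchAuthor(name="Your Name"),  # hypothetical author name
)
```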
This object doesn’t have any parameter metadata yet, so we need to add it. Looking at `config.yaml` (not the file in `profiles/default`), we see that the pipeline expects 3 config parameters: `samples_dir`, `genome_dir`, and `results_dir`. The first two are inputs to the pipeline and the third is the location where outputs will be stored.
We want all three of these to be exposed in the UI, so we will add them to the `parameters` dict in `latch_metadata/__init__.py`:
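A sketch assuming a `SnakemakeParameter` wrapper class in `latch.types.metadata` (the class name and its fields are assumptions; consult the metadata interface referenced at the end of this step):

```python
from latch.types.directory import LatchDir, LatchOutputDir
from latch.types.metadata import LatchAuthor, SnakemakeParameter, SnakemakeV2Metadata

SnakemakeV2Metadata(
    display_name="Snakemake Tutorial Workflow",
    author=LatchAuthor(name="Your Name"),
    parameters={
        # Input directories
        "samples_dir": SnakemakeParameter(display_name="Samples Directory", type=LatchDir),
        "genome_dir": SnakemakeParameter(display_name="Genome Directory", type=LatchDir),
        # Output location
        "results_dir": SnakemakeParameter(display_name="Results Directory", type=LatchOutputDir),
    },
)
```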
In each parameter, we specified (1) a human-readable name to display in the UI, and (2) the type of parameter to accept. Since the workflow expects all of these to be directories, they are all `LatchDir`s (we made `results_dir` a `LatchOutputDir` because it is an output directory).
For now, this is all we need and we can move on, but feel free to customize the metadata object further using the interface described here.
4. Generating the Entrypoint
Now we need to generate the entrypoint file containing the Latch workflow wrapping our Snakemake workflow. This is a simple command:
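The subcommand below is our best reading of the Latch CLI; confirm the exact name and arguments with `latch --help` for your SDK version:

```bash
latch generate-entrypoint .
```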
This should create a directory called `wf` containing a file called `entrypoint.py`. The file should have roughly the following contents:
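A simplified sketch of the generated file; the real one is longer and differs between SDK versions. Only the parameter names match what we defined above, everything else is illustrative:

```python
# Simplified sketch of wf/entrypoint.py; the generated file is longer and
# differs between SDK versions.
import subprocess

from latch.types.directory import LatchDir, LatchOutputDir


def snakemake_runtime(
    samples_dir: LatchDir, genome_dir: LatchDir, results_dir: LatchOutputDir
):
    # Invoke Snakemake with the Latch-provided config values and the
    # resource profile we created earlier
    subprocess.run(
        [
            "snakemake",
            "--profile", "profiles/default",
            "--config",
            f"samples_dir={samples_dir.remote_path}",
            f"genome_dir={genome_dir.remote_path}",
            f"results_dir={results_dir.remote_path}",
        ],
        check=True,
    )
```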
5. Generating the Dockerfile
The last step before registering is to generate the `Dockerfile` that will define the environment the runtime executes in. In particular, we want that environment to contain the conda environment defined by `environment.yaml`.
Again, we can accomplish this with a simple command:
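Assuming the `latch dockerfile` subcommand (run `latch dockerfile --help` to confirm the arguments for your SDK version):

```bash
latch dockerfile .
```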
This will generate a file called `Dockerfile` with roughly the following contents:
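A sketch; the base image below is a placeholder for the pinned image the generator actually emits, and the conda setup steps will differ:

```dockerfile
# Sketch of the generated Dockerfile; the base image tag and install steps
# are placeholders for the pinned versions latch generates.
FROM ghcr.io/latchbio/latch-base:latest

# Install the conda environment shared by all rules
COPY environment.yaml /opt/latch/environment.yaml
RUN conda env create -f /opt/latch/environment.yaml

# Copy the workflow code into the image
COPY . /root/
WORKDIR /root
```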
6. Registering and Running your Pipeline
Finally, we get to upload our pipeline to Latch. Simply run:
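```bash
latch register .
```

This builds the Docker image and uploads the workflow to your Latch workspace.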
To run on Latch, you will also need to upload the test data. This is straightforward using `latch cp`:
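Assuming the repository’s test data lives in a local directory (`test_data` below is a placeholder; substitute the actual path in the starter repository):

```bash
latch cp test_data latch:///snakemake-tutorial-data
```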
This will upload the data to a folder called `snakemake-tutorial-data` in your account on Latch.
Finally, navigate to Workflows and click on “Snakemake Tutorial Workflow”, select the parameters from the data you just uploaded, and run the workflow!
Appendix 1. Getting Sample Names Dynamically
You may have noticed that in the Snakefile, the sample names are hardcoded. This is obviously not desirable - we should be able to infer the sample names from the contents of the samples directory.
In order to accomplish this, we will need to edit both the Snakefile and the entrypoint itself. Since we need to know the contents of the samples directory outside of a rule, we will need to stage it locally before the pipeline executes.
First, add the following import to the top of the `wf/entrypoint.py` file:
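One plausible version, assuming the SDK’s `LPath` class for interacting with Latch Data (the specific import the original tutorial uses may differ):

```python
from pathlib import Path

from latch.ldata.path import LPath
```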
Next, edit the start of `snakemake_runtime(...)` so that it looks like the following:
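A sketch under the same `LPath` assumption; the key change is downloading `samples_dir` to a known local path and pointing Snakemake’s config at that local copy:

```python
def snakemake_runtime(
    samples_dir: LatchDir, genome_dir: LatchDir, results_dir: LatchOutputDir
):
    # Stage the samples directory locally so the Snakefile can list its
    # contents at parse time, outside of any rule
    local_samples = Path("samples")
    LPath(samples_dir.remote_path).download(local_samples)

    subprocess.run(
        [
            "snakemake",
            "--profile", "profiles/default",
            "--config",
            f"samples_dir={local_samples}",  # local copy instead of remote path
            f"genome_dir={genome_dir.remote_path}",
            f"results_dir={results_dir.remote_path}",
        ],
        check=True,
    )
```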
Here we explicitly download the `samples_dir` before calling `snakemake` - this way we know the contents of the directory without needing to be in a rule.
Lastly, we will need to edit the Snakefile and remove the hardcoded samples:
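For example, using Snakemake’s built-in `glob_wildcards` (the FASTQ naming pattern below is an assumption; match it to the actual file names in the samples directory):

```python
# Before (hardcoded):
# SAMPLES = ["sample_a", "sample_b", "sample_c"]

# After: infer sample names from the staged samples directory
SAMPLES = glob_wildcards(f"{config['samples_dir']}/{{sample}}.fastq.gz").sample
```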
Now just re-register, and all three samples will be run through the pipeline.