Overview

On Latch, Snakemake pipelines execute in two stages, described in detail below: (1) DAG generation and (2) execution of the pipeline itself.

DAG Generation

Because Snakemake workflows are dynamic, with execution graphs that can differ depending on their inputs, we must first determine what the execution graph looks like for a given set of inputs. The first stage of execution does exactly that.
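
As a concrete illustration, consider a hypothetical Snakefile (not part of Latch) in which the number of jobs depends on how many sample files are present; its DAG cannot be known until the inputs are:

```python
# Hypothetical Snakefile: one "align" job is created per sample file found,
# so the shape of the DAG depends entirely on the inputs.
SAMPLES = glob_wildcards("data/{sample}.fastq").sample

rule all:
    input:
        expand("aligned/{sample}.bam", sample=SAMPLES)

rule align:
    input:
        "data/{sample}.fastq"
    output:
        "aligned/{sample}.bam"
    shell:
        "bwa mem ref.fa {input} > {output}"
```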

First, we stage the input files by downloading them from Latch and moving them to the paths where the Snakemake workflow expects them to be. We then execute the Snakemake workflow in dry-run mode to extract the computed execution plan.
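
The following is a minimal sketch of this stage, assuming inputs are copied with the Latch CLI's `latch cp` command and the plan is captured from Snakemake's dry-run output; the function names and the exact plan format here are illustrative, not Latch's actual implementation:

```python
import subprocess
from pathlib import Path

def stage_inputs(inputs: dict[str, str]) -> None:
    """Download each Latch file to the local path the Snakefile expects."""
    for latch_uri, local_path in inputs.items():
        Path(local_path).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(["latch", "cp", latch_uri, local_path], check=True)

def extract_plan(snakefile: str) -> str:
    """Run Snakemake in dry-run mode and capture the computed execution plan."""
    result = subprocess.run(
        ["snakemake", "--snakefile", snakefile, "--dry-run", "--cores", "1"],
        capture_output=True,
        text=True,
        check=True,
    )
    # The dry-run output lists every planned job: its rule, inputs, and outputs.
    return result.stdout

stage_inputs({"latch:///data/a.fastq": "data/a.fastq"})
plan = extract_plan("Snakefile")
```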

This plan specifies which rules are executed, on which files, and in what order. From it we construct a Latch workflow that maps one-to-one onto the execution plan: one task per invocation of a rule. If a rule is invoked multiple times, we generate one task for each invocation; if a rule is never invoked, it is not part of the Latch workflow at all. The execution plan also details the inputs and outputs of each rule it invokes; these become the inputs and outputs of the corresponding tasks.
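
In code, the mapping from plan to workflow might look like the sketch below; the `Job` and `Task` shapes are illustrative stand-ins, not Latch's internal representation:

```python
from dataclasses import dataclass

@dataclass
class Job:
    """One planned invocation of a rule, as reported by the dry run."""
    rule: str
    inputs: list[str]
    outputs: list[str]

@dataclass
class Task:
    """One Latch task, generated one-to-one from a planned job."""
    name: str
    inputs: list[str]
    outputs: list[str]

def build_tasks(jobs: list[Job]) -> list[Task]:
    # One task per rule invocation: a rule invoked N times yields N tasks,
    # and a rule with zero invocations yields no task at all.
    return [
        Task(name=f"{job.rule}_{i}", inputs=job.inputs, outputs=job.outputs)
        for i, job in enumerate(jobs)
    ]
```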

Once the new workflow has been constructed, it is triggered, moving us into stage two.

Execution

Each rule is executed using a patched version of Snakemake that runs only the target rule, with the filesystem staged exactly as the rule expects. The patched Snakemake reads metadata from the task environment to determine which rule to execute.
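
Conceptually, each task behaves like the following sketch. The environment variable names, metadata format, and upload step are hypothetical; `--allowed-rules` is a real Snakemake flag, though the actual patch modifies Snakemake itself rather than wrapping the CLI:

```python
import json
import os
import subprocess

# Hypothetical environment variables set by the Latch task runtime.
rule_name = os.environ["LATCH_SNAKEMAKE_RULE"]
io_meta = json.loads(os.environ["LATCH_SNAKEMAKE_IO"])

# Stage the rule's inputs exactly where it expects to find them,
# mirroring the staging step from DAG generation.
for latch_uri, local_path in io_meta["inputs"].items():
    subprocess.run(["latch", "cp", latch_uri, local_path], check=True)

# Ask Snakemake for just this rule's outputs. --allowed-rules stops it
# from scheduling any other rule, since upstream outputs are already staged.
subprocess.run(
    ["snakemake", "--cores", "1", *io_meta["outputs"],
     "--allowed-rules", rule_name],
    check=True,
)

# Push the rule's outputs back to Latch so downstream tasks can stage them.
for local_path, latch_uri in io_meta["output_destinations"].items():
    subprocess.run(["latch", "cp", local_path, latch_uri], check=True)
```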