Snakemake exposes an executor interface that lets developers write custom plugins defining exactly how jobs are executed. The snakemake-executor-plugin-latch package implements this interface and schedules jobs on machines in Latch’s cloud.

When all inputs for a job are available, the plugin queues the job. A dispatcher server consumes jobs from this queue and, when a suitably sized machine is available, schedules the job on that machine.

Containers

Each job executes within a container whose image is specified in the corresponding rule definition. If no container image is specified, the plugin defaults to the image the runtime executes in (the same image that is built during latch register).

You can specify the container a job executes in with the container: directive in its rule definition.

For example:

rule use_pandas:
    input:
        storage.latch("latch://123.account/input.txt")
    output:
        storage.latch("latch://123.account/output.txt")
    container:
        "docker://812206152185.dkr.ecr.us-west-2.amazonaws.com/snakemake/pandas:2.2.5"
    script:
        "scripts/use_pandas.py"

You can also specify a default container at the top level of the file:

container: "docker://joseespinosa/docker-r-ggplot2:1.0"

Snakemake executes each remote job by invoking itself again inside the job’s container, with command-line arguments that select the correct storage plugin and restrict execution to the target rule. For this reason, it is vital that every container has both snakemake and snakemake-storage-plugin-latch installed, and that snakemake is available on $PATH.
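
If you are building a custom image, you can add these dependencies during the image build. A minimal sketch of the relevant Dockerfile lines, assuming the base image already provides python and pip (the base image name is a placeholder):

FROM <your-base-image>

# Put snakemake and the Latch storage plugin on $PATH so the
# remote re-invocation described above can run inside this image
RUN pip install snakemake snakemake-storage-plugin-latch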

You can check that this is the case by running the container locally:

$ docker run -it <your-image> bash
root@container:/# snakemake --version
8.25.3
root@container:/# pip show snakemake-storage-plugin-latch
Name: snakemake-storage-plugin-latch
Version: 0.1.9
...

Conda

It is possible to use the default container image for every rule and rely on conda environment files to specify each rule's dependencies. While Snakemake encourages this practice, we caution against it solely because of workflow performance: simple conda environments can take several minutes to build, and complex ones can take even longer. Moreover, this cost is incurred on every run of the workflow, which can become expensive.

We strongly encourage you to instead build a container image for each conda environment and use those images in your rules. By paying the cost of building the conda environments and containers once, you save time on every subsequent workflow execution.
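
As an illustration, a rule converted this way might look like the following; the rule name, environment file, and image reference are hypothetical:

rule align_reads:
    input:
        storage.latch("latch://123.account/reads.fastq")
    output:
        storage.latch("latch://123.account/aligned.bam")
    # Discouraged: the conda environment is solved and built on every run
    # conda:
    #     "envs/alignment.yaml"
    # Preferred: the same environment baked into a container image once
    container:
        "docker://<your-registry>/<your-conda-env-image>"
    script:
        "scripts/align.py"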

Machine Specs

To configure the size and specs of the machine that the job runs on, use the resources: directive:

rule use_pandas:
    input:
        storage.latch("latch://123.account/input.txt")
    output:
        storage.latch("latch://123.account/output.txt")
    container:
        "docker://812206152185.dkr.ecr.us-west-2.amazonaws.com/snakemake/pandas:2.2.5"
    resources:
        cpu = 2,
        mem_gib = 4
    script:
        "scripts/use_pandas.py"

The following resource keys are valid:

  1. cpu / cpus for CPU specification,
  2. mem_{unit} for RAM specification - {unit} can be any metric or binary unit, such as mib or gb,
  3. disk_{unit} for Ephemeral Storage / Disk space specification, with the same rules for {unit} as mem,
  4. gpu / gpus for specifying the number of GPUs desired, and
  5. gpu_type for the type of GPU to use. This field is mandatory if gpu/gpus is greater than 0. See here for valid GPU type/quantity combinations.

For mem and disk, you can also specify the quantity and unit together as a single string, with the unit after the quantity. For example, the following is equivalent to the previous rule’s resources:

resources:
    cpu = 2,
    mem = "4 GiB"
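
For example, a rule that requests a GPU might specify its resources as follows. The rule is illustrative and the gpu_type value is a placeholder; remember that gpu_type is required whenever gpu is greater than 0, and see the GPU type/quantity reference above for valid combinations:

rule train_model:
    input:
        storage.latch("latch://123.account/train.csv")
    output:
        storage.latch("latch://123.account/model.pt")
    resources:
        cpu = 8,
        mem_gib = 32,
        disk_gib = 100,
        gpu = 1,
        gpu_type = "nvidia-a100"
    script:
        "scripts/train.py"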