To run on Latch, each rule needs both a container image to run in and a resource specification for the pod that the container runs on.

Container Images

For each rule, you can specify a container image using the container: directive. For example:

rule use_pandas:
    input:
        storage.latch("latch://123.account/input.txt")
    output:
        storage.latch("latch://123.account/output.txt")
    container:
        "docker://812206152185.dkr.ecr.us-west-2.amazonaws.com/snakemake/pandas:2.2.5"
    script:
        "scripts/use_pandas.py"

You can also specify a default container at the top level of the file:

container: "docker://joseespinosa/docker-r-ggplot2:1.0"

Due to a limitation of Snakemake, all containers must have Snakemake installed, as well as snakemake_storage_plugin_latch. The command snakemake must also be available in $PATH.

These can both be installed using pip:

pip install snakemake snakemake_storage_plugin_latch

You can check whether snakemake is available in $PATH by starting a container from your image locally:

$ docker run -it <your-image> bash
root@container:/# snakemake --version
8.25.3

If the command succeeds and a version is printed, you are good to go.
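
The same check can be scripted. The following is a minimal sketch meant to be run with the container's Python interpreter; the helper name missing_requirements is hypothetical, and it assumes the plugin is importable under the module name snakemake_storage_plugin_latch.

```python
import importlib.util
import shutil


def missing_requirements(command="snakemake",
                         module="snakemake_storage_plugin_latch"):
    """Return a list of problems; an empty list means both checks passed.

    The default module name is an assumption based on the package name.
    """
    problems = []
    # shutil.which mirrors the shell's lookup of an executable on $PATH.
    if shutil.which(command) is None:
        problems.append(f"command '{command}' not found on $PATH")
    # find_spec locates a top-level module without importing it.
    if importlib.util.find_spec(module) is None:
        problems.append(f"Python module '{module}' is not installed")
    return problems


# Print one line per problem; no output means nothing was found missing.
for problem in missing_requirements():
    print(problem)
```

If the script prints nothing, both the snakemake command and the plugin module were found. This only confirms they are present, not that their versions are compatible.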

Using Conda

It is possible to use a single container image for every rule and to specify each rule's dependencies with conda environment files. While Snakemake encourages this practice, we caution against it solely because of latency: even simple conda environments can take several minutes to build, and complex ones take longer. This cost is incurred on every run of the workflow, which can become expensive.

Instead, we strongly encourage you to build a container image for each conda environment and use those images in your rules. By paying the cost of building the conda environments and containers once, you save time on every workflow execution.

Resources

Every rule also needs resources set so that it runs on an appropriate machine. Use the resources directive for this:

rule use_pandas:
    input:
        storage.latch("latch://123.account/input.txt")
    output:
        storage.latch("latch://123.account/output.txt")
    container:
        "docker://812206152185.dkr.ecr.us-west-2.amazonaws.com/snakemake/pandas:2.2.5"
    resources:
        cpu = 2,
        mem_gib = 4
    script:
        "scripts/use_pandas.py"

The following resource keys are valid:

  1. cpu / cpus for CPU specification,
  2. mem_{unit} for RAM specification - {unit} can be any metric or binary unit, such as mib or gb,
  3. disk_{unit} for Ephemeral Storage / Disk space specification, with the same rules for {unit} as mem,
  4. gpu / gpus for specifying the number of GPUs desired, and
  5. gpu_type for the type of GPU to use. This field is mandatory if gpu/gpus is greater than 0. See here for valid GPU type/quantity combinations.
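
To make the unit suffixes and the gpu_type rule concrete, here is a minimal Python sketch of how such keys could be interpreted. It is an illustration only, not Latch's implementation: the helper names to_bytes and check_gpu and the exact set of accepted suffixes are assumptions. It uses the standard convention that metric units are powers of 1000 and binary units are powers of 1024.

```python
# Byte multipliers per unit suffix. The list of suffixes is an assumption
# for illustration; metric units use powers of 1000, binary units 1024.
UNIT_BYTES = {
    "kb": 10**3, "mb": 10**6, "gb": 10**9, "tb": 10**12,
    "kib": 2**10, "mib": 2**20, "gib": 2**30, "tib": 2**40,
}


def to_bytes(key, value):
    """Convert a key such as 'mem_gib' or 'disk_gb' and its value to bytes."""
    prefix, _, unit = key.partition("_")
    if prefix not in ("mem", "disk") or unit not in UNIT_BYTES:
        raise ValueError(f"unrecognized resource key: {key}")
    return int(value * UNIT_BYTES[unit])


def check_gpu(resources):
    """Return an error message if GPUs are requested without gpu_type."""
    count = resources.get("gpu", resources.get("gpus", 0))
    if count > 0 and "gpu_type" not in resources:
        return "gpu_type is required when gpu/gpus is greater than 0"
    return None


print(to_bytes("mem_gib", 4))  # 4 * 2**30 = 4294967296
print(to_bytes("mem_gb", 4))   # 4 * 10**9 = 4000000000
```

The two printed values show why the suffix matters: mem_gib = 4 requests about 7% more memory than mem_gb = 4.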