Using the Latch Executor Plugin
Snakemake exposes an executor interface that allows developers to write custom plugins defining exactly how jobs are executed. The `snakemake-executor-plugin-latch` package is one such plugin: it schedules jobs on machines in Latch’s cloud.
When all inputs for a job are available, the plugin queues the job. A dispatcher server consumes jobs from this queue and, when a suitably sized machine is available, schedules the job on that machine.
Containers
Jobs execute within a container whose image is specified in the rule definition. If no container image is specified, the plugin defaults to the image that the runtime executes in (the same image that is built during `latch register`).
You can specify the container you want the job to execute in using the `container:` directive in the rule definition.
For example:
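A minimal sketch of a rule-level container (the rule name, image, and command here are illustrative, not part of the plugin’s API):

```python
rule align_reads:
    input:
        "reads/{sample}.fastq"
    output:
        "aligned/{sample}.bam"
    container:
        # Any OCI image reference works; this one is illustrative.
        "docker://biocontainers/bwa:v0.7.17_cv1"
    shell:
        "bwa mem ref.fa {input} > {output}"
```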
You can also specify a default container at the top level of the file:
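A sketch of a workflow-level default (the image name is a placeholder):

```python
# Applies to every rule that does not set its own container: directive.
container: "docker://ghcr.io/example/pipeline-base:latest"

rule summarize:
    output:
        "results/summary.txt"
    shell:
        "echo done > {output}"
```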
Snakemake executes remote jobs by calling itself again, with appropriate command-line arguments to ensure that the right storage plugin is used and that only the target rule is executed. For this reason, it is vital that all containers have both `snakemake` and `snakemake-storage-plugin-latch` installed, and that `snakemake` is available on `$PATH`.
You can check that this is the case by running the container locally:
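For example, assuming a Docker image tagged `my-pipeline:latest` (the tag is a placeholder, and the Python module name below follows the standard Snakemake plugin naming convention):

```shell
# Both commands should succeed inside the container.
docker run --rm my-pipeline:latest sh -c 'command -v snakemake && snakemake --version'
docker run --rm my-pipeline:latest python -c 'import snakemake_storage_plugin_latch'
```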
Conda
It is possible to use the default container image for every rule and specify each rule’s dependencies with conda environment files. While Snakemake encourages this practice, we caution against it solely because of workflow performance: simple conda environments can take several minutes to build, and complex ones even longer. Moreover, this cost is incurred on every run of the workflow, which can become expensive.
Instead, we strongly encourage you to build container images for each conda environment and use those for your rules instead. By paying the cost of building the conda environments / containers once, you save time during every workflow execution.
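As a sketch, one way to bake a conda environment into an image is a small Dockerfile built on a micromamba base (the base image, environment file path, and install commands are illustrative; as noted above, the image must also end up with `snakemake` and `snakemake-storage-plugin-latch` installed and on `$PATH`):

```docker
FROM mambaorg/micromamba:1.5.8

# Illustrative environment file; use the env file you would otherwise
# have passed to the rule's conda: directive.
COPY envs/align.yaml /tmp/env.yaml
RUN micromamba install -y -n base -f /tmp/env.yaml && \
    micromamba clean --all --yes

# The executor re-invokes snakemake inside the container, so install it
# along with the Latch storage plugin.
RUN micromamba run -n base pip install snakemake snakemake-storage-plugin-latch
```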
Machine Specs
To configure the size and specs of the machine that the job runs on, use the `resources:` directive:
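A sketch (the rule and the specific quantities are illustrative):

```python
rule assemble:
    input:
        "reads/{sample}.fastq"
    output:
        "assembly/{sample}.fasta"
    resources:
        cpus=8,
        mem_gib=32,
        disk_gb=500
    shell:
        "run_assembler {input} {output}"
```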
The following resource keys are valid:

- `cpu`/`cpus` for CPU specification,
- `mem_{unit}` for RAM specification, where `{unit}` can be any metric or binary unit, such as `mib` or `gb`,
- `disk_{unit}` for ephemeral storage / disk space specification, with the same rules for `{unit}` as `mem`,
- `gpu`/`gpus` for specifying the number of GPUs desired, and
- `gpu_type` for the type of GPU to use. This field is mandatory if `gpu`/`gpus` is greater than 0. See here for valid GPU type/quantity combinations.
For `mem` and `disk`, you can also specify resources with the unit after the quantity. For example, the following is equivalent to the previous rule’s resources:
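A sketch, assuming the earlier rule requested 8 CPUs, 32 GiB of RAM, and 500 GB of disk (the exact unit spelling accepted in the string form is an assumption):

```python
rule assemble:
    input:
        "reads/{sample}.fastq"
    output:
        "assembly/{sample}.fasta"
    resources:
        cpus=8,
        mem="32GiB",
        disk="500GB"
    shell:
        "run_assembler {input} {output}"
```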