# Using Latch Storage

Snakemake exposes a storage interface that lets developers write plugins for reading from and writing to custom data stores. The `snakemake-storage-plugin-latch` package is one such plugin: it allows `snakemake` to interact natively with [Latch Data](/data/overview).

# Overview

The storage plugin works by treating files on Latch as if they were files under a nonexistent `/ldata` directory. For example, the file `latch://123.account/a/b/c.txt` is represented internally as `/ldata/123.account/a/b/c.txt`.

This scheme allows common patterns such as `Path(dir) / "file"` to "just work" with Latch objects. When creating the `config` file, `LatchFile`s and `LatchDir`s are encoded as paths of this form.
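
The mapping described above can be sketched in a few lines of Python. This is only an illustration of the path scheme: `latch_url_to_ldata_path` is a hypothetical helper written for this example, not a function exported by the plugin, and the plugin's actual implementation may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


# Hypothetical helper for illustration only; not part of the plugin's API.
def latch_url_to_ldata_path(url: str) -> PurePosixPath:
    """Map a latch:// URL onto the /ldata path scheme described above."""
    parsed = urlparse(url)
    if parsed.scheme != "latch":
        raise ValueError(f"not a latch:// URL: {url}")
    # parsed.netloc is the segment after "latch://", e.g. "123.account"
    return PurePosixPath("/ldata") / parsed.netloc / parsed.path.lstrip("/")


remote_dir = latch_url_to_ldata_path("latch://123.account/a/b")
# Ordinary path arithmetic works because the result is a regular path object.
print(remote_dir / "c.txt")  # /ldata/123.account/a/b/c.txt
```

Because the result is an ordinary path, existing `Snakefile` code that joins or splits paths needs no special handling for Latch objects.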

# Usage

Configuring your `Snakefile` to use this storage plugin is, in most cases, fairly straightforward. There are a few exceptional cases to keep in mind, but for the most part minimal edits are required. The following sections describe the common cases where edits are required.

## Using the `{input}` / `{output}` Wildcards

First, ensure that *there are no hardcoded paths in any shell commands*. For example, the rule

```snakemake theme={null}
rule test_storage:
    input:
        "hello.txt"
    output:
        os.path.join(config['remote_output_dir'], "hello.txt") # assume config['remote_output_dir'] is a path on Latch
    shell:
        "cp {input} {output}"
```

will copy the local file `hello.txt` onto Latch under `remote_output_dir`.

Note that in the example above, the shell command never explicitly references the output path, and instead references the `{output}` wildcard. This is intentional, and all rules that can reference Latch objects must use this pattern to function correctly.

Snakemake storage plugins in general work by performing all operations on a local copy of the remote file, then uploading that local copy back to remote storage at the end of rule execution. In the example above, the `{output}` wildcard is replaced with the path of the local copy. This local copy is stored opaquely and its location can change at runtime depending on how the pipeline is configured, so the only reliable way to reference it is through the wildcard. The same applies to inputs and the `{input}` wildcard, for exactly the same reason.
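
The following toy model illustrates why a hardcoded path cannot work. It is a sketch under stated assumptions, not the plugin's implementation: the "remote store" is a plain dictionary, and `run_rule_with_staging` is a hypothetical name invented for this example. The point is that the staging location is chosen at runtime, so the rule body can only reach the file through the path it is handed.

```python
import tempfile
from pathlib import Path


def run_rule_with_staging(remote_store: dict, remote_output: str, rule_body) -> None:
    """Toy model: run `rule_body` on a local copy, then 'upload' the result."""
    with tempfile.TemporaryDirectory() as staging:
        # The staging directory differs on every run, so the rule body
        # cannot know the local path ahead of time.
        local_output = Path(staging) / remote_output.lstrip("/")
        local_output.parent.mkdir(parents=True, exist_ok=True)
        # Plays the role of the shell command with {output} substituted.
        rule_body(local_output)
        # The "upload" happens only after the rule body has finished.
        remote_store[remote_output] = local_output.read_bytes()


remote_store = {}  # stand-in for Latch Data (assumption for this sketch)
run_rule_with_staging(
    remote_store,
    "/ldata/123.account/out/hello.txt",
    lambda output: output.write_bytes(b"hello\n"),
)
print(sorted(remote_store))
```

A rule body that wrote directly to `/ldata/123.account/out/hello.txt` would never touch the staged copy, so nothing would be uploaded.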

## Remote Paths in the `params:` Directive

When using a remote path in the `params:` directive, the path must be marked with the `storage(...)` flag.

By default, Snakemake does not treat `params:` members as storage objects unless explicitly told to do so, so file downloads and uploads will not happen. For this reason, every param value that can be a remote storage object must be marked with `storage(...)`. For example:

```snakemake theme={null}
rule test_storage:
    input:
        "hello.txt"
    params:
        auxilliary = storage(config['auxilliary_file'])
    output:
        os.path.join(config['remote_output_dir'], "hello.txt") # assume config['remote_output_dir'] is a path on Latch
    shell:
        "cp {input} {output} && cp {input} {params.auxilliary}"
```

## Using Filesystem APIs outside of Rules

Since remote paths represent remote identifiers rather than local filesystem objects, code that performs file operations (such as reading file contents) will not work outside of a rule context. When your pipeline execution depends on accessing specific files before rule execution begins, we recommend **explicitly downloading these files in the runtime task before calling `snakemake`**. The runtime task function definition can be found in the `entrypoint.py` file generated by the `latch snakemake generate-entrypoint` command.

For example, suppose you expand a wildcard based on which files are present in a specific `input_dir`. You can stage this `input_dir` ahead of time as shown below:

```python3 theme={null}
# Assume `input_dir` is a `LatchDir` parameter passed to `snakemake_runtime(...)`

print(f"Staging {input_dir.remote_path}...", flush=True)

# Need `from latch.ldata.path import LPath` at the top of file
input_dir = LPath(input_dir.remote_path).download(shared / "input_dir")

print("Done.")

config = {
    ...
    "input_dir": get_config_val(input_dir),
    ...
}
```

Note that after the directory is downloaded locally, instead of passing the remote identifier to the `config` object, we pass the local path of the downloaded directory.
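
Once the directory is staged locally, ordinary filesystem APIs work on it. As a minimal sketch of the wildcard-expansion case mentioned above, the snippet below lists sample names from a local directory. The helper name `discover_samples` and the `.fastq` suffix are assumptions made for this example, and a temporary directory stands in for the staged `input_dir`.

```python
import tempfile
from pathlib import Path


# Hypothetical helper for illustration; the name and default suffix are assumptions.
def discover_samples(input_dir: Path, suffix: str = ".fastq") -> list:
    """Return sorted sample names for files in `input_dir` ending in `suffix`."""
    return sorted(
        p.name[: -len(suffix)]
        for p in input_dir.iterdir()
        if p.is_file() and p.name.endswith(suffix)
    )


# A temporary directory stands in for the locally staged input_dir.
with tempfile.TemporaryDirectory() as tmp:
    staged = Path(tmp)
    for name in ("a.fastq", "b.fastq", "notes.txt"):
        (staged / name).touch()
    samples = discover_samples(staged)

print(samples)  # ['a', 'b']
```

In a `Snakefile`, a list like `samples` could then be passed to Snakemake's `expand(...)` to build the target file names, with `config["input_dir"]` pointing at the local staged path.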
