Authoring your Own Workflow
In this demonstration, we will examine a workflow which sorts and assembles COVID sequencing data.
This document aims to be an extension to the Quickstart to help you better understand the structure of a workflow and write your own.
Prerequisite:
- Complete the Quickstart guide.
What you will learn:
- How to write a task and create a workflow
- How to define compute and storage requirements
- How to manage third-party dependencies
- How to customize a user-friendly interface
- How to test your workflow from the Latch CLI and Latch Console
1. Initialize Workflow Directory
Bootstrap a new workflow directory by running latch init from the command line.
File Tree:
Once your boilerplate workflow has been created successfully, you should see a folder called covid-wf.
2. Writing your First Task
A task is a Python function that takes in inputs and returns outputs to the Latch platform or to another task. In the assemble.py file inside the covid-wf/wf directory, there is a task which ingests two sequencing reads and outputs an assembled SAM file.
Working with LatchFiles and LatchDirs
LatchFile and LatchDir are types built into the Latch SDK which allow users to use files and directories stored remotely on Latch as inputs and outputs to workflows. They point to remote file locations in our user interface on the Latch Console and implement the necessary operations to ensure data is available in the task environment and outputs are uploaded to your Latch account.
The first time you read or resolve the path of a file or directory in a task, the data will be downloaded and will be accessible within the task. For example, the line local_file = Path(read1) in the snippet below will download the file from Latch and return a path to the local copy.
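One way to picture this lazy behavior is a path-like object that fetches its data the first time Python asks for its filesystem path. The sketch below is a stand-in, not the Latch SDK's implementation: the class LazyRemoteFile, its fields, and the in-memory "remote" content are all hypothetical and exist only to illustrate the idea.

```python
import os
import tempfile
from pathlib import Path


class LazyRemoteFile:
    """Hypothetical stand-in for a remote file handle (NOT the Latch SDK class)."""

    def __init__(self, remote_path: str, content: bytes):
        self.remote_path = remote_path
        self._content = content   # pretend these bytes live on a remote server
        self._local = None        # no local copy exists yet
        self.download_count = 0

    def __fspath__(self) -> str:
        # os.fspath() and Path() call this hook, so the "download"
        # happens on first path access and is reused afterwards.
        if self._local is None:
            suffix = Path(self.remote_path).suffix
            fd, name = tempfile.mkstemp(suffix=suffix)
            with os.fdopen(fd, "wb") as fh:
                fh.write(self._content)
            self._local = name
            self.download_count += 1
        return self._local


read1 = LazyRemoteFile("latch:///r1.fastq", b"@read1\nACGT\n+\nFFFF\n")
local_file = Path(read1)  # first path access triggers the simulated download
print(local_file, read1.download_count)
```

Resolving the path a second time reuses the cached copy, so the simulated download runs only once.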
Sometimes a Python task does not access a file or directory directly but only uses its path, for example in a shell command run through a subprocess. We can ensure that the file's data is present by accessing the local_path attribute of the LatchFile or LatchDir. This will download the data into the task environment and return a path to the local file or directory.
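As a rough illustration of the subprocess case, the sketch below builds a paired-end bowtie2 command from plain local path strings. The helper build_bowtie2_cmd, the index name reference_index, and the /tmp paths are hypothetical; in a real task the two read paths would come from the local_path attribute of each input.

```python
from pathlib import Path
from typing import List


def build_bowtie2_cmd(
    read1_path: str,
    read2_path: str,
    sam_path: str,
    index: str = "reference_index",  # hypothetical index name
) -> List[str]:
    """Build a paired-end bowtie2 command from local file paths."""
    return [
        "bowtie2",
        "--local",
        "-x", index,        # basename of the bowtie2 index
        "-1", read1_path,   # first mate reads
        "-2", read2_path,   # second mate reads
        "-S", sam_path,     # SAM output file
    ]


# Placeholder strings; a task would pass read1.local_path and read2.local_path.
sam_file = Path("covid_assembly.sam").resolve()
cmd = build_bowtie2_cmd("/tmp/r1.fastq", "/tmp/r2.fastq", str(sam_file))
print(" ".join(cmd))
# Where bowtie2 and the index are installed, run it with
# subprocess.run(cmd, check=True).
```

Passing the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.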
Returning a LatchFile or LatchDir from a task will upload it to the Latch platform in the location specified by the second argument. For example, after running a task which ends in the following snippet, the file covid_assembly.sam will be available in the root directory of Latch Data.
Here, LatchFile(...) takes as input two values: the first is the local filepath and the second is the target remote file location on Latch. Remote paths on Latch start with the prefix latch:///.
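To make the remote path format concrete, here is a small sketch that composes and inspects a latch:/// path with the standard library. The helper remote_path and the nested directory results/run1 are hypothetical, and the sketch only manipulates strings; it does not contact Latch.

```python
import posixpath
from urllib.parse import urlsplit

LATCH_PREFIX = "latch://"


def remote_path(filename: str, remote_dir: str = "/") -> str:
    """Join a remote directory and a filename into a latch:/// path."""
    # Always anchor at the root so the result has three slashes after "latch:".
    return LATCH_PREFIX + posixpath.join("/", remote_dir.strip("/"), filename)


root_target = remote_path("covid_assembly.sam")
nested_target = remote_path("covid_assembly.sam", "results/run1")  # hypothetical folder
parts = urlsplit(root_target)
print(root_target, nested_target, parts.scheme, parts.path)
```

A string built this way is what would be passed as the second argument when constructing the returned file.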
3. Define compute and storage requirements
Specifying compute and storage requirements is as easy as using a Python decorator.
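For readers less familiar with decorators, the sketch below shows the general mechanism by which a decorator can attach resource requests to a function. The decorator with_resources and its parameters are hypothetical stand-ins written for this illustration; they are not the Latch SDK's task decorators, whose names and signatures are documented in the SDK reference.

```python
from functools import wraps


def with_resources(cpu: int, memory_gib: int, storage_gib: int):
    """Hypothetical decorator factory; NOT part of the Latch SDK."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)

        # Record the request on the function so a scheduler could read it later.
        wrapper.resources = {
            "cpu": cpu,
            "memory_gib": memory_gib,
            "storage_gib": storage_gib,
        }
        return wrapper

    return decorator


@with_resources(cpu=8, memory_gib=32, storage_gib=100)
def assembly_task(read1: str, read2: str) -> str:
    return f"assembled {read1} + {read2}"


print(assembly_task.resources)
```

The function body stays unchanged; only the decorator line declares what the task needs.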
See an exhaustive reference to larger CPU and GPU tasks here.
To specify custom resource requirements, use:
4. Manage installation for third-party dependencies
Internally, Latch uses Dockerfiles for dependency management. These are either generated automatically from environment files or written by hand.
Install samtools and autoconf by adding them to the dependency list in system-requirements.txt:
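If you prefer to script this step, the following sketch appends packages to the file while skipping any that are already listed. It assumes the file holds one package name per line; the helper add_system_requirements is hypothetical, and the demo writes to a temporary directory rather than a real workflow folder.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def add_system_requirements(req_file: Path, packages: Iterable[str]) -> List[str]:
    """Append packages to the requirements file, skipping duplicates."""
    # Assumption: one package name per line.
    existing = req_file.read_text().split() if req_file.exists() else []
    merged = list(existing)
    for pkg in packages:
        if pkg not in merged:
            merged.append(pkg)
    req_file.write_text("\n".join(merged) + "\n")
    return merged


with tempfile.TemporaryDirectory() as tmp:
    req = Path(tmp) / "system-requirements.txt"
    add_system_requirements(req, ["samtools", "autoconf"])
    listed = add_system_requirements(req, ["samtools"])  # already present, no change
    contents = req.read_text()

print(contents)
```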
Set an environment variable for bowtie2 using an environment file:
Now run latch dockerfile . to examine the auto-generated Dockerfile:
See the Workflow Environment page for further information on configuring the environment for other dependencies such as R or Conda.
5. Customize user interface
There are two pages that you can customize: the About page for your workflow and a Parameters page for workflow input parameters.
To modify the About page, simply write your description in Markdown in the docstring of the workflow function.
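As a sketch of what such a docstring might look like, the example below defines a function with a Markdown docstring and prints the cleaned text using the standard library. The headings and wording are hypothetical, the @workflow decorator and Latch types are omitted so that the snippet runs without the SDK, and exactly how Latch renders each part of the docstring is not shown here.

```python
import inspect


def assemble_and_sort(read1: str, read2: str) -> None:
    """Assemble and sort COVID sequencing reads

    ## About

    This workflow assembles paired-end sequencing reads and sorts the result.

    ## Inputs

    - **read1**: first paired-end read file
    - **read2**: second paired-end read file
    """


# inspect.getdoc strips the common indentation, leaving plain Markdown.
about_markdown = inspect.getdoc(assemble_and_sort)
print(about_markdown)
```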
Latch provides a suite of front-end components out-of-the-box that can be defined using the Python objects LatchMetadata and LatchParameter:
The metadata variable then needs to be passed into the @workflow decorator to apply the interface to the workflow.
Preview the workflow’s interface
To preview the user interface for your workflow, first navigate to the workflow directory.
Example directory structure:
Then, type the following command:
latch preview opens a browser window displaying your interface.
6. Add test data for your workflow
First, upload your test data to a remote S3 bucket using the latch test-data command.
The example files are hosted on AWS S3 and are publicly available for download. Thus, they should only contain information that can be made publicly available.
Confirm that your file has been successfully uploaded:
Now, you can use a Latch LaunchPlan to add test data to your workflow.
These default values will be available under the ‘Test Data’ dropdown in Latch Console.
7. Register your workflow to Latch
You can release a live version of your workflow by registering it on Latch:
The registration process will:
- Build a Docker image containing your workflow code
- Serialize your code and register it with your LatchBio account
- Push your Docker image to a managed container registry
When registration has completed, you should be able to navigate here and see your new workflow in your account.
8. Test your workflow
Using Latch Console
To test your first workflow on Console, select the Test Data and click Launch. Statuses of workflows can be monitored under the Executions tab.
Using Latch CLI
Using latch get-wf, you can view the names of all workflows available in your workspace:
To launch the workflow on Latch Console from the CLI, first generate a parameters file by using latch get-params and passing in the name of your workflow like so:
This creates a parameters file called wf.__init__.assemble_and_sort.params.py, whose contents are as below:
To launch a workflow with the parameters specified in the above file, use:
To view execution statuses from the CLI, run:
The command will open a Terminal UI with the same capabilities as the Executions page on the Latch Platform, where you will see a list of executions, tasks, and logs for easy debugging.
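The CLI steps in this section can be summarized as a sequence of commands. The sketch below only builds and prints them; it does not call the CLI. The get-wf and get-params subcommands come from the text above, while the launch and get-executions subcommand names are assumptions to verify against latch --help. The helper functions are hypothetical.

```python
import shlex
from typing import List


def params_file_for(workflow_name: str) -> str:
    """Name of the parameters file generated for a workflow."""
    return f"{workflow_name}.params.py"


def cli_steps(workflow_name: str) -> List[List[str]]:
    """Ordered argv lists for listing, parameterizing, launching, and monitoring."""
    return [
        ["latch", "get-wf"],
        ["latch", "get-params", workflow_name],
        ["latch", "launch", params_file_for(workflow_name)],  # assumed subcommand
        ["latch", "get-executions"],                          # assumed subcommand
    ]


steps = cli_steps("wf.__init__.assemble_and_sort")
for argv in steps:
    print(shlex.join(argv))
```

To execute a step rather than print it, each list could be passed to subprocess.run(argv, check=True) on a machine where the Latch CLI is installed and logged in.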