What is a Workflow?
A workflow is an analysis that takes in some input, processes it in one or more steps and produces some output.
Formally, a workflow can be described as a directed acyclic graph (DAG), where each node in the graph is called a task. This computational graph is a flexible model that can describe almost any bioinformatics analysis.
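As a rough illustration of the DAG model, the sketch below represents a two-task workflow as a dependency mapping and derives a valid execution order with Python's standard `graphlib` module. The task names `assemble` and `sort` and both helper functions are made up for this example; none of this is part of the Latch SDK.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a task; its value is the set of tasks it depends on.
# The task names are illustrative only.
dag = {
    "assemble": set(),       # consumes the raw input files
    "sort": {"assemble"},    # needs the output of "assemble"
}


def execution_order(graph):
    """Return one valid order in which the tasks could run."""
    return list(TopologicalSorter(graph).static_order())


def is_acyclic(graph):
    """Return True if the graph has no cycles, i.e. it is a valid DAG."""
    try:
        execution_order(graph)
        return True
    except CycleError:
        return False


print(execution_order(dag))  # ['assemble', 'sort']
```

A graph in which two tasks depend on each other has no valid execution order, which is why the model requires the graph to be acyclic.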
In this example, a workflow ingests sequencing files in FastQ format and produces a sorted assembly file. The workflow’s DAG has two tasks. The first task turns the FastQ files into a single BAM file using an assembly algorithm. The second task sorts the assembly from the first task. The final output is a useful assembly conducive to downstream analysis and visualization in tools like IGV.
The Latch SDK lets you define your workflow tasks as Python functions. The parameters in the function signature define the task inputs, and the return values define the task outputs. The body of the function holds the task logic, which can be written in plain Python or delegated through a subprocess to a program or library in any language.
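The two styles of task body described above might look roughly like this. This is a plain-Python sketch: the SDK's own task markers and file types are deliberately left out, and the function names `count_lines_task` and `uppercase_task` are hypothetical. A second Python interpreter stands in for an arbitrary external program.

```python
import subprocess
import sys
from pathlib import Path


def count_lines_task(input_file: Path) -> int:
    """Task written in plain Python: inputs come from the signature,
    and the output is the return value."""
    with open(input_file) as handle:
        return sum(1 for _ in handle)


def uppercase_task(input_file: Path, output_file: Path) -> Path:
    """Task whose logic is delegated to an external program via subprocess."""
    # A child Python process stands in for any command-line tool.
    script = "import sys; sys.stdout.write(sys.stdin.read().upper())"
    with open(input_file) as src, open(output_file, "w") as dst:
        subprocess.run(
            [sys.executable, "-c", script],
            stdin=src,
            stdout=dst,
            check=True,  # raise if the external program fails
        )
    return output_file
```

Passing `check=True` makes a failing external program raise an exception, so the task does not return as if it had succeeded.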
These tasks are then “glued” together in another function that represents the
workflow. The workflow function body simply chains the task functions by calling
them and passing returned values to downstream task functions. Notice that our
workflow function calls the task that we just defined, assembly_task, as well
as another task we can assume was defined elsewhere, sort_bam_task.
You must not write actual logic in the workflow function body. It can only be
used to call task functions and pass task function return values to downstream
task functions. Additionally, all task functions must be called with keyword
arguments. You also cannot access variables directly in the workflow function;
in the example below, you would not be able to pass in read1=read1.local_path.
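The chaining rules can be sketched in plain Python as follows. The SDK's own markers for tasks and workflows are omitted so that the snippet runs on its own. The task bodies are string-manipulating stand-ins, and the `read2` and `bam` parameter names and the workflow name `assemble_and_sort` are assumptions made for illustration.

```python
def assembly_task(read1: str, read2: str) -> str:
    """Stand-in for the assembly step: returns the name of a BAM file."""
    return f"{read1}+{read2}.bam"


def sort_bam_task(bam: str) -> str:
    """Stand-in for the sorting step: returns the name of the sorted file."""
    return f"sorted.{bam}"


def assemble_and_sort(read1: str, read2: str) -> str:
    """Workflow body: task calls only, keyword arguments only,
    and values passed through without being inspected or modified."""
    bam = assembly_task(read1=read1, read2=read2)
    return sort_bam_task(bam=bam)


print(assemble_and_sort(read1="r1.fastq", read2="r2.fastq"))
# sorted.r1.fastq+r2.fastq.bam
```

The workflow body contains no conditionals, loops, or attribute access. Any such logic would belong inside a task.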
Workflow function docstrings also contain Markdown-formatted documentation and a DSL that specifies how parameters are presented when the workflow interface is generated. We’ll add this content to the docstring of the workflow function we just wrote.
Workflow Code Structure
So far we have defined workflows and tasks as Python functions, but we don’t know where to put them or what supplementary files might be needed to run the code on the Latch platform.
Workflow code needs to live in a directory with three necessary elements:
- a file named Dockerfile that defines the computing environment of your tasks
- a file named version that holds the plaintext version of the workflow
- a directory named wf that holds the Python code needed for the workflow
  - task and workflow functions must live in a wf/__init__.py file
These three elements must be named exactly as specified above. The directory should have the following structure:

    Dockerfile
    version
    wf/
        __init__.py
The SDK ships with easily retrievable example workflow code. Just type
latch init myworkflow to construct a directory structured as above for
reference or boilerplate.
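To confirm that a directory contains the three required elements before registering it, a small check along these lines may help. The function name `missing_elements` is hypothetical and this is not an SDK feature; it only tests for the file and directory names listed above.

```python
from pathlib import Path


def missing_elements(workflow_dir):
    """Return the required elements that are absent from workflow_dir."""
    root = Path(workflow_dir)
    checks = {
        "Dockerfile": (root / "Dockerfile").is_file(),
        "version": (root / "version").is_file(),
        "wf/__init__.py": (root / "wf" / "__init__.py").is_file(),
    }
    # Keep only the names whose check failed, in the order listed above.
    return [name for name, present in checks.items() if not present]
```

For example, `missing_elements("myworkflow")` returns an empty list when the layout is complete, and otherwise the names that still need to be created.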
Example Dockerfile
Note: you are required to use our base image for the time being.
Example version File
You can use any versioning scheme you like, as long as each registration has a unique version value. We recommend sticking with semantic versioning.
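If you follow semantic versioning, producing a fresh version value for each registration could be sketched as below. Both helper names are hypothetical. The code assumes a strict MAJOR.MINOR.PATCH string with no pre-release suffix, and that you keep track of previously registered versions yourself.

```python
def bump_patch(version: str) -> str:
    """Increment the PATCH part of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.strip().split("."))
    return f"{major}.{minor}.{patch + 1}"


def next_unused_version(current: str, registered: set) -> str:
    """Bump the patch number until the result has not been registered before."""
    candidate = bump_patch(current)
    while candidate in registered:
        candidate = bump_patch(candidate)
    return candidate


print(bump_patch("0.1.9"))  # 0.1.10
print(next_unused_version("0.0.1", {"0.0.1", "0.0.2"}))  # 0.0.3
```

The parts are compared as integers rather than strings, so `0.1.9` is followed by `0.1.10` and not by `0.1.1`.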
Example wf/__init__.py File
What happens at registration?
Now that we’ve defined our functions, we are ready to register our workflow with the LatchBio platform. This will give us:
- a no-code interface
- managed cloud infrastructure for workflow execution
- a dedicated API endpoint for programmatic execution
- hosted documentation
- parallelized CSV-to-batch execution
To register, we type latch register <directory_name> into our terminal (where
directory_name is the name of the directory holding our code, Dockerfile and
version file).
The registration process requires a local installation of Docker.
To re-register changes, make sure you update the value in the version file. The value itself is not important, only that it is distinct from previously registered versions.
Remote Registration [Alpha]
If you do not have access to Docker on your local machine, lack space on your
local filesystem for image layers, or lack a fast internet connection for timely
registration, you can use the --remote flag with latch register to build and
upload your workflow’s images from a fast, managed machine.
The registration process will behave as usual but the build/upload will not occur on your local machine.
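When registration is driven from a script, the command line can be assembled as in the sketch below. The helper `register_command` is hypothetical. It only builds the argument list from the command and flag named above, and placing `--remote` after the directory name is an assumption about argument order.

```python
import shlex


def register_command(directory_name: str, remote: bool = False) -> list:
    """Build the argument list for registering a workflow directory."""
    command = ["latch", "register", directory_name]
    if remote:
        # Assumed placement: the flag is appended after the directory name.
        command.append("--remote")
    return command


print(shlex.join(register_command("myworkflow", remote=True)))
# latch register myworkflow --remote
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where the latch command-line tool is installed.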