Tutorial
This tutorial will outline the steps required to launch a Nextflow pipeline on Latch.
Prerequisites
- Register for an account and log into the Latch Console
- Install a compatible version of Python. The Latch SDK is currently only supported for Python >=3.8 and <=3.11.
- Install the Latch SDK (version >= 2.52.3).
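Before installing, you can check your environment against these requirements. The short Python sketch below checks the running interpreter against the supported range. It also compares a dotted Latch SDK version string against 2.52.3. The helper names are illustrative only. The version comparison handles only plain numeric versions such as 2.52.3 and will raise ValueError on pre-release tags.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Supported range from the prerequisites: Python >=3.8 and <=3.11.
MIN_PYTHON = (3, 8)
MAX_PYTHON = (3, 11)
MIN_LATCH = "2.52.3"


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if the interpreter's (major, minor) falls in the supported range."""
    return MIN_PYTHON <= tuple(version_info[:2]) <= MAX_PYTHON


def meets_min_sdk_version(installed: str, minimum: str = MIN_LATCH) -> bool:
    """Compare plain dotted versions numerically, so 2.60.0 passes but 2.9.0 does not."""
    def as_tuple(v: str):
        return tuple(int(part) for part in v.split("."))

    return as_tuple(installed) >= as_tuple(minimum)


if __name__ == "__main__":
    print("Python supported:", is_supported_python())
    try:
        # "latch" is the name the SDK is installed under with pip.
        installed = version("latch")
    except PackageNotFoundError:
        print("Latch SDK is not installed")
    else:
        print(f"Latch SDK {installed} meets minimum:", meets_min_sdk_version(installed))
```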
Example on Ubuntu:
Step 1: Clone your Nextflow pipeline
We will use nf-core’s rnaseq as an example; however, feel free to follow along with any Nextflow pipeline.
Step 2: Define metadata and workflow graphical interface
The input parameters need to be explicitly defined to construct a graphical interface for a Nextflow pipeline. These parameters will be exposed to scientists in a web interface once the workflow is uploaded to Latch.
The Latch SDK provides a command to automatically generate the metadata file from an existing nextflow_schema.json file.
If your workflow does not have a nextflow_schema.json file, you must manually define the Nextflow metadata and input parameters.
The command parses the parameters defined in nextflow_schema.json and generates two files. The first file holds the NextflowMetadata object, and the second file contains the input parameter definitions.
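The next step involves checking the inferred types. To make that concrete, here is a rough, hypothetical sketch of how types could be inferred from a nextflow_schema.json. It assumes nf-core's layout, where parameters are grouped under "definitions" (or "$defs" in newer schemas). It also assumes that only the file-path and directory-path formats map to LatchFile and LatchDir. This is not the Latch SDK's actual implementation. It does show how a parameter with a less specific format can end up as a plain str that needs manual correction.

```python
import json
from typing import Dict

# Assumed mapping from JSON-schema "type" values to Python type names.
JSON_TO_PYTHON = {"string": "str", "integer": "int", "number": "float", "boolean": "bool"}
# Assumed mapping from schema "format" values to Latch types.
FORMAT_TO_LATCH = {"file-path": "LatchFile", "directory-path": "LatchDir"}


def load_schema(path: str) -> dict:
    """Read a nextflow_schema.json file from disk."""
    with open(path) as handle:
        return json.load(handle)


def infer_parameter_types(schema: dict) -> Dict[str, str]:
    """Guess a type annotation (as a string) for every parameter in the schema."""
    inferred: Dict[str, str] = {}
    # nf-core groups parameters under "definitions" (older) or "$defs" (newer).
    groups = {**schema.get("definitions", {}), **schema.get("$defs", {})}
    for group in groups.values():
        required = set(group.get("required", []))
        for name, spec in group.get("properties", {}).items():
            # Known path formats win; any other format falls back to the JSON type.
            fallback = JSON_TO_PYTHON.get(spec.get("type"), "str")
            py_type = FORMAT_TO_LATCH.get(spec.get("format"), fallback)
            inferred[name] = py_type if name in required else f"typing.Optional[{py_type}]"
    return inferred
```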
Before continuing, we will need to make a few updates to the generated files to ensure that the input parameters are correctly defined:
- It is important to always verify that the generated input parameters and their inferred types are as expected.
After inspecting the generated parameters for nf-core/rnaseq, we notice that the data types for hisat2_index, salmon_index, and rsem_index are typing.Optional[str] instead of typing.Optional[LatchFile]. Update these parameters to their correct types.
- For this tutorial, we will execute the workflow using the parameters defined in the test configuration profile. To simplify the user interface, remove any input parameters not defined in conf/test.config from your latch_metadata/parameters.py file.
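The pruning step above can also be scripted. The sketch below is a hypothetical helper and is not part of the Latch SDK. It collects the parameter names assigned in the params { ... } block of conf/test.config and keeps only the matching entries of a generated-parameters dictionary. It uses a deliberately simple regular expression that assumes one assignment per line and no nested braces inside params. Review the result by hand before editing latch_metadata/parameters.py.

```python
import re
from typing import Dict, Set, TypeVar

T = TypeVar("T")

# One `name = value` assignment per line; a simple pattern, not a full config parser.
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=", re.MULTILINE)


def params_in_config(config_text: str) -> Set[str]:
    """Return the parameter names assigned inside the first `params { ... }` block."""
    block = re.search(r"params\s*\{(.*?)\n\s*\}", config_text, re.DOTALL)
    if block is None:
        return set()
    return set(ASSIGNMENT.findall(block.group(1)))


def keep_test_params(generated: Dict[str, T], config_text: str) -> Dict[str, T]:
    """Keep only the generated parameters that conf/test.config actually sets."""
    allowed = params_in_config(config_text)
    return {name: spec for name, spec in generated.items() if name in allowed}
```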
After making the above updates, your latch_metadata/parameters.py file should now look like this:
Example:
Let’s inspect the most relevant fields of the NextflowMetadata object:
display_name: The display name of the workflow, as it will appear on the Latch UI.
author: Name of the person or organization that publishes the workflow.
parameters: Input parameters to the workflow, defined as NextflowParameter objects. The Latch Console will expose these parameters to scientists before they execute the workflow.
Input parameters are passed to Nextflow as command-line arguments of the form --param-name param-value. Therefore, each key of the parameters dictionary should match the name of the corresponding parameter in the Nextflow script.
runtime_resources: The resources the Nextflow runtime requires to execute the workflow. The storage_gib field configures the size, in GiB, of the shared filesystem.
log_dir: Latch directory where the .nextflow.log file is saved if the workflow fails.
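Each key becomes a --key command-line argument, so a misspelled key is easy to miss. The hypothetical check below scans Nextflow source text (for example main.nf, nextflow.config, and module files) for params.<name> references. It then reports keys of the parameters dictionary that never appear. It is a plain text search. Parameters accessed only with bracket syntax, such as params['name'], will be reported as unmatched.

```python
import re
from typing import Iterable, List, Set

# Matches references such as `params.outdir` or `params.hisat2_index`.
PARAM_REF = re.compile(r"\bparams\.([A-Za-z_][A-Za-z0-9_]*)")


def referenced_params(sources: Iterable[str]) -> Set[str]:
    """Collect every `params.<name>` referenced across the given Nextflow sources."""
    names: Set[str] = set()
    for text in sources:
        names.update(PARAM_REF.findall(text))
    return names


def unmatched_keys(parameter_keys: Iterable[str], sources: Iterable[str]) -> List[str]:
    """Return metadata keys that never appear as `params.<key>` in the sources."""
    known = referenced_params(sources)
    return sorted(key for key in parameter_keys if key not in known)
```

In practice you would read each .nf and .config file into a string and pass the list of strings as sources.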
Step 3 (Optional): Importing samplesheets from Latch Registry
The input parameter in our rnaseq workflow currently accepts a samplesheet as a CSV-formatted LatchFile. Generating and handling samplesheet files can be cumbersome and error-prone.
Latch Registry is a user-friendly table interface that allows users to fill out a samplesheet and link sequencing files to each sample. The section below outlines how to configure a Nextflow workflow to accept a Latch Registry samplesheet (instead of a LatchFile).
- Define the schema required for the input samplesheet as a Python dataclass. Each field in the dataclass represents a column in the samplesheet.
For nf-core/rnaseq, add the following snippet to your latch_metadata/parameters.py file:
- Update the type information for the samplesheet input parameter. Samplesheets must always be a list of dataclass objects.
Locate the input parameter in the generated_parameters dictionary in latch_metadata/parameters.py and make the following changes:
- Define a samplesheet constructor to convert the input objects to a samplesheet file.
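The sketch below illustrates the first and last of these steps in plain Python, so it runs without the Latch SDK installed. The four columns (sample, fastq_1, fastq_2, strandedness) follow the nf-core/rnaseq samplesheet format. The Strandedness values are assumed from nf-core/rnaseq's documented options. In your real parameters.py, the FASTQ fields would be LatchFile objects rather than the str paths assumed here. samplesheet_to_csv is a hypothetical constructor that returns CSV text. The signature Latch expects for a samplesheet constructor may differ, so treat this as an outline of the conversion logic only.

```python
import csv
import dataclasses
import io
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Strandedness(Enum):
    # Assumed values for the nf-core/rnaseq "strandedness" column.
    forward = "forward"
    reverse = "reverse"
    unstranded = "unstranded"
    auto = "auto"


@dataclass(frozen=True)
class SampleSheet:
    # One field per samplesheet column, in column order.
    # In parameters.py the FASTQ fields would be LatchFile; str paths keep this runnable.
    sample: str
    fastq_1: str
    fastq_2: Optional[str]
    strandedness: Strandedness


def samplesheet_to_csv(rows: List[SampleSheet]) -> str:
    """Hypothetical constructor: render dataclass rows as CSV text with a header."""
    columns = [field.name for field in dataclasses.fields(SampleSheet)]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        values = []
        for name in columns:
            value = getattr(row, name)
            if isinstance(value, Enum):
                value = value.value  # write "reverse", not "Strandedness.reverse"
            values.append("" if value is None else value)
        writer.writerow(values)
    return buffer.getvalue()
```

Deriving the header from dataclasses.fields keeps the column order tied to the dataclass definition, so adding a column only requires adding a field.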
Your latch_metadata/parameters.py file should now look like this:
Step 4: Register the workflow
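To register the Nextflow pipeline on Latch, run the following command from the pipeline's root directory:
latch register . --nf-script main.nf --nf-execution-profile docker,test
Let's break down the above command: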
latch register .: Searches for a Latch workflow in the current directory and registers it to Latch.
--nf-script main.nf: Specifies the Nextflow script passed to the Nextflow command at runtime. For this workflow, the command is nextflow run main.nf.
--nf-execution-profile docker,test: Defines the execution profiles to use when running the workflow on Latch. We specify the docker configuration profile to execute processes in a containerized environment, and the test profile to use the test parameters.
After running the above command, the Latch SDK will generate two files:
- latch.config: a Nextflow configuration file passed to Nextflow via the -config flag.
- wf/entrypoint.py: the generated Latch SDK workflow code that executes the Nextflow pipeline.
Once the workflow is registered, click the link provided in the output of the latch register command to open the workflow's interface in the Latch Console.
As part of the registration process, we build a Docker image from a Dockerfile. Normally, this Dockerfile is autogenerated and stored in .latch. However, if a Dockerfile already exists in the workflow directory before registration, that file is used to build the image instead. This can cause errors later on if the existing Dockerfile was not generated by Latch.
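A quick pre-flight check can tell you which Dockerfile the build will pick up under the rule described above. This is a hypothetical helper. The .latch/Dockerfile location for the autogenerated file is an assumption based on the description of .latch above, and the SDK may name or place it differently.

```python
import tempfile
from pathlib import Path


def dockerfile_used_for_build(workflow_dir: str) -> Path:
    """Return the Dockerfile the build would use, following the rule described above."""
    root = Path(workflow_dir)
    user_dockerfile = root / "Dockerfile"
    if user_dockerfile.is_file():
        # A pre-existing Dockerfile takes precedence over the generated one.
        return user_dockerfile
    # Otherwise the autogenerated Dockerfile under .latch/ is used (assumed path).
    return root / ".latch" / "Dockerfile"


def warn_if_custom_dockerfile(workflow_dir: str) -> bool:
    """Print a warning and return True when a hand-written Dockerfile would be used."""
    chosen = dockerfile_used_for_build(workflow_dir)
    if chosen.parent == Path(workflow_dir):
        print(f"Warning: {chosen} will be used instead of the generated Dockerfile")
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(warn_if_custom_dockerfile(tmp))  # no Dockerfile yet
        (Path(tmp) / "Dockerfile").write_text("FROM ubuntu:22.04\n")
        print(warn_if_custom_dockerfile(tmp))  # hand-written Dockerfile present
```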
Step 5: Execute the workflow
Before executing the workflow, we need to upload test data to Latch. You can find sample test data here.
Copy the test data to your Latch workspace by clicking the Copy to Workspace button in the top right corner.
Now, let’s create the samplesheet in Latch Registry.
- Navigate to the Latch Registry and create a new Table.
- Select the table you just created and click “Import CSV”. This will open the Latch Data filesystem. Import the samplesheet.csv file you copied from the provided test data.
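Before importing, it can help to sanity-check samplesheet.csv locally. The hypothetical validator below assumes the nf-core/rnaseq column names. It reports missing columns and rows without a sample name or fastq_1. It does not check that the referenced files exist in Latch Data.

```python
import csv
import io
from typing import List

# Column names assumed from the nf-core/rnaseq samplesheet format.
REQUIRED_COLUMNS = ["sample", "fastq_1", "fastq_2", "strandedness"]


def validate_samplesheet(csv_text: str) -> List[str]:
    """Return human-readable problems found in a samplesheet; an empty list means OK."""
    problems: List[str] = []
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        problems.append(f"missing columns: {', '.join(missing)}")
        return problems
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not row["sample"]:
            problems.append(f"line {line_no}: empty sample name")
        if not row["fastq_1"]:
            problems.append(f"line {line_no}: fastq_1 is required")
    return problems
```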
Your data is now uploaded to Latch and ready to be processed!
Navigate to the Workflows tab in the Latch Console and select the workflow you previously registered.
Then, select the appropriate input parameters from the test data you uploaded and click Launch Workflow in the bottom right corner to execute the workflow.
The workflow orchestrator will use these input parameters along with the metadata provided at registration time to construct the Nextflow command. For example, the RNA-seq pipeline will be launched via the following command:
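Latch generates the exact command at runtime. As a rough illustration only, the sketch below builds an argument list from the pieces described earlier. These are the script from --nf-script, the profiles from --nf-execution-profile, latch.config, and one --name value pair per parameter. The following are assumptions of this sketch, not confirmed Latch behavior: skipping unset parameters and writing booleans as true/false. The input path in the usage example is hypothetical.

```python
import shlex
from typing import Dict, List, Optional


def build_nextflow_command(
    script: str,
    profile: str,
    config: str,
    params: Dict[str, Optional[object]],
) -> List[str]:
    """Approximate the Nextflow argv assembled from the workflow's metadata."""
    # -c is the short form of the -config flag used to pass latch.config.
    cmd = ["nextflow", "run", script, "-profile", profile, "-c", config]
    for name, value in params.items():
        if value is None:
            continue  # unset optional parameters are not passed at all
        if isinstance(value, bool):
            value = "true" if value else "false"  # Nextflow-style booleans
        cmd.extend([f"--{name}", str(value)])
    return cmd


if __name__ == "__main__":
    argv = build_nextflow_command(
        script="main.nf",
        profile="docker,test",
        config="latch.config",
        params={"input": "/root/samplesheet.csv", "outdir": "latch:///nf-rnaseq/outputs"},
    )
    print(shlex.join(argv))
```

Returning a list instead of a single string avoids shell-quoting problems when the command is run through subprocess.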
Step 6: Monitoring the workflow
After launching the workflow, you can monitor progress by clicking on the appropriate execution under the Executions tab of your workflow.
Under the Graph & Logs tab, you can view the generated two-stage DAG, which consists of the initialization step and the Nextflow runtime task.
If you click on the Nextflow runtime node, you can view the runtime logs generated by Nextflow.
Once the Nextflow runtime starts executing the workflow, a Process Nodes tab will appear in the menu bar, where you can monitor the status of each process in the workflow.
Each node in the DAG represents a process in the Nextflow pipeline.
To more easily navigate the graph, you can filter the process nodes by execution status by clicking the “Filter by Status” button in the top right corner.
Click on a process node to see details of every invocation of that process, including the resources provisioned, execution time, and logs.
Once the workflow is complete, you can view any published outputs in Latch Data. By convention, Nextflow workflows use the outdir parameter as the prefix for publishDir paths. For example, if we set our outdir parameter to latch:///nf-rnaseq/outputs, all pipeline outputs will be published to the nf-rnaseq/outputs directory in Latch Data.
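To predict where a given publishDir lands, you can map the latch:/// outdir onto a Latch Data path. In the hypothetical helper below, the subdirectory names stand for whatever subfolders the pipeline's publishDir directives append to outdir. Any account or workspace component in the URI's host part is ignored in this sketch.

```python
from posixpath import join as posix_join
from urllib.parse import urlparse


def latch_data_path(outdir: str, *subdirs: str) -> str:
    """Map a latch:/// outdir (plus publishDir subfolders) to its path in Latch Data."""
    parsed = urlparse(outdir)
    if parsed.scheme != "latch":
        raise ValueError(f"expected a latch:// URI, got {outdir!r}")
    # latch:///nf-rnaseq/outputs parses to an empty netloc and path "/nf-rnaseq/outputs".
    base = parsed.path.strip("/")
    return posix_join(base, *subdirs)
```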
Step 7 (Optional): Customizing the Workflow
As explained in Step 4, the latch register command generates a Latch workflow that runs the Nextflow pipeline. To give developers flexibility over how their Nextflow pipelines are executed on Latch, the generated workflow code can be modified to execute custom pre- and post-processing logic.
In this tutorial, we will modify the generated wf/entrypoint.py file to add a Run Name parameter that will be used to namespace the outputs of the Nextflow pipeline.
To do this, add the run_name parameter to your entrypoint.py file as follows:
In the above code snippet:
- We add a run_name parameter to the nf_nf_core_rnaseq workflow function. All parameters defined in the nf_nf_core_rnaseq function are exposed to the user in the Latch UI.
- We pass the run_name parameter to the nextflow_runtime task in the body of the workflow function.
- We add the run_name parameter to the nextflow_runtime task signature.
Then, add logic to the nextflow_runtime task to append the run_name parameter to the outdir parameter before executing the Nextflow pipeline.
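The core of that logic is a string operation on the outdir URI. The sketch below is a standalone function so it can be tested on its own. Inside the generated nextflow_runtime task, you would apply it to the outdir value before it is passed to Nextflow. How that value is represented in the generated code is not shown here, so adapt accordingly. The allowed-characters rule for run names is an assumption added to keep each run in a single directory level.

```python
import re

# Assumed rule: letters, digits, dot, underscore, and dash only, so a run
# name always maps to exactly one directory level under outdir.
RUN_NAME_PATTERN = re.compile(r"[A-Za-z0-9._-]+")


def namespaced_outdir(outdir: str, run_name: str) -> str:
    """Return outdir with run_name appended as a subdirectory."""
    if not RUN_NAME_PATTERN.fullmatch(run_name) or run_name in {".", ".."}:
        raise ValueError(f"invalid run name: {run_name!r}")
    # Add a separator only when outdir does not already end with one.
    base = outdir if outdir.endswith("/") else outdir + "/"
    return base + run_name
```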
We will now re-register the workflow with the above updates. We purposely omit the --nf-script flag from the latch register command to avoid regenerating the Latch SDK workflow code, which would overwrite our updates.