The Latch SDK introduces a construct called
map_task to help parallelize a
task across a list of inputs. This means you can run multiple instances of
the task at the same time inside a single workflow, providing valuable
performance gains.
Let’s look at a simple example below!
First, import map_task into your workflow:
A map task can only accept one input and produce one output.
Once such a task is defined, you can map a_mappable_task across a collection of inputs using the map_task function. This function takes in a_mappable_task and returns a mapped version of that task. The mapped version takes a list of inputs to a_mappable_task and returns a list of the outputs of a_mappable_task, run on all inputs in the list in parallel.
In this example, you have defined a_mappable_task, which is passed to
map_task() and run repeatedly on a list of inputs in parallel. You have also
defined a coalesce task that collects the list of outputs from the mapped task
and returns a string.
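To make the semantics described above concrete, here is a minimal plain-Python sketch that does not use the Latch SDK at all. The helper local_map_task is a hypothetical stand-in written for illustration (it is not the SDK's map_task), and the body of a_mappable_task (adding 2 and converting to a string) is an assumed example. The sketch only shows the shape of the pattern: a one-input function is wrapped into a version that takes a list, runs concurrently, and returns a list that a coalesce step reduces to a string.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def local_map_task(fn: Callable[[T], R]) -> Callable[[List[T]], List[R]]:
    """Hypothetical stand-in for map_task: wrap a one-input function
    so that it runs over a list of inputs concurrently."""

    def mapped(inputs: List[T]) -> List[R]:
        # pool.map returns results in the same order as the inputs
        with ThreadPoolExecutor() as pool:
            return list(pool.map(fn, inputs))

    return mapped


def a_mappable_task(a: int) -> str:
    # One input in, one output out (assumed body, for illustration only)
    return str(a + 2)


def coalesce(b: List[str]) -> str:
    # Collect the list of mapped outputs into a single string
    return "".join(b)


mapped_out = local_map_task(a_mappable_task)([1, 2, 3])
result = coalesce(mapped_out)
print(mapped_out, result)
```

In a real Latch workflow the functions would be registered as tasks and the mapping would go through the SDK's own map_task; the thread pool here only imitates the "many instances at once" behavior locally.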
Map a Task with Multiple Inputs
You may want to map a task with multiple inputs. For example, the task below takes in 2 inputs, a base and a DNA sequence, and returns the percentage of that base in the sequence:
Suppose we want to map this task across the base input while the
dna_sequence stays the same. Since a map task accepts only one input, we can
do this by creating a new task that prepares the map task’s inputs.
We start by putting the inputs in a dataclass decorated with dataclass_json.
Next, we redefine count_task. Instead of 2 inputs, count_task
has a single input:
Finally, we use mappable_task in our workflow:
This allows count_wf to spin up four tasks in
parallel. The map_task returns a list of four floats, each of which is the
percentage of one base in the DNA sequence.
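The bundling approach above can be sketched in plain Python, again without the Latch SDK. The names MapInput and prepare_map_inputs, and the field names, are assumptions chosen for illustration; the dataclass_json decorator is left out so the sketch needs only the standard library, and a thread pool stands in for the mapped task.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List


@dataclass
class MapInput:
    # Bundles both arguments into the single value a mapped task accepts
    dna_sequence: str
    base: str


def prepare_map_inputs(dna_sequence: str, bases: List[str]) -> List[MapInput]:
    # Hypothetical helper: pair the fixed sequence with each base
    return [MapInput(dna_sequence=dna_sequence, base=b) for b in bases]


def count_task(map_input: MapInput) -> float:
    # Percentage of the given base in the sequence (0.0 for an empty sequence)
    seq = map_input.dna_sequence
    if not seq:
        return 0.0
    return 100.0 * seq.count(map_input.base) / len(seq)


def count_wf(dna_sequence: str, bases: List[str]) -> List[float]:
    inputs = prepare_map_inputs(dna_sequence, bases)
    # Stand-in for the mapped task: one count_task call per MapInput
    with ThreadPoolExecutor() as pool:
        return list(pool.map(count_task, inputs))


percentages = count_wf("AACCCGTT", ["A", "C", "G", "T"])
print(percentages)
```

The design point is that only the list of MapInput values varies from call to call; the shared dna_sequence is copied into each element rather than passed as a second argument.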
Bonus: Learning through a Biological Example
Below, we walk through a practical example of using the map task construct to run FastQC on multiple samples and summarize their results in a MultiQC report. First, we define a dataclass that contains a sample name and its associated FastQ file:
Next, we define a task that runs FastQC on a single sample.
Concept check: Note how this task will later be mapped across a list of
samples. Therefore, the task is defined to accept one input and return one
output.
Concept check: Because the map task will return a list of
LatchDirs,
each of which contains an individual sample’s FastQC results, the
multiqc_task also needs to accept a list of LatchDirs.
Finally, the workflow takes in a list of Samples and
returns a single directory with the MultiQC report:
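As a rough illustration of the data flow only, the sketch below builds the command lines a per-sample FastQC step and a final MultiQC step might run, without executing anything and without the Latch SDK. The Sample field names, the helper functions, and the directory names are all hypothetical; plain strings stand in for LatchFile and LatchDir values.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List


@dataclass
class Sample:
    # Assumed field names; the real dataclass would hold a LatchFile
    name: str
    fastq: str


def fastqc_command(sample: Sample, out_root: str = "fastqc_results") -> List[str]:
    # One sample in, one command out: this is the shape that gets mapped.
    # FastQC expects the output directory to exist before it runs.
    out_dir = str(PurePosixPath(out_root) / sample.name)
    return ["fastqc", sample.fastq, "--outdir", out_dir]


def multiqc_command(fastqc_dirs: List[str], out_dir: str = "multiqc_report") -> List[str]:
    # Accepts the whole list of per-sample result directories
    return ["multiqc", *fastqc_dirs, "--outdir", out_dir]


samples = [Sample("s1", "s1.fastq.gz"), Sample("s2", "s2.fastq.gz")]
per_sample = [fastqc_command(s) for s in samples]   # the mapped step
result_dirs = [cmd[-1] for cmd in per_sample]        # one directory per sample
report = multiqc_command(result_dirs)                # the aggregating step
print(report)
```

The per-sample function takes exactly one Sample, which is what makes it mappable, while the aggregating function takes the full list of result directories.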