When working with bioinformatics workflows, we are often passing around large files or directories between our tasks. These files are usually located in cloud object stores and are copied to the file systems of the machines on which the task is scheduled.
local_path will always be the absolute path on the task's machine where the file has been copied to (the machine that your code is running on). remote_path will be a remote object URL with s3 or latch as its host.
There are cases when we would want to access these local_path and remote_path attributes directly.
(Remote objects can also be inspected from your terminal with the Latch CLI, eg. latch ls latch:///foo.)
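As a minimal sketch, a task might use the two attributes like this; the task name, the gzip step, and the latch:///compressed/ destination are hypothetical, and the decorator import path may differ between SDK versions:

```python
import subprocess
from pathlib import Path

from latch import small_task
from latch.types import LatchFile


@small_task
def compress(sample: LatchFile) -> LatchFile:
    # local_path: absolute on-disk path of the copy on this machine,
    # suitable for handing directly to command-line tools.
    local = Path(sample.local_path)
    subprocess.run(["gzip", "--keep", str(local)], check=True)

    # remote_path: the remote object URL (latch or s3 host), handy for
    # logging which remote object this task actually operated on.
    print(f"compressed {sample.remote_path}")

    # Upload the new local file to a (hypothetical) remote directory.
    return LatchFile(str(local) + ".gz", f"latch:///compressed/{local.name}.gz")
```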
Sometimes you will want to return a group of files matching a shared pattern, for example all files with a .fastq.gz extension after a trimming task has been run.
To do this in the SDK, you can leverage the file_glob function to construct lists of LatchFiles defined by a pattern.
The class of allowed patterns is defined by globs. It is likely you've already used globs in the terminal by using wildcard characters in common commands, eg. ls *.txt.
The second argument must be a valid latch URL pointing to a directory. This will be the remote location of the returned LatchFiles constructed with this utility.
In this example, all files ending with .fastq.gz in the working directory of the task will be returned to the latch:///fastqc_outputs directory:
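A sketch of such a task, assuming file_glob and small_task are importable from the top-level latch package (imports may differ between SDK versions); the task name and the elided trimming commands are placeholders:

```python
from typing import List

from latch import file_glob, small_task
from latch.types import LatchFile


@small_task
def trimming_task(sample: LatchFile) -> List[LatchFile]:
    # ... trimming commands run here, writing their *.fastq.gz outputs
    # into the task's current working directory ...

    # Match every .fastq.gz in the working directory and return them as
    # LatchFiles whose remote destination is latch:///fastqc_outputs.
    return file_glob("*.fastq.gz", "latch:///fastqc_outputs")
```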
latch:/// URLs

https://google.com and s3://my-bucket/dna.fa are both valid descriptions of objects: a webpage and a fasta file, respectively.
When referencing files stored within LatchBio's managed filesystem (called LatchData), we must use the latch scheme to resolve objects to the appropriate account.
For instance, latch:///foo.txt might mean two entirely different things in the context of two different accounts. The resolution to the correct object is based on the user that executed the workflow.
Some examples of valid latch URLs referencing objects in a user’s filesystem:
latch:///guide_design/off_targets.csv
latch:///foo.txt
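For illustration, a URL like the first example above could serve as the remote half of a LatchFile returned from a task; the task below is hypothetical:

```python
from latch import small_task
from latch.types import LatchFile


@small_task
def export_off_targets() -> LatchFile:
    local = "/root/off_targets.csv"
    with open(local, "w") as f:
        f.write("guide,off_target_count\n")

    # The latch:/// URL is resolved against the account of the user who
    # executed the workflow, so the file lands in that user's filesystem.
    return LatchFile(local, "latch:///guide_design/off_targets.csv")
```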
latch URLs referencing shared objects use the latch://shared/<path> syntax.
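Such a path can be used anywhere a latch URL is accepted, for example when constructing a LatchFile directly; the path below is hypothetical:

```python
from latch.types import LatchFile

# A hypothetical reference to an object in the shared area of the
# workspace rather than in a single user's personal filesystem.
shared_sheet = LatchFile("latch://shared/guide_design/sample_sheet.csv")
```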