This guide walks through typical workflows involving AWS S3 and Latch Data, and offers best practices for working with these data sources inside Latch Pods.

1. Use Latch Data FUSE to view the contents of S3 buckets on Latch

To access data in your organization’s AWS S3 bucket inside a Pod, it is recommended that you:

  1. Mount your AWS S3 buckets on Latch Data: To mount your S3 buckets on Latch Data, please visit our guide here.
  2. Mount Latch Data inside your Latch Pod: Latch Pods come with Latch Data FUSE (Filesystem in Userspace), which exposes the entire Latch Data filesystem inside the Pod. FUSE is available on every new Pod by default, and its contents can be inspected under the directory /ldata.
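
As a quick check that the mount is present, the sketch below lists the top level of a mounted directory. The helper name list_mount is hypothetical and not part of any Latch tooling; only the default path /ldata comes from this guide.

```shell
# Hypothetical helper: list the top level of a mounted directory.
# The default path /ldata is the Latch Data FUSE directory named in this guide.
list_mount() {
  local mount_dir="${1:-/ldata}"
  if [ -d "$mount_dir" ]; then
    # One entry per line, so the output is easy to pipe into other tools.
    ls -1 "$mount_dir"
  else
    echo "mount not found: $mount_dir" >&2
    return 1
  fi
}

# Usage inside a Pod:
#   list_mount          # lists /ldata
```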

2. Copy frequently used folders to a local directory inside Pods

Latch Data FUSE only displays a mirror of the file system on Latch Data, and does not download every file and folder by default.

A download is triggered only when a folder is accessed, either programmatically or by double-clicking it in the RStudio/JupyterLab interface. The first access to a folder may be briefly delayed while its contents download. After that, the contents are cached, so subsequent reads are fast.

When a Pod shuts down, the cache is purged. For large folders that are accessed frequently, copy the folder from /ldata to a local scratch directory inside your Pod. To copy large folders from /ldata to a local directory efficiently, see the next section on latch cp.
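
The copy-once pattern described above can be sketched as follows. This is a minimal illustration, not Latch tooling: the helper name cache_folder is hypothetical, the default scratch path /root/scratch is borrowed from the rsync example later in this guide, and plain cp -r is used only to keep the sketch self-contained. For large folders, latch cp remains the recommended way to copy.

```shell
# Hypothetical helper: copy a folder into local scratch once per Pod session.
# Prints the path of the local copy on success.
cache_folder() {
  local src="$1"
  local scratch="${2:-/root/scratch}"   # assumed scratch location
  local dest
  dest="$scratch/$(basename "$src")"

  mkdir -p "$scratch"
  if [ -d "$dest" ]; then
    # A local copy already exists, so skip the copy and reuse it.
    echo "$dest"
    return 0
  fi
  # Reading from the source triggers the download; the result stays local.
  cp -r "$src" "$dest" && echo "$dest"
}

# Usage inside a Pod (the folder name is a placeholder):
#   local_copy="$(cache_folder /ldata/s3_folder)"
```

Because the helper reuses an existing local copy, it does not pick up later changes on Latch Data; delete the scratch copy to force a fresh one.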

3. Use latch cp for fast file copies between local directories and S3 buckets on Latch Data

To transfer files between local directories in a Pod and mounted S3 buckets on Latch Data, in either direction, use the latch cp command for the best copy speed. Because S3 buckets are integrated with Latch Data, latch cp can copy files and directories to any Latch location, including mounted S3 buckets.

It is also possible to use the Linux cp command to copy local directories to an S3 folder in Latch Data FUSE (e.g. cp -r local_folder /ldata/s3_folder). However, this is significantly slower than latch cp local_folder latch:///s3_folder.
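
The two commands above address the same folder in two ways: as a FUSE path under /ldata and as a latch:/// URL. The sketch below converts one into the other, assuming that paths under /ldata map one-to-one onto latch:/// paths, as the example suggests. The helper name ldata_to_latch_url is hypothetical.

```shell
# Hypothetical helper: translate a FUSE path under /ldata into a latch:/// URL.
# Assumes /ldata/<path> corresponds to latch:///<path>.
ldata_to_latch_url() {
  case "$1" in
    /ldata/*) printf 'latch:///%s\n' "${1#/ldata/}" ;;
    /ldata)   printf 'latch:///\n' ;;
    *)
      echo "not under /ldata: $1" >&2
      return 1
      ;;
  esac
}

# Usage inside a Pod (requires the latch CLI):
#   latch cp local_folder "$(ldata_to_latch_url /ldata/s3_folder)"
```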

For a comprehensive guide on how to copy data between Latch Data and Pods, please visit the documentation here.

If you latch cp a local directory that has the same name as a folder inside your S3 bucket on Latch Data, latch cp will overwrite that folder on Latch Data. If this behavior is undesirable, rename your local directory so that it does not collide with an existing folder in your S3 bucket.
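
One way to avoid an accidental overwrite is to check for a name collision before copying. The sketch below picks a folder name that does not yet exist under a given parent directory by appending a numeric suffix. The helper name unique_name and the suffix scheme are illustrative assumptions, and the check is only as current as the /ldata view, which may lag behind Latch Data (see the known limitations below).

```shell
# Hypothetical helper: print a name that does not exist under a parent directory.
# If "name" is taken, try "name_1", "name_2", and so on.
unique_name() {
  local parent="$1"
  local name="$2"
  local candidate="$2"
  local n=1
  while [ -e "$parent/$candidate" ]; do
    candidate="${name}_${n}"
    n=$((n + 1))
  done
  printf '%s\n' "$candidate"
}

# Usage inside a Pod (paths are placeholders):
#   safe="$(unique_name /ldata/s3_folder local_folder)"
#   mv local_folder "$safe"   # rename locally before running latch cp
```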

Known limitations and workarounds

  1. Writing to Latch Data FUSE is slow. For example, the following commands are known to be slow:

    touch /ldata/new.txt
    cp -r /root/big_folder /ldata/big_folder
    mv /root/big_folder /ldata/big_folder
    rm -rf /ldata/big_folder
    mkdir /ldata/big_folder
    

    Workaround: Use latch touch, latch cp, latch mv, latch rm, latch mkdir instead if you want to write to Latch Data. Since FUSE is a mirror of Latch Data, all new content written directly to Latch Data will be synced and reflected in /ldata.

  2. rsync is not currently supported. The following command will throw an error:

    rsync -av /root/scratch /ldata/big_folder 
    

    Workaround: latch cp is currently the most effective alternative. Note, however, that latch cp copies entire directories between local and remote paths every time, rather than transferring only the differences.

  3. Latch Data FUSE doesn’t always refresh when a new file or folder is uploaded to Latch Data via latch cp or via the interface on console.latch.bio/data.

    Workaround: To get the most up-to-date content for /ldata, write a placeholder file to /ldata (e.g. touch /ldata/new.txt), which will trigger a refresh. Alternatively, restart your Pod, which remounts FUSE.
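
    The refresh workaround can be wrapped in a small function, sketched below. The helper name refresh_mount is hypothetical; the default file name new.txt is taken from the example above.

```shell
# Hypothetical helper: write a placeholder file into a mount directory.
# On /ldata this write is what the workaround relies on to trigger a refresh.
refresh_mount() {
  local mount_dir="${1:-/ldata}"
  local marker="${2:-new.txt}"
  touch "$mount_dir/$marker"
}

# Usage inside a Pod:
#   refresh_mount              # same as: touch /ldata/new.txt
```

    The placeholder is left in place, because removing it through /ldata would be another FUSE write, which is slow.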

Planned Improvements

  1. Support latch rsync, which will copy only the differences between local directories and remote Latch directories.
  2. Make reading and writing faster for Latch Data FUSE.
  3. Fix the inconsistent refresh behavior of Latch Data FUSE.