This guide walks through typical workflows involving AWS S3 and Latch Data, and offers best practices for interacting with these data sources within Latch Pods.
1. Use Latch Data FUSE to view the content of S3 buckets on Latch
To access data in your organization’s AWS S3 bucket inside a Pod, it is recommended that you:
- Mount your AWS S3 buckets on Latch Data: To mount your S3 buckets on Latch Data, please visit our guide here.
- Mount Latch Data inside your Latch Pod: Latch Pods come with Latch Data FUSE (Filesystem in Userspace), which displays the entire Latch Data filesystem inside the Pod. The FUSE mount is available on every new Pod by default, and its content can be inspected under the /ldata directory.
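As a quick sanity check, the FUSE mount can be browsed like any other directory. This sketch assumes the default mount point /ldata and degrades gracefully when run outside a Pod:

```shell
# Inspect the Latch Data FUSE mount inside a Pod.
# /ldata mirrors the Latch Data filesystem; mounted S3 buckets
# appear as folders in the tree.
LDATA=/ldata
if [ -d "$LDATA" ]; then
  ls "$LDATA"   # top-level Latch Data folders
else
  echo "No FUSE mount at $LDATA (not running inside a Latch Pod)"
fi
```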
2. Copy frequently used folders to a local directory inside Pods
Latch Data FUSE only displays a mirror of the file system on Latch Data; it does not download every file and folder by default.
Downloads are triggered only when a folder is accessed programmatically or double-clicked in the RStudio/JupyterLab interface. Initial access to a folder may therefore incur a brief delay while its contents download. After that, the contents are cached, so subsequent reads are effectively instantaneous.
When a Pod shuts down, all cache is purged. For large folders that need to be accessed frequently, it is advisable to copy the folder from /ldata to a local scratch directory inside your Pod. To efficiently copy large folders from /ldata to a local directory, see the next section on latch cp.
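A minimal sketch of this pattern, with hypothetical paths (/tmp/scratch stands in for whatever local scratch directory your Pod uses):

```shell
# Copy a frequently used folder out of /ldata into local scratch so
# repeated reads do not depend on the FUSE cache (purged on shutdown).
SRC=/ldata/my-s3-bucket/reference_data   # hypothetical FUSE path
DST=/tmp/scratch/reference_data          # local scratch inside the Pod
mkdir -p "$(dirname "$DST")"
if [ -d "$SRC" ]; then
  # For very large folders, latch cp (next section) is faster.
  cp -r "$SRC" "$DST"
else
  echo "Source not mounted: $SRC (run inside a Pod with the bucket mounted)"
fi
```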
3. Use latch cp for swift file copy between local directories and S3 buckets on Latch Data
For efficient file transfers between local directories within Pods and mounted S3 buckets on Latch Data (and the reverse), it is recommended to use the latch cp command to ensure optimal copying speed. Because S3 buckets are integrated with Latch Data, latch cp can be used to copy files and directories to any Latch location, including mounted S3 buckets.
You can also use the regular cp command to copy local directories to an S3 folder through Latch Data FUSE (e.g. cp local_folder /ldata/s3_folder). However, this will be significantly slower than latch cp local_folder latch:///s3_folder.
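To make the path correspondence concrete, the snippet below contrasts the two forms. It only prints the commands rather than executing them, since latch cp requires the Latch CLI and an authenticated Pod (folder names are hypothetical):

```shell
# The same S3 folder has two addresses inside a Pod:
#   - via the FUSE mirror:  /ldata/s3_folder     (plain cp: slow)
#   - via the latch CLI:    latch:///s3_folder   (latch cp: fast)
FUSE_PATH=/ldata/s3_folder
LATCH_URL=latch:///s3_folder
echo "slow: cp local_folder $FUSE_PATH"
echo "fast: latch cp local_folder $LATCH_URL"
```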
For a comprehensive guide on how to copy data between Latch Data and Pods, please visit the documentation here.
Note that if you latch cp a local directory that has the same name as a folder inside your S3 bucket on Latch Data, latch cp will overwrite that folder on Latch Data. If this behavior is undesirable, rename your local directory so it does not collide with an existing folder in your S3 bucket.
Known limitations and workarounds
- Writing to Latch Data FUSE is slow. For example, the following commands are known to be slow:
touch /ldata/new.txt
cp /root/big_folder /ldata/big_folder
mv /root/big_folder /ldata/big_folder
rm -rf /root/big_folder /ldata/big_folder
mkdir /ldata/big_folder
Workaround: Use latch touch, latch cp, latch mv, latch rm, and latch mkdir instead if you want to write to Latch Data. Since the FUSE is a mirror of Latch Data, all new content written directly to Latch Data will be synced and reflected in /ldata.
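A sketch of the substitution, printed rather than executed since the latch CLI is only available and authenticated inside a Pod. The latch:/// argument form for latch touch and latch cp is assumed from the latch cp examples above:

```shell
# Slow FUSE writes and their latch CLI equivalents; the latch commands
# write to Latch Data directly, and the FUSE mirror syncs afterwards.
SLOW_TOUCH="touch /ldata/new.txt"
FAST_TOUCH="latch touch latch:///new.txt"
SLOW_CP="cp /root/big_folder /ldata/big_folder"
FAST_CP="latch cp /root/big_folder latch:///big_folder"
echo "$SLOW_TOUCH  ->  $FAST_TOUCH"
echo "$SLOW_CP  ->  $FAST_CP"
```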
- rsync is not currently supported. The following command will throw an error:
rsync -av /root/scratch /ldata/big_folder
Workaround: latch cp is currently the most effective alternative. However, note that latch cp copies entire directories between local and remote paths every time, instead of transferring only the differences.
- Latch Data FUSE doesn’t always refresh when a new file or folder is uploaded to Latch Data via latch cp or via the interface on console.latch.bio/data.
Workaround: To get the most up-to-date content for /ldata, write a new placeholder file to /ldata (e.g. touch /ldata/new.txt), which will trigger a refresh. Alternatively, you can restart your Pod, which will restart the FUSE mount.
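The refresh workaround as a small sketch, assuming the default /ldata mount point; the touch only runs when the mount actually exists:

```shell
# Force a FUSE refresh by writing a placeholder file into /ldata,
# then list the directory to see newly uploaded content.
MOUNT=/ldata
if [ -d "$MOUNT" ]; then
  touch "$MOUNT/new.txt"   # triggers a refresh of the mirror
  ls "$MOUNT"
else
  echo "No FUSE mount at $MOUNT; restarting the Pod remounts it instead"
fi
```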
Planned improvements
- Support latch rsync to copy only the content differences between local directories and remote Latch directories
- Make reading and writing faster for Latch Data FUSE
- Fix inconsistent refresh behavior of Latch Data FUSE