Overview

Latch Pods come with Latch Data FUSE (Filesystem in Userspace), a virtual filesystem inside Pods that lets you interact directly with the data in your workspace and share data between Pods. You can create, read, write, and delete files and directories in Latch Data from the command line, as you would with any other filesystem.

The Latch Data FUSE is available on every new pod by default, and its content can be inspected under the directory /ldata.

For easier access, you can also create a link to /ldata in your home directory (available as /root/ldata) with the following command:

ln -s /ldata /root

Common Use Cases

1. Use Latch Data FUSE to view the content of S3 buckets on Latch

To access data in your organization’s AWS S3 bucket inside a Pod, it is recommended that you:

  1. Mount your AWS S3 buckets on Latch Data: To mount your S3 buckets on Latch Data, please visit our guide here.
  2. Mount Latch Data inside your Latch Pod: Launch a Latch Pod, which automatically mounts Latch Data FUSE.

2. Copy Data from Pod to Latch Data

To copy data from your Pod to Latch Data, you can use the following command:

cp -r local_folder /ldata/s3_folder

Uploaded data will appear inside of your Latch Data on the console.

Writing through Latch Data FUSE is much slower than using latch cp local_folder latch:///s3_folder. If you need to copy a large amount of data to Latch Data, see the Best Practices section on latch cp below.

3. Accessing Latch Data from JupyterLab or RStudio

Latch Data FUSE is accessible from JupyterLab and RStudio: navigate to /ldata in the file browser to access your data in Latch Data.

4. Accessing Latch Data using Python Script

You can also access Latch Data from Python scripts using standard file APIs such as the os module and the built-in open function. For example, the following snippet lists the contents of a directory and reads a file in Latch Data:

import os

# List all entries (files and folders) in a directory
files = os.listdir('/ldata')
print(files)

# Read a file
with open('/ldata/file.txt', 'r') as f:
    print(f.read())

Best Practices

1. Copy frequently used folders to a local directory inside Pods

Latch Data FUSE only displays a mirror of the file system on Latch Data, and does not download every file and folder by default.

Folder child metadata is downloaded when you first access the folder, either programmatically or by double-clicking it in the RStudio/JupyterLab interface. The first access to a folder may therefore incur a brief delay while the metadata downloads. After this, the folder’s contents are cached, so subsequent reads do not incur this delay.
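The caching behavior described above can be observed with a small timing helper. This is a minimal sketch, not part of any Latch tooling: the function names are hypothetical, and it only measures how long two consecutive directory listings take.

```python
import os
import time


def timed_listdir(path):
    """Return (entries, seconds) for a single os.listdir call on path."""
    start = time.perf_counter()
    entries = os.listdir(path)
    return entries, time.perf_counter() - start


def compare_first_and_cached(path):
    """List path twice and return the elapsed time of each call."""
    _, first = timed_listdir(path)
    _, second = timed_listdir(path)
    return first, second
```

Calling compare_first_and_cached on a folder under /ldata that has not been opened yet in the current Pod session should, under the behavior described above, show a longer first call than second call. Actual timings depend on folder size and network conditions.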

When a Pod shuts down, the cache is purged. For large folders that need to be accessed frequently, it is advisable to copy the folder from /ldata to a local scratch directory inside your Pod. To efficiently copy large folders from /ldata to a local directory, see the next section on latch cp.
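For small or medium folders, the copy-once pattern can be sketched in Python as follows. The helper name and the scratch location are assumptions made for illustration. The copy reads through the FUSE mount, so for large folders latch cp remains the recommended route.

```python
import shutil
from pathlib import Path


def ensure_local_copy(src, scratch_root):
    """Copy src into scratch_root once and reuse that copy on later calls."""
    src = Path(src)
    dest = Path(scratch_root) / src.name
    # Only copy when no local copy exists yet; an existing copy is NOT refreshed.
    if not dest.exists():
        shutil.copytree(src, dest)
    return dest
```

A call such as ensure_local_copy("/ldata/big_folder", "/root/scratch") returns the local path to work from. Because an existing copy is reused as-is, delete the local folder first if the data in Latch Data has changed or if a previous copy was interrupted.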

2. Use latch cp for fast copies between local directories and S3 buckets on Latch Data

For efficient file transfers between local directories within Pods and S3 buckets mounted on Latch Data, it is recommended to use the latch cp command. After S3 buckets are mounted to Latch Data, latch cp can copy files and directories to any Latch location, including mounted S3 buckets.

For a comprehensive guide on how to copy data between Latch Data and Pods, please visit the documentation here.

If you use latch cp to copy a local directory whose name matches a folder inside your S3 bucket on Latch Data, latch cp will overwrite that folder in your Latch Data. If this behavior is undesirable, rename your local directory so that it does not collide with an existing folder in your S3 bucket.
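One way to avoid an accidental overwrite is to check for a name collision before copying. The following is a minimal sketch: the function name and the numeric-suffix scheme are assumptions, and the check relies on the /ldata listing, which may lag behind recent uploads.

```python
from pathlib import Path


def non_conflicting_name(local_dir, remote_parent):
    """Return a folder name derived from local_dir that is unused in remote_parent."""
    base = Path(local_dir).name
    # Names currently visible under the destination folder (e.g. a path under /ldata).
    existing = {p.name for p in Path(remote_parent).iterdir()}
    if base not in existing:
        return base
    # Append _1, _2, ... until the name no longer collides.
    suffix = 1
    while f"{base}_{suffix}" in existing:
        suffix += 1
    return f"{base}_{suffix}"
```

The returned name can then be used to rename the local directory before running latch cp, so that the existing folder in the bucket is left untouched.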

Known Limitations and Workarounds

  • Some operations in Latch Data FUSE are slow. For example, the following commands are known to be slow:

    touch /ldata/new.txt
    cp -r /root/big_folder /ldata/big_folder
    mv /root/big_folder /ldata/big_folder
    rm -rf /ldata/big_folder
    

    Workaround: To write to Latch Data, use latch touch, latch cp, latch mv, latch rm, or latch mkdir instead. Since FUSE is a mirror of Latch Data, all new content written directly to Latch Data will be synced and reflected in /ldata.

  • rsync is not currently supported. The following command will throw an error:

    rsync -av /root/scratch /ldata/big_folder
    

    Workaround: latch cp is currently the most effective alternative. Note, however, that latch cp copies entire directories between local and remote paths every time, instead of transferring only the differences.

  • Latch Data FUSE doesn’t always refresh when a new file or folder is uploaded to Latch Data via latch cp or via the interface on console.latch.bio/data.

    Workaround: To get the most up-to-date content for /ldata, write a placeholder file to /ldata (e.g. touch /ldata/new.txt), which triggers a refresh. Alternatively, restart your Pod, which restarts the FUSE mount, or restart only the Latch Data FUSE mount with:

    systemctl restart latch-ldata-fuse.service
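The placeholder-file workaround for stale listings can also be scripted. This is a minimal sketch under the assumption that creating a file through the mount triggers the refresh as described above. The function name is hypothetical, and the marker name new.txt follows the touch example.

```python
from pathlib import Path


def touch_and_list(mount="/ldata", marker="new.txt"):
    """Create or update a marker file under mount, then return the sorted listing."""
    root = Path(mount)
    # Writing a file through the mount is what is expected to trigger the refresh.
    (root / marker).touch()
    return sorted(p.name for p in root.iterdir())
```

The marker file is left in place, so it will also appear in Latch Data. Remove it afterwards if it is not wanted.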
    

Troubleshooting

  • If you don’t see Latch Data mounted at /ldata, or there is no data in that directory, try remounting the filesystem with:

    systemctl restart latch-ldata-fuse.service
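Before restarting the service, the two symptoms above can be checked from Python. This is a minimal sketch: the function name and the status strings are hypothetical, and it assumes the FUSE mount appears as an ordinary mount point to os.path.ismount.

```python
import os


def ldata_status(path="/ldata"):
    """Classify the state of the mount directory as a short status string."""
    if not os.path.isdir(path):
        return "missing"        # the directory does not exist at all
    if not os.path.ismount(path):
        return "not_mounted"    # a plain directory, nothing mounted on it
    if not os.listdir(path):
        return "empty"          # mounted, but no data is visible
    return "ok"
```

Any result other than "ok" suggests trying the systemctl restart command shown above.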
    

Planned Improvements

  1. Support latch rsync to only copy the content differences between local directories and remote Latch directories.
  2. Make reading and writing faster for Latch Data FUSE.
  3. Fix the inconsistent refresh behavior of Latch Data FUSE.