I'd like to learn more about JuiceFS, but from their architecture diagrams I'm s...

thinxer · on Feb 23, 2024

For “cloud-native” apps, JuiceFS is not needed.

S3 is not designed for intensive metadata operations such as listing and renaming. For these operations, you need a reasonably POSIX-compliant system. For example, if you want to train on the ImageNet dataset, the “canonical” way [1] is to extract the images and organize them into folders, class by class. The whole dataset is then discovered by directory listing. This is where JuiceFS shines.

Of course, if the dataset is really massive, you will mostly end up with in-house solutions.

[1]: https://github.com/pytorch/examples/blob/main/imagenet/extra...
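As a rough illustration of the directory-listing discovery described above, here is a minimal Python sketch. It assumes the `root/<class_name>/<image file>` layout from the comment; the function name `discover_dataset` and the extension set are hypothetical and not taken from the linked script.

```python
import os

# Assumption for illustration: which file extensions count as images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def discover_dataset(root):
    """Return (classes, samples) found by listing root/<class>/<image>."""
    # One listing of the root directory yields the class names.
    classes = sorted(
        entry.name for entry in os.scandir(root) if entry.is_dir()
    )
    class_to_idx = {name: idx for idx, name in enumerate(classes)}

    samples = []
    for name in classes:
        class_dir = os.path.join(root, name)
        # One more listing per class directory yields the image paths.
        for entry in sorted(os.scandir(class_dir), key=lambda e: e.name):
            ext = os.path.splitext(entry.name)[1].lower()
            if entry.is_file() and ext in IMAGE_EXTENSIONS:
                samples.append((entry.path, class_to_idx[name]))
    return classes, samples
```

Nothing here reads image data: the whole scan consists of directory listings and file-type checks, which is why this pattern is sensitive to metadata latency.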

KaiserPro · on Feb 23, 2024

S3's metadata speed is horrifically slow. Part of the reason fake filesystems on S3 also perform badly is that checking anything to do with where a file is located takes >250ms.

Now if you think about how many files are in a git repo, and how many are touched when you commit, you can see the problem.
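To put a number on that, here is a back-of-the-envelope sketch. It takes the 250 ms figure from the comment above as the per-lookup latency and assumes the lookups happen one after another with no caching or batching; the file count is hypothetical, chosen only for illustration.

```python
def sequential_metadata_seconds(num_files, latency_s=0.25):
    """Total wall-clock time if each file needs one serial metadata lookup."""
    return num_files * latency_s

# Hypothetical number of files touched; not a measurement of any real repo.
total = sequential_metadata_seconds(10_000)
print(f"{total:.0f} s, about {total / 60:.1f} min")  # 2500 s, about 41.7 min
```

Parallel requests or a local metadata cache would change this estimate considerably; the point is only how quickly per-file latency adds up.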

I'm not sure JuiceFS is the answer, given that the metadata is stored in memory, but it is an answer.

Personally, I think you're better off with their managed Lustre offering.

Geisterde · on Feb 23, 2024

Is this a deficiency of S3, or is there not a more purpose-built offering?

KaiserPro · on Feb 23, 2024

S3, as you know, is a key-based object store, with a hack that allows directories ('/' is just part of the key name, and is filtered). So it's less of a deficiency and more of a tradeoff they went with to get performance/uptime. Enumerating the keys in an S3 bucket is pretty slow, so any kind of listing operation becomes slow for a follow-on system.
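The "directories are just filtered key names" idea can be sketched in a few lines of Python. This is a toy emulation over an in-memory list of keys, not a call to any real S3 client; `list_directory` and the sample keys are hypothetical. It mirrors the prefix-plus-delimiter style of listing, where keys containing a further delimiter are rolled up into common prefixes, and says nothing about how S3 implements listing internally.

```python
def list_directory(keys, prefix="", delimiter="/"):
    """Emulate a delimiter-based listing over a flat set of keys.

    Returns (objects, common_prefixes): keys directly under `prefix`, and
    the "subdirectories" obtained by cutting at the next delimiter.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):  # this toy scans every key; no real tree exists
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A deeper key: report only its first path component.
            common_prefixes.add(prefix + head + sep)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)

keys = [
    "train/cat/1.jpg",
    "train/cat/2.jpg",
    "train/dog/1.jpg",
    "train/labels.csv",
    "README",
]
print(list_directory(keys, "train/"))
# (['train/labels.csv'], ['train/cat/', 'train/dog/'])
```

The "directory" `train/cat/` never exists as an object here; it is derived from the key names at listing time.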

Having a metadata cache is a sensible option, especially if you have complete control of a bucket and are able to be canonical (i.e. everything talks to your DB to get key names).

But what you can't do is write to the middle of a file; you need to upload the whole thing again. This is not a problem for a lot of workflows, but for POSIX filesystems that's going to cause problems. You can seek to the middle of a file and write to it, and the client will upload it to S3. What happens if another system modifies that file while you are uploading? Sounds like a locking nightmare.
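A minimal sketch of that whole-object rewrite, using a toy in-memory store rather than a real S3 client. `WholeObjectStore` and `write_at` are hypothetical names; the only property being modeled is that an object can be replaced but not patched in place.

```python
class WholeObjectStore:
    """Toy object store: an object can only be replaced in full."""

    def __init__(self):
        self.objects = {}
        self.bytes_uploaded = 0

    def put(self, key, data):
        self.objects[key] = bytes(data)
        self.bytes_uploaded += len(data)

    def get(self, key):
        return self.objects[key]


def write_at(store, key, offset, patch):
    """Emulate a POSIX-style write into the middle of an object."""
    data = bytearray(store.get(key))          # download the whole object
    data[offset:offset + len(patch)] = patch  # modify it locally
    store.put(key, data)                      # upload the whole object again


store = WholeObjectStore()
store.put("big.bin", b"\0" * 1_000_000)
store.bytes_uploaded = 0                      # count only the patch below
write_at(store, "big.bin", 500_000, b"hello")
print(store.bytes_uploaded)  # 1000000
```

A 5-byte change costs a full 1,000,000-byte upload. The gap between `get` and `put` is also where the race lives: if another writer replaces the object in between, one of the two updates is silently lost.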

snerbles · on Feb 23, 2024

In the case of JuiceFS the file is split into chunks and those are the objects uploaded to S3 - treating it like a high latency block device.

The downside is that metadata here isn't just a cache, it is necessary to operate in this fashion and must be backed up.
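The chunking idea can be sketched with a toy model. This is not JuiceFS's actual data layout or API: `ChunkedFS`, the chunk key naming, and the tiny `CHUNK_SIZE` are all assumptions for illustration, and `write_at` only handles overwrites within the existing file length.

```python
CHUNK_SIZE = 4  # bytes; tiny on purpose, a real system would use a far larger size


class ChunkedFS:
    """Toy model: file data lives in chunk objects, metadata maps path -> chunk keys."""

    def __init__(self):
        self.objects = {}    # stands in for the object store
        self.metadata = {}   # path -> ordered list of chunk keys
        self.uploads = 0
        self._next_id = 0

    def _put_chunk(self, data):
        key = f"chunk-{self._next_id}"
        self._next_id += 1
        self.objects[key] = bytes(data)
        self.uploads += 1
        return key

    def write_file(self, path, data):
        self.metadata[path] = [
            self._put_chunk(data[i:i + CHUNK_SIZE])
            for i in range(0, len(data), CHUNK_SIZE)
        ]

    def write_at(self, path, offset, patch):
        """Overwrite bytes in place; only chunks touched by the patch are re-uploaded."""
        chunks = self.metadata[path]
        first = offset // CHUNK_SIZE
        last = (offset + len(patch) - 1) // CHUNK_SIZE
        for idx in range(first, last + 1):
            buf = bytearray(self.objects[chunks[idx]])
            start = idx * CHUNK_SIZE
            for j, byte in enumerate(patch):
                pos = offset + j - start
                if 0 <= pos < len(buf):
                    buf[pos] = byte
            # Upload a new chunk object and repoint the metadata at it.
            # The old chunk is left behind unreferenced in this toy.
            chunks[idx] = self._put_chunk(buf)

    def read_file(self, path):
        return b"".join(self.objects[k] for k in self.metadata[path])


fs = ChunkedFS()
fs.write_file("a.txt", b"0123456789ab")   # stored as 3 chunks
fs.uploads = 0                            # count only the patch below
fs.write_at("a.txt", 5, b"XY")            # touches only the second chunk
print(fs.read_file("a.txt"), fs.uploads)  # b'01234XY789ab' 1
```

Note what happens if `fs.metadata` is lost: `fs.objects` still holds every byte, but only as anonymous chunks with no record of which file they belong to or in what order. That is the sense in which the metadata is not just a cache.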

daviesliu · on Feb 23, 2024

JuiceFS is similar to HDFS/CephFS/Lustre, so it MUST have a component to manage metadata, similar to the NameNode of HDFS or the MDS of CephFS. This point of failure is the problem we have to address.

The underlying blob store is similar to the DataNode or OSD in other distributed file systems. It could be a little slower than them because of the middle layers, but the overall performance is determined by the disks.

So we can expect performance similar to HDFS/CephFS, and the benchmark results confirm that.

snerbles · on Feb 23, 2024

> Is this only needed if you want properly faked file system primitives over blob stores, if you can't use blob stores directly?

Semiconductor EDA comes to mind, where users are at the mercy of their tool vendors and they expect something that really hasn't evolved much past an office network of Unix workstations from 1999. Object storage is almost completely alien to the tooling, and despite significant file & block storage costs there is little interest from EDA tool vendors to adapt their tools to object storage. This is a challenge for semiconductor firms operating in or moving to a cloud environment.

S3FS/rclone can of course act as a shim, but they are very slow when it comes to metadata operations in a typical shell. But if you move your metadata away from the distant object store and closer to your compute environment, things actually start becoming usable - this is the case with JuiceFS. Of course there are also tiered storage systems like Weka that would have better overall performance, but they are more complicated to set up and more expensive to operate than JuiceFS.