That is correct about flushing. RE: consuming. The TL;DR is that the agents in an availability zone cluster with each other to form a distributed file cache, such that no matter how many consumers you attach to a topic, you will almost never pay for more than 1 GET request per 4MiB of data, per zone. Basically, when a consumer fetches a block of data for a single partition, that triggers an "over read" of up to 4MiB of data that is then cached for subsequent requests. This cache is "smart" and will deduplicate all concurrent requests for the same 4MiB blocks across all agents within an AZ.
It's a bit difficult to explain succinctly in an HN comment, but hopefully that helps.
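To make the idea concrete, here is a minimal sketch of the two mechanisms described above: rounding each read out to a 4MiB-aligned block, and deduplicating concurrent fetches so that only the first requester issues the GET. This is an illustration, not WarpStream's actual code; the `blockCache` type and `fetchFn` hook are invented for the example.

```go
package main

import (
	"fmt"
	"sync"
)

const blockSize = 4 << 20 // 4 MiB over-read granularity

// fetchFn stands in for a GET of one aligned block from object storage.
type fetchFn func(file string, blockOff int64) []byte

// blockCache deduplicates concurrent requests for the same block:
// only the first caller issues the GET; the rest wait on its result.
type blockCache struct {
	mu      sync.Mutex
	blocks  map[string][]byte        // completed blocks
	pending map[string]chan struct{} // in-flight fetches
	fetch   fetchFn
	gets    int // number of GETs actually issued
}

func newBlockCache(f fetchFn) *blockCache {
	return &blockCache{
		blocks:  map[string][]byte{},
		pending: map[string]chan struct{}{},
		fetch:   f,
	}
}

// Read returns the bytes at [off, off+n) of file, rounding the fetch
// out to a full 4MiB-aligned block and caching it for later readers.
func (c *blockCache) Read(file string, off, n int64) []byte {
	blockOff := off - off%blockSize
	key := fmt.Sprintf("%s@%d", file, blockOff)

	c.mu.Lock()
	for {
		if b, ok := c.blocks[key]; ok {
			c.mu.Unlock()
			lo := off - blockOff
			return b[lo : lo+n]
		}
		if ch, ok := c.pending[key]; ok {
			// Someone else is already fetching this block; wait for it.
			c.mu.Unlock()
			<-ch
			c.mu.Lock()
			continue
		}
		// We are the first requester: mark in-flight and do the GET.
		ch := make(chan struct{})
		c.pending[key] = ch
		c.gets++
		c.mu.Unlock()

		b := c.fetch(file, blockOff)

		c.mu.Lock()
		c.blocks[key] = b
		delete(c.pending, key)
		close(ch)
	}
}

func main() {
	cache := newBlockCache(func(file string, off int64) []byte {
		return make([]byte, blockSize) // pretend object store GET
	})

	var wg sync.WaitGroup
	// 10 consumers read different offsets within the same 4MiB block.
	for i := 0; i < 10; i++ {
		off := int64(i * 1024)
		wg.Add(1)
		go func(off int64) {
			defer wg.Done()
			cache.Read("segment-001", off, 512)
		}(off)
	}
	wg.Wait()
	fmt.Println("GETs issued:", cache.gets) // prints 1, not 10
}
```

In the real system this cache spans every agent in the AZ rather than a single process, but the net effect is the same: many consumers, one GET per 4MiB block.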
Is there a reason you built that cache layer yourself (rather than each node "just" running its own sidecar MinIO instance that writes through to the origin object store)?
The cache is for reads, not writes. There is no cache for writes.
We built our own because it needed to behave in a very specific way to meet our cost/latency goals. Running a MinIO sidecar instance would mean that every agent effectively has to download every file in its entirety, which would not scale well and would be expensive. We also have a pretty hard-and-fast rule about keeping WarpStream deployment as simple as rolling out a single stateless binary.