Thanks for writing this up! I'm working on a very similar service (https://embeddingsync.com/) and implemented almost the same thing you've described here, but with a poll-based stateful workflow model instead of queueing.
The biggest challenge - which I haven't solved as seamlessly as I'd like - is supporting updates / deletes in the source. You don't seem to discuss it in this post, does Neum handle that?
We do support updates for some sources. Deletes not yet. For some sources we poll, and the results are dumped onto the queues. For others we have listeners that subscribe to changes.
What are the challenges you are facing in supporting this?
Similar to you: with polling you only see new data, not the deletion events, so I can't delete embeddings unless I keep track of state and do a diff. To support that properly, you/I would effectively need CDC (change data capture), which gets more complex for arbitrary / self-serve databases.
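A minimal sketch of the "keep track of state and do a diff" idea, assuming each poll returns the full set of source rows keyed by a stable ID. The names `fingerprint` and `diff_snapshot` are hypothetical and not taken from either service; the stored state is just a map from row ID to a content hash.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Content hash used to detect that a row changed between polls."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_snapshot(previous: dict[str, str], current_rows: dict[str, str]):
    """Compare the last stored state with the rows seen in this poll.

    previous:     row ID -> content hash saved after the last poll
    current_rows: row ID -> raw text returned by this poll (assumed complete)
    Returns (added, updated, deleted, new_state).
    """
    current = {row_id: fingerprint(text) for row_id, text in current_rows.items()}

    # IDs only in the new poll need fresh embeddings.
    added = sorted(current.keys() - previous.keys())
    # IDs that vanished from the source are the deletes polling alone cannot see.
    deleted = sorted(previous.keys() - current.keys())
    # IDs present in both polls whose content hash differs need re-embedding.
    updated = sorted(
        row_id
        for row_id in current.keys() & previous.keys()
        if current[row_id] != previous[row_id]
    )
    return added, updated, deleted, current
```

The catch is that this only works if every poll can enumerate the whole source, which is the part that stops scaling and pushes toward CDC for larger or arbitrary databases.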