Great work. I'm not an expert in database internals, but maybe someone can help ...

hackthesystem · on Feb 1, 2022

Great questions! I'm not a database expert either but I can try answering these:

1) I think databases like to manage pages directly because the db has more context than the OS and can therefore make better optimizations. For example, when aborting a transaction, the db knows that the transaction's dirty pages should be evicted (I'm not sure if mmap offers custom eviction). Also, I believe that if the db uses mmap, it loses control over when pages are flushed to disk, and flush control is necessary for guaranteeing transaction durability.

2) What you're describing here sounds similar to an LSM-tree database (e.g. RocksDB). They are often used for write-heavy workloads because writes are just appends, but they might not be great for read-heavy workloads, since a read may have to check several sorted files before it finds a key.

3) This reminds me of PRQL[1] (which was trending on Hacker News last week) and Spark SQL. I'm not too familiar with this area though, so I can't really say why SQL was designed this way.

[1] https://github.com/max-sixty/prql?utm_source=hackernewslette...
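
To make the trade-off in point 2 concrete, here is a toy, in-memory sketch of the LSM idea. It is not how RocksDB is implemented; `TinyLSM`, `memtable_limit`, and `runs_probed` are made-up names for illustration. Writes go into a small mutable table that gets frozen into a sorted run when full, while a read may have to probe several runs, newest first.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: one mutable memtable plus immutable sorted runs (hypothetical sketch)."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.memtable_limit = memtable_limit
        self.runs = []        # oldest run first; each run is a sorted list of (key, value)
        self.runs_probed = 0  # counts how many runs reads had to look at

    def put(self, key, value):
        # A write only touches the memtable; older runs are never modified.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable into a new sorted run (a real system would write a file).
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newest run wins, so search from the most recent run backwards.
        for run in reversed(self.runs):
            self.runs_probed += 1
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

Real LSM stores add compaction and per-file filters to cut down on how many runs a read has to touch; this sketch leaves both out to keep the append-versus-lookup asymmetry visible.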

akrymski · on Feb 2, 2022

1) Indeed you should only use mmap for reads afaik

2) Was thinking more of an event-sourcing model, whereby you log the SQL statements first, then update a B-Tree in the background.

Read via mmap, write by appending to a log and asynchronously applying the changes to the file.

3) Rather than yet another QL, expose a higher-level API that I can target from any language.
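
A rough sketch of the model in point 2, under some simplifying assumptions: the "database" is a file of fixed-size integer slots rather than a B-Tree, and the log holds binary (slot, value) records rather than SQL statements. The names `LogThenApply`, `changes.log`, and `data.bin` are invented for the example. A write is durable once the log append has been fsynced; the data file is only updated later by `apply_log`, and reads go through a read-only mmap.

```python
import mmap
import os
import struct
import tempfile

SLOT = struct.Struct("<q")     # one signed 64-bit integer per slot
RECORD = struct.Struct("<qq")  # log record: (slot index, new value)
NUM_SLOTS = 8


class LogThenApply:
    def __init__(self, directory):
        self.log_path = os.path.join(directory, "changes.log")
        self.data_path = os.path.join(directory, "data.bin")
        if not os.path.exists(self.data_path):
            with open(self.data_path, "wb") as f:
                f.write(b"\x00" * SLOT.size * NUM_SLOTS)
        self.log = open(self.log_path, "ab")

    def write(self, slot, value):
        # Append to the log and force it to disk; the data file is untouched.
        self.log.write(RECORD.pack(slot, value))
        self.log.flush()
        os.fsync(self.log.fileno())

    def apply_log(self):
        # Replay the log into the data file. Replaying twice is harmless,
        # because each record simply sets a slot to a value.
        with open(self.log_path, "rb") as log, open(self.data_path, "r+b") as data:
            while True:
                record = log.read(RECORD.size)
                if len(record) < RECORD.size:
                    break
                slot, value = RECORD.unpack(record)
                data.seek(slot * SLOT.size)
                data.write(SLOT.pack(value))
            data.flush()
            os.fsync(data.fileno())
        # Only drop the log once the data file itself is on disk.
        self.log.truncate(0)

    def read(self, slot):
        # Reads never go through the write path: map the file read-only.
        with open(self.data_path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
                return SLOT.unpack_from(view, slot * SLOT.size)[0]

    def close(self):
        self.log.close()
```

One consequence worth noting: in this sketch a read does not see a write until `apply_log` has run, so readers either accept stale data or would have to consult the pending log as well.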

akrymski · on Feb 3, 2022

Another thing to consider is pluggable storage (a key/value interface) and pluggable query language (relational algebra interface?) and how to fit the two together.
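
One possible way the two could fit together, as a sketch only: the storage side is a small key/value protocol, and the query side is a handful of relational-algebra operators written against that protocol rather than against any concrete engine. Everything here (`KeyValueStore`, `DictStore`, the `table/pk` key layout, JSON-encoded rows) is an assumption made for the example, not an existing API.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Protocol, Tuple


class KeyValueStore(Protocol):
    """Hypothetical storage interface: anything with these three methods plugs in."""

    def put(self, key: bytes, value: bytes) -> None: ...
    def get(self, key: bytes) -> Optional[bytes]: ...
    def scan(self, prefix: bytes) -> Iterator[Tuple[bytes, bytes]]: ...


class DictStore:
    """Simplest possible backend: an in-memory dict with ordered prefix scans."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def scan(self, prefix: bytes) -> Iterator[Tuple[bytes, bytes]]:
        for key in sorted(self._data):
            if key.startswith(prefix):
                yield key, self._data[key]


def insert(store: KeyValueStore, table: str, pk: str, row: dict) -> None:
    # Rows live under "<table>/<primary key>" as JSON (an assumed layout).
    store.put(f"{table}/{pk}".encode(), json.dumps(row).encode())


def table_scan(store: KeyValueStore, table: str) -> Iterator[dict]:
    for _, value in store.scan(f"{table}/".encode()):
        yield json.loads(value)


def select(rows: Iterable[dict], predicate: Callable[[dict], bool]) -> Iterator[dict]:
    return (row for row in rows if predicate(row))


def project(rows: Iterable[dict], columns: List[str]) -> Iterator[dict]:
    return ({c: row[c] for c in columns} for row in rows)
```

A caller composes operators directly, e.g. `project(select(table_scan(store, "users"), lambda r: r["age"] >= 21), ["name"])`, and any query language would just be a front end that builds the same operator tree.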