I have to assume this was rejected as a valid approach, but if not, this whole thread gets 10x more interesting than it already was. I hope the author responds.
Well, even if you just insert zeros instead of random values, it takes 9 seconds on my computer to insert 100M rows, so even that is not a 1B rows per minute.
And I think INSERT INTO ... SELECT is the fastest way to bulk insert data into sqlite.
Also, I have tried to use carray sqlite feature that allow to share memory with sqlite and use recursive CTE to query it, but it is slower. Though, you can pass values you've generated from Rust instead of using random().
My run on NixOS 21.05 with sqlite 3.35.2, without randomness, on an i7-7500U:
Invocation:
command time sqlite3 ':memory:' '
create table IF NOT EXISTS user
(
id INTEGER not null primary key,
area CHAR(6),
age INTEGER not null,
active INTEGER not null
);
INSERT INTO user (area, age, active) SELECT 0, 1, 2 FROM generate_series(1, 100000000, 1);
'
command time sqlite3 ':memory:' '
PRAGMA journal_mode = OFF;
PRAGMA synchronous = 0;
PRAGMA cache_size = 1000000;
PRAGMA locking_mode = EXCLUSIVE;
PRAGMA temp_store = MEMORY;
create table IF NOT EXISTS user
(
id INTEGER not null primary key,
area CHAR(6),
age INTEGER not null,
active INTEGER not null
);
INSERT INTO user (area, age, active) SELECT 0, 1, 2 FROM generate_series(1, 100000000, 1);
'
Since it is only 100M rows, it takes 1.8 GB on the disk, so I've used tmpfs for this which essentially is a ramdisk. But I have a gen4 pcie nvme SSD - it can reliably write at 4GB/s sequentially, so writing takes a ~500ms for 100M rows, it is not a bottleneck here. random() takes ~half of the insert time. Generating those values with Rust, for example, is faster, but sharing this data with sqlite takes more time than generating it with random().
Maybe implementing custom virtual table in C or Rust like build-in generate_series, but the one that will produce user table fields will be faster, but that is significantly more effort than my query.
This query with random() and generate_series executed in sqlite CLI takes whooping 8MB of the RAM, so you don't even have to close all Electron-based applications to run it on a computer with 8GB of RAM.
Faster by 10% than fastest author implementation on my machine - 19 seconds against 21 for 'threaded_batched'.