Hacker Timesnew | past | comments | ask | show | jobs | submitlogin

INSERT INTO user (area, age, active) SELECT abs(random()) % 1000000, (abs(random()) % 3 + 1) * 5, abs(random()) % 2 FROM generate_series(1, 100000000, 1)

Faster by 10% than fastest author implementation on my machine - 19 seconds against 21 for 'threaded_batched'.



I have to assume this was rejected as a valid approach, but if not, this whole thread gets 10x more interesting than it already was. I hope the author responds.


Well, even if you just insert zeros instead of random values, it takes 9 seconds on my computer to insert 100M rows, so even that is not a 1B rows per minute.

And I think INSERT INTO ... SELECT is the fastest way to bulk insert data into sqlite.

Also, I have tried to use carray sqlite feature that allow to share memory with sqlite and use recursive CTE to query it, but it is slower. Though, you can pass values you've generated from Rust instead of using random().


This is with in memory database and journaling disabled?


My run on NixOS 21.05 with sqlite 3.35.2, without randomness, on an i7-7500U:

Invocation:

    command time sqlite3 ':memory:' '
    create table IF NOT EXISTS user
    (
        id INTEGER not null primary key,
        area CHAR(6),
        age INTEGER not null,
        active INTEGER not null
    );
    INSERT INTO user (area, age, active) SELECT 0, 1, 2 FROM generate_series(1, 100000000, 1);
    '
Result:

    16.34user 0.43system 0:16.89elapsed 99%CPU (0avgtext+0avgdata 1477320maxresident)k
    11inputs+0outputs (0major+369851minor)pagefaults 0swaps
Invocation with pragmas:

    command time sqlite3 ':memory:' '
    PRAGMA journal_mode = OFF;
    PRAGMA synchronous = 0;
    PRAGMA cache_size = 1000000;
    PRAGMA locking_mode = EXCLUSIVE;
    PRAGMA temp_store = MEMORY;
    create table IF NOT EXISTS user
    (
        id INTEGER not null primary key,
        area CHAR(6),
        age INTEGER not null,
        active INTEGER not null
    );
    INSERT INTO user (area, age, active) SELECT 0, 1, 2 FROM generate_series(1, 100000000, 1);
    '
Result with pragmas:

    17.31user 0.41system 0:17.85elapsed 99%CPU (0avgtext+0avgdata 1477288maxresident)k
    11inputs+0outputs (0major+369850minor)pagefaults 0swaps
As expected, the pragmas make no difference when using `:memory:` -- 17 seconds, 1.4 GB RAM each on my laptop.


I have used pragmas from the article.

Since it is only 100M rows, it takes 1.8 GB on the disk, so I've used tmpfs for this which essentially is a ramdisk. But I have a gen4 pcie nvme SSD - it can reliably write at 4GB/s sequentially, so writing takes a ~500ms for 100M rows, it is not a bottleneck here. random() takes ~half of the insert time. Generating those values with Rust, for example, is faster, but sharing this data with sqlite takes more time than generating it with random().

Maybe implementing custom virtual table in C or Rust like build-in generate_series, but the one that will produce user table fields will be faster, but that is significantly more effort than my query.

This query with random() and generate_series executed in sqlite CLI takes whooping 8MB of the RAM, so you don't even have to close all Electron-based applications to run it on a computer with 8GB of RAM.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: