Pure Go implementation of D. J. Bernstein's cdb constant database library (github.com/jbarham)
65 points by SlimHop on May 29, 2012 | 19 comments


What are people using this 'cdb' for, IRL? Sounds interesting, but this is the first I've heard of it.


It's useful if you have data that is easy to cache (e.g., rebuilt every six hours) but very commonly accessed. Because lookups are so quick (two disk seeks), it's almost raw disk speed. But yes, rebuilding the files is an offline process (build a new file and swap it in with a rename), so your data has to be cache-friendly.

It's a good alternative to memcache if your data is larger than what memcached can support in RAM.

In the early 2000s I used it to implement most of the frontend for a pay-per-click (PPC) marketplace for search engines. It held up well. These days I'd just use memcached or Redis.
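For reference, the lookups are cheap partly because cdb's hash is trivial. Here is a minimal Go sketch of the hash and how it drives the two seeks; the key used in main is just illustrative:

```go
package main

import "fmt"

// cdbHash is D. J. Bernstein's cdb hash function: h starts at 5381 and
// each key byte is folded in as h = (h*33) XOR byte, in uint32 arithmetic.
// The low 8 bits of the result select one of cdb's 256 hash tables (the
// first seek); the remaining bits pick a slot within that table (the second).
func cdbHash(key []byte) uint32 {
	h := uint32(5381)
	for _, b := range key {
		h = ((h << 5) + h) ^ uint32(b)
	}
	return h
}

func main() {
	h := cdbHash([]byte("example.com"))
	fmt.Printf("hash=%d table=%d\n", h, h&0xff)
}
```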


It's a good alternative to memcache if your data is larger than what memcached can support in RAM.

Unless you are running memcached on an EC2 Large instance (8 GB) or bigger:

"No random limits: cdb can handle any database up to 4 gigabytes."


It's pretty easy to throw together a variant that uses 64-bit integers for positions instead of unsigned 32-bit ints; the rest of the code stays the same. There's slightly more overhead in the file, but as long as the items you're storing are bigger than a few bytes it's not a big deal.

Anyway, it's useful for cases where you want to ship out a big dictionary once a day or so and need fast lookups, but the data doesn't have to be updated transactionally.


I use it on my mail relay to determine whether a domain is local or not. As this check happens for every incoming RCPT TO (including all the bots checking to see whether I'm an open relay or not), I don't want to hit the database where the accounts are stored just yet.

This is the perfect case for a cdb file: the list of local domains changes very rarely, but lookups happen 100s of times per second.

Granted, it's a very specific case. Aside from this mail setup, I've yet to find another real-world application.


It's a persistent data structure optimized for static lookup tables. Bernstein's qmail and djbdns use it for mail routing and DNS zone data, respectively.


I'm curious to know how it compares to LevelDB (http://code.google.com/p/leveldb/). I'd love to see a detailed comparison between the two. LevelDB is mutable, but is built on top of an immutable key/value map: http://code.google.com/p/leveldb/source/browse/include/level...


LevelDB is a sorted string table; cdb is a hash table. Without knowing anything more about the implementation, I expect that arbitrary lookups are slower in LevelDB than cdb.
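A rough in-memory illustration of that difference, using Go's sort package and built-in map as stand-ins for the two on-disk structures:

```go
package main

import (
	"fmt"
	"sort"
)

// lookupSorted mimics an SSTable-style lookup: a binary search over
// sorted keys, O(log n) comparisons per lookup.
func lookupSorted(sortedKeys []string, key string) bool {
	i := sort.SearchStrings(sortedKeys, key)
	return i < len(sortedKeys) && sortedKeys[i] == key
}

// lookupHashed mimics a cdb-style lookup: hash straight to a bucket,
// O(1) expected probes (Go's map stands in for the on-disk format).
func lookupHashed(m map[string]bool, key string) bool {
	return m[key]
}

func main() {
	keys := []string{"alpha", "beta", "delta", "gamma"} // must be sorted
	m := map[string]bool{"alpha": true, "beta": true, "delta": true, "gamma": true}
	fmt.Println(lookupSorted(keys, "delta"), lookupHashed(m, "delta"))
}
```

The trade-off cuts the other way for range scans, which a sorted table supports naturally and a hash table doesn't.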


It's used extensively in djb's packages like qmail and ucspi-tcp. In the latter, it's used to maintain a database of blacklisted IPs which are blocked from connecting to a TCP server, among other things. http://cr.yp.to/cdb.html


I used it in combination with the public dumps from IMDb to make a quick on-disk lookup database for use with my favorite (albeit slower) scripting language, Python.

I didn't have the RAM for holding it all in memory and I was only ever going to do read lookups on it.

It made it very easy to query things like: Given my favorite set of movies, which actors appearing in them have also appeared in the Doctor Who television series?

Just for fun / learning :)
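A toy in-memory version of that kind of query, kept in Go to match the thread; the cast data and function name are entirely made up:

```go
package main

import (
	"fmt"
	"sort"
)

// actorsInBoth returns actors who appear in any of the favorite movies
// and also in the target series: build a membership set for the series,
// then scan the favorites' casts against it, deduplicating as we go.
func actorsInBoth(cast map[string][]string, favorites []string, series string) []string {
	inSeries := map[string]bool{}
	for _, a := range cast[series] {
		inSeries[a] = true
	}
	seen := map[string]bool{}
	var out []string
	for _, movie := range favorites {
		for _, a := range cast[movie] {
			if inSeries[a] && !seen[a] {
				seen[a] = true
				out = append(out, a)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	cast := map[string][]string{ // hypothetical data
		"Movie A":    {"Actor 1", "Actor 2"},
		"Movie B":    {"Actor 2", "Actor 3"},
		"Doctor Who": {"Actor 2", "Actor 4"},
	}
	fmt.Println(actorsInBoth(cast, []string{"Movie A", "Movie B"}, "Doctor Who"))
}
```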


Incidentally, I've heard the IMDB site itself uses cdbs heavily to serve static content and data.


DJB's qmail (the security-hardened sendmail replacement) uses it.


For those interested, there are also implementations of cdb in Java: http://www.strangegizmo.com/products/sg-cdb/ https://github.com/sunnygleason/g414-hash


Okay, semi-related:

Why is the speed of the Go compiler so important? Why not just use an incremental compiler? Why the hell would you want to recompile a 100k+ line program when there are known better alternatives?

Just doesn't make sense to me.


When your compiler is doing interprocedural optimizations like inlining (which Go's does), then changing an upstream dependency generally requires that all downstream dependencies be recompiled as well. So incremental recompilation isn't a panacea, and I think the Go designers made the right choice in striving to make compilation fast.

Of course, you can do something like incremental compilation only at -O0 with no inlining, which is what I suspect we'll end up doing in Rust (the relevant bug is [1]).

[1]: https://github.com/mozilla/rust/issues/2369


"The Go compiler isn't so fast; other compilers are just slow. Machines are bloody fast. Just don't piss the speed away." -Rob Pike http://twitter.com/rob_pike/status/199620997459087360

Why not do both? I appreciate that Go respects my time.


When we launched Go we made a big deal about compilation time because we were trying to appeal to C++ programmers. In large C++ projects it is not unusual to wait tens of minutes or even hours for projects to build. Today, the largest Go projects take mere seconds to build.

It turns out that way more than the C++ set were interested in Go. Naturally, people familiar with environments where compilation speed is a non-issue were mystified by our enthusiasm for Go's low compilation times.


As others have already pointed out, incremental compilation does not work all the time. And as Git proved, some speed improvements really change the game.


Incremental compilation is not foolproof either. But the question is whether compilation speed even matters. Your typical C project will spend much of its time in make (semi-related, to the topic and to make: use redo). And if you compile with -O0, the compiler will not be the bottleneck for most things.

Go seems nice, but compilation speed is no real argument!



