Pure Go implementation of D. J. Bernstein's cdb constant database library (github.com/jbarham)
65 points by SlimHop on May 29, 2012 | 19 comments


What are people using this 'cdb' for, IRL? Sounds interesting, but this is the first I've heard of it.


It's useful if you have data that is easy to cache (e.g., rebuilt every six hours) but very commonly accessed. Because lookups are so quick (two disk seeks), it's almost raw disk speed. But yes, rebuilding the files is an offline process (build a new file and swap it in with a rename), so your data has to be cache-friendly.

It's a good alternative to memcache if your data is larger than what memcached can support in RAM.

In the early 2000s I used it to implement most of the frontend for a pay-per-click (PPC) marketplace for search engines. It held up well. These days I'd just use memcached or Redis.
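For reference, the lookups are cheap partly because cdb's hash is trivial. Here is a minimal Go sketch of the hash and how it drives the two seeks; the key used in main is just illustrative:

```go
package main

import "fmt"

// cdbHash is D. J. Bernstein's cdb hash function: h starts at 5381 and
// each key byte is folded in as h = (h*33) XOR byte, in uint32 arithmetic.
// The low 8 bits of the result select one of cdb's 256 hash tables (the
// first seek); the remaining bits pick a slot within that table (the second).
func cdbHash(key []byte) uint32 {
	h := uint32(5381)
	for _, b := range key {
		h = ((h << 5) + h) ^ uint32(b)
	}
	return h
}

func main() {
	h := cdbHash([]byte("example.com"))
	fmt.Printf("hash=%d table=%d\n", h, h&0xff)
}
```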


It's a good alternative to memcache if your data is larger than what memcached can support in RAM.

Unless you are running memcached on an EC2 Large instance (8 GB) or bigger:

"No random limits: cdb can handle any database up to 4 gigabytes."


It's pretty easy to throw together a variant that uses 64-bit integers for positions instead of unsigned 32-bit ints; the rest of the code stays the same. There's slightly more overhead in the file, but as long as the items you're storing are bigger than a few bytes it's not a big deal.

Anyway, it's useful for cases where you want to ship out a big dictionary once a day or so and need fast lookups, but the data doesn't have to be updated transactionally.


I use it on my mail relay to determine whether a domain is local or not. As this check happens for every incoming RCPT TO (including all the bots checking to see whether I'm an open relay or not), I don't want to hit the database where the accounts are stored just yet.

This is the perfect case for a cdb file: the list of local domains changes very rarely, but lookups happen 100s of times per second.

Granted, it's a very specific case. Aside from this mail setup, I've yet to find another real-world application.


It's a persistent data structure optimized for static lookup tables. Bernstein's qmail and djbdns use it for mail routing and DNS zone data, respectively.


I'm curious to know how it compares to LevelDB (http://code.google.com/p/leveldb/). I'd love to see a detailed comparison between the two. LevelDB is mutable, but is built on top of an immutable key/value map: http://code.google.com/p/leveldb/source/browse/include/level...


LevelDB is a sorted string table; cdb is a hash table. Without knowing anything more about the implementation, I expect that arbitrary lookups are slower in LevelDB than cdb.
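A rough in-memory illustration of that difference, using Go's sort package and built-in map as stand-ins for the two on-disk structures:

```go
package main

import (
	"fmt"
	"sort"
)

// lookupSorted mimics an SSTable-style lookup: a binary search over
// sorted keys, O(log n) comparisons per lookup.
func lookupSorted(sortedKeys []string, key string) bool {
	i := sort.SearchStrings(sortedKeys, key)
	return i < len(sortedKeys) && sortedKeys[i] == key
}

// lookupHashed mimics a cdb-style lookup: hash straight to a bucket,
// O(1) expected probes (Go's map stands in for the on-disk format).
func lookupHashed(m map[string]bool, key string) bool {
	return m[key]
}

func main() {
	keys := []string{"alpha", "beta", "delta", "gamma"} // must be sorted
	m := map[string]bool{"alpha": true, "beta": true, "delta": true, "gamma": true}
	fmt.Println(lookupSorted(keys, "delta"), lookupHashed(m, "delta"))
}
```

The trade-off cuts the other way for range scans, which a sorted table supports naturally and a hash table doesn't.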


It's used extensively in djb's packages like qmail and ucspi-tcp. In the latter, it's used to maintain a database of blacklisted IPs which are blocked from connecting to a TCP server, among other things. http://cr.yp.to/cdb.html


I used it in combination with the public dumps from IMDb to make a quick on-disk lookup database for use with my favorite (albeit slower) scripting language, Python.

I didn't have the RAM for holding it all in memory and I was only ever going to do read lookups on it.

It made it very easy to query things like: Given my favorite set of movies, which actors appearing in them have also appeared in the Doctor Who television series?

Just for fun / learning :)
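A toy in-memory version of that kind of query, kept in Go to match the thread; the cast data and function name are entirely made up:

```go
package main

import (
	"fmt"
	"sort"
)

// actorsInBoth returns actors who appear in any of the favorite movies
// and also in the target series: build a membership set for the series,
// then scan the favorites' casts against it, deduplicating as we go.
func actorsInBoth(cast map[string][]string, favorites []string, series string) []string {
	inSeries := map[string]bool{}
	for _, a := range cast[series] {
		inSeries[a] = true
	}
	seen := map[string]bool{}
	var out []string
	for _, movie := range favorites {
		for _, a := range cast[movie] {
			if inSeries[a] && !seen[a] {
				seen[a] = true
				out = append(out, a)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	cast := map[string][]string{ // hypothetical data
		"Movie A":    {"Actor 1", "Actor 2"},
		"Movie B":    {"Actor 2", "Actor 3"},
		"Doctor Who": {"Actor 2", "Actor 4"},
	}
	fmt.Println(actorsInBoth(cast, []string{"Movie A", "Movie B"}, "Doctor Who"))
}
```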


Incidentally, I've heard the IMDB site itself uses cdbs heavily to serve static content and data.


DJB's qmail (the security-hardened sendmail replacement) uses it.


For those interested, there are also implementations of cdb in Java: http://www.strangegizmo.com/products/sg-cdb/ https://github.com/sunnygleason/g414-hash


Okay, semi-related:

Why is the speed of the Go compiler so important? Why not just use an incremental compiler? Why the hell would you want to recompile a 100k+ line program when there are known better alternatives?

Just doesn't make sense to me.


When your compiler is doing interprocedural optimizations like inlining (which Go's does), then changing an upstream dependency generally requires that all downstream dependencies be recompiled as well. So incremental recompilation isn't a panacea, and I think the Go designers made the right choice in striving to make compilation fast.

Of course, you can do something like incremental compilation only at -O0 with no inlining, which is what I suspect we'll end up doing in Rust (the relevant bug is [1]).

[1]: https://github.com/mozilla/rust/issues/2369


"The Go compiler isn't so fast; other compilers are just slow. Machines are bloody fast. Just don't piss the speed away." -Rob Pike http://twitter.com/rob_pike/status/199620997459087360

Why not do both? I appreciate that Go respects my time.


When we launched Go we made a big deal about compilation time because we were trying to appeal to C++ programmers. In large C++ projects it is not unusual to wait tens of minutes or even hours for projects to build. Today, the largest Go projects take mere seconds to build.

It turns out that way more than the C++ set were interested in Go. Naturally, people familiar with environments where compilation speed is a non-issue were mystified by our enthusiasm for Go's low compilation times.


As others have already pointed out, incremental compilation does not work all the time. And as Git proved, some speed improvements really change the game.


Incremental compilation is not foolproof either. But the question is whether compilation speed even matters. Your typical C project will spend much of its time in make (semi-related, to the topic and to make: use redo). And if you compile with -O0, the compiler will not be the bottleneck for most things.

Go seems nice, but compilation speed is no real argument!



