CouchHack 2009 — Relaxing at Damien's

Monday, May 4. 2009

Wow, just, wow.

What

CouchHack is an open-for-all, multi-day CouchDB developer & user meetup. The first one took place at Damien’s in Asheville, NC. And boy, it was fun, relaxing and productive all at the same time.

Six of us, including Damien, got together for four days of learning, hacking and goofing off.

Special thanks to HUDORA & Ben Young for sponsoring CouchHack.

Got Stuff Done

The concentration of developer knowledge and enthusiasm turned out to be a major boost for productivity (no surprise here).

Batch PUT

CouchDB comes with a bulk_docs API that lets you send a batch of documents to be saved in a single request. Saving documents this way is significantly faster than writing them one at a time. There are situations, however, where collecting a set of documents before saving them is not desirable or even possible. Take a distributed logging system that stores events on a central server: each event should be written as soon as it happens.
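For the cases where you can collect documents up front, a bulk_docs call is a POST to /dbname/_bulk_docs with a JSON body of the form {"docs": [...]}. Here is a minimal client-side sketch in Python that splits documents into batches and builds one such request per batch. The server URL, the database name "logs" and the batch size of 100 are assumptions for illustration; the requests are only built here, not sent.

```python
import json
import urllib.request


def chunk(docs, size):
    """Split a list of documents into batches of at most `size` documents."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]


def bulk_docs_request(server, db, batch):
    """Build (but do not send) a POST to the database's _bulk_docs endpoint."""
    body = json.dumps({"docs": batch}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/{db}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical data: 250 log events, saved in batches of 100.
events = [{"type": "event", "seq": n} for n in range(250)]
reqs = [bulk_docs_request("http://localhost:5984", "logs", b) for b in chunk(events, 100)]
# Sending one would be: urllib.request.urlopen(reqs[0])
```

Three requests replace 250 single-document writes, which is where the speedup comes from.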

To speed up this use case, we came up with the idea of adding a write buffer to single-document PUT and POST requests, and Chris took the time to write the code. He tried several designs (a single module, a gen_server, ets tables) and finally settled on pure Erlang lists for the buffer. When a document is saved with a special query parameter, the PUT or POST request gets buffered for one second before being flushed to disk. In the event of a power failure, anything written since the last one-second flush might get lost, so be sure to use this only for data you can afford to lose. Community review & commit are pending.
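To make the idea concrete, here is a small Python sketch of such a write buffer. It is not Chris's Erlang code: the class name, the flush callback and the injectable clock are all hypothetical. Writes are appended to a plain list and handed to the flush function once the oldest buffered write is one second old.

```python
import time


class WriteBuffer:
    """Collect single-document writes and flush them as one batch.

    Hypothetical sketch: documents go into a plain list and are passed to
    `flush_fn` once the oldest buffered write is `interval` seconds old.
    """

    def __init__(self, flush_fn, interval=1.0, clock=time.monotonic):
        self.flush_fn = flush_fn  # receives the list of buffered docs
        self.interval = interval  # longest time a write may sit in memory
        self.clock = clock  # injectable so the example can fake time
        self.pending = []
        self.first_write_at = None

    def write(self, doc):
        # The write is acknowledged right away but is NOT durable until flushed.
        if not self.pending:
            self.first_write_at = self.clock()
        self.pending.append(doc)
        self.maybe_flush()

    def maybe_flush(self):
        # A real server would also call this from a timer, not only on writes.
        if self.pending and self.clock() - self.first_write_at >= self.interval:
            self.flush()

    def flush(self):
        batch, self.pending = self.pending, []
        self.first_write_at = None
        if batch:
            self.flush_fn(batch)


now = [0.0]
flushed = []
buf = WriteBuffer(flushed.append, clock=lambda: now[0])
buf.write({"_id": "a"})
now[0] = 0.5
buf.write({"_id": "b"})  # still inside the one-second window
now[0] = 1.2
buf.maybe_flush()  # normally driven by a timer
# flushed == [[{"_id": "a"}, {"_id": "b"}]]
```

Everything still sitting in `pending` when the process dies is exactly the data that can get lost.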

CouchDBX

I spent some time on CouchDBX. The first version of CouchDBX was hand-crafted, which made updating even a single component, let alone all of them, non-trivial.

CouchDBX bundles CouchDB (of course), Erlang & Spidermonkey, i.e. all the components required to run CouchDB, plus a small Cocoa application that lets you launch CouchDB with a double-click.

One other dependency, ICU (International Components for Unicode), is not included. Including ICU would bring the download to about 40 MB, compared to around 12 MB for the current package.

Luckily, ICU is bundled with Mac OS X. Unfortunately, Apple doesn’t ship the ICU header files with the installation. WebKit does, though, and with those headers it is possible to build CouchDB and link it against the built-in ICU library.

Only, my autotools-fu is too weak to get this working with a stock CouchDB build. I tried a few times in the past and always got stuck at some point. Eventually, while writing the script that downloads, builds & packages all components, I added a “post install phase” that re-links the necessary libraries against the built-in ICU library. This worked well.
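One way such a post install phase could work on Mac OS X is Apple's install_name_tool, whose -change option rewrites a dependent-library path inside a Mach-O binary. The Python sketch below only builds the commands. The driver path and the old ICU paths are hypothetical, and /usr/lib/libicucore.A.dylib is assumed to be the system ICU library; this is not necessarily what the real script does.

```python
import subprocess

# Assumed location of the ICU library that ships with Mac OS X.
SYSTEM_ICU = "/usr/lib/libicucore.A.dylib"


def relink_commands(binaries, old_icu_paths, new_path=SYSTEM_ICU):
    """Build install_name_tool invocations that point ICU references at new_path."""
    return [
        ["install_name_tool", "-change", old, new_path, binary]
        for binary in binaries
        for old in old_icu_paths
    ]


def relink(binaries, old_icu_paths):
    # Actually runs the commands; needs Apple's developer tools installed.
    for cmd in relink_commands(binaries, old_icu_paths):
        subprocess.run(cmd, check=True)


# Hypothetical paths: a CouchDB driver built against a private ICU copy.
cmds = relink_commands(
    ["lib/couchdb/erlang/lib/couch/priv/lib/couch_icu_driver.so"],
    ["/tmp/icu/lib/libicuuc.40.dylib", "/tmp/icu/lib/libicui18n.40.dylib"],
)
```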

We now have couchdbx-core-builder.sh, a script that can build any combination of Erlang & CouchDB that is known to work. It builds and packages all components and then strips out everything that is not strictly needed for CouchDB’s operation. This includes a great many Erlang standard libraries.
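The stripping step boils down to deleting OTP application directories (named like stdlib-1.15.5) that CouchDB does not use. Here is a rough Python sketch of that idea; the keep-list is a hypothetical example, not the list couchdbx-core-builder.sh actually uses.

```python
import shutil
from pathlib import Path

# Hypothetical keep-list of OTP applications assumed to be needed at runtime.
KEEP = {"kernel", "stdlib", "crypto", "inets", "sasl", "compiler", "xmerl"}


def apps_to_remove(app_dirs, keep=KEEP):
    """Return the application directories whose base name is not in `keep`."""
    # 'stdlib-1.15.5' -> 'stdlib'; names like 'erl_interface-3.6' also work.
    return [d for d in app_dirs if d.split("-", 1)[0] not in keep]


def strip_erlang_libs(lib_dir, keep=KEEP):
    """Delete unneeded OTP applications from an Erlang install's lib directory."""
    lib = Path(lib_dir)
    names = sorted(p.name for p in lib.iterdir() if p.is_dir())
    for name in apps_to_remove(names, keep):
        shutil.rmtree(lib / name)
```

Keeping the selection logic in a separate pure function makes it easy to check which libraries would go before anything is deleted.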

Finally, I updated the CouchDBX application bundle (I used Geoffrey Grosenbach’s version with the embedded WebKit view for Futon) and have already released two versions.

CouchHack Live

As part of the CouchDB Podcast, we recorded an episode with everybody around one table. This is the first episode that features Damien, go check it out :)

Reduce Overflow

CouchDB’s Map/Reduce system is different from both the Google and Hadoop models. I’d say it is improved, but either way, it is different. A common misconception among newcomers is that a reduce function may simply collect its input values. It cannot; it has to actually reduce them. As a rule of thumb, the size of a reduce function’s output should grow no faster than log N, where N is the number of input values per key.
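The difference is easy to see side by side. The Python sketch below mirrors the (keys, values, rereduce) shape of CouchDB reduce functions: a sum stays a single number no matter how many values come in, while a collecting function grows with its input. The overflow check and its 200-byte threshold are hypothetical, only meant to illustrate the kind of test a warning could apply; they are not the heuristic from Chris and Paul's patch.

```python
import json


def sum_reduce(keys, values, rereduce):
    # Output is one number regardless of input size; rereduce just sums partial sums.
    return sum(values)


def collect_reduce(keys, values, rereduce):
    # Anti-pattern: output grows linearly with the number of input values.
    if rereduce:
        return [v for partial in values for v in partial]
    return list(values)


def reduce_overflows(values, output, min_size=200):
    """Hypothetical check: flag output that is large and not shrinking fast enough."""
    in_size = len(json.dumps(values))
    out_size = len(json.dumps(output))
    return out_size > min_size and out_size * 2 > in_size


values = list(range(1000))
partials = [sum_reduce(None, values[:500], False), sum_reduce(None, values[500:], False)]
total = sum_reduce(None, partials, True)
```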

Chris and Paul worked on a patch that warns users when their reduce function returns too much data. Once committed, the warning will be on by default, but it can be switched off for cases where users know what they are doing and have a scenario where they can get away with it. Usually, though, a faulty reduce function will make view requests crawl. Community review & commit pending.

Relax

Not that we worked a hard 20 hours a day, either. We also got together to have a good time, play Wii and fly the Havoc Heli. Damien’s got an AirZooka that we used to knock down random things in the house.

It was a perfect combination of geeking out and hanging out and we all can’t wait to do the next one :)

Future CouchHacks

We hope to make CouchHack a distributed event that takes place all over the world, wherever CouchDB hackers happen to be.