Hello, this is Jan Lehnardt and you're visiting my blog. Thanks for stopping by.
plok — It reads like a blog, but it sounds harder!
Wow, just, wow.
A total of six of us, including Damien, got together for four days of learning, hacking, and goofing off.
The concentration of developer knowledge and enthusiasm turned out to be a major boost for productivity (no surprise here).
CouchDB comes with a bulk_docs API that lets you send batches of documents to be saved in a single request. bulk_docs is significantly faster than writing documents to CouchDB one at a time. There are situations, however, where collecting a set of documents before saving them is not desirable or even possible. Consider a distributed logging system that stores events on a central server.
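To make the shape of a bulk_docs call concrete, here is a minimal Python sketch. It only builds the HTTP request rather than sending it; the database name "logs" and the example documents are made up for illustration:

```python
import json

def bulk_docs_request(docs, db="logs"):
    """Build the method, path, and JSON body for a bulk_docs call.
    The database name defaults to a hypothetical "logs" database."""
    return ("POST", "/%s/_bulk_docs" % db, json.dumps({"docs": docs}))

method, path, body = bulk_docs_request([
    {"event": "login", "user": "jan"},
    {"event": "logout", "user": "chris"},
])
print(method, path)   # POST /logs/_bulk_docs
print(body)
```

However many documents go into the batch, this is a single round-trip to the server, which is where the speed-up comes from.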
To speed up this use case, we came up with the idea of adding a write buffer to single-document PUT and POST requests, and Chris took the time to write the code. He went through several design strategies (a single module, a gen_server, ets tables) and finally settled on plain Erlang lists for the buffer. When saved with a special query parameter, a PUT or POST request is buffered for up to one second before being flushed to disk. In the event of a power failure, everything written in the second before the sync may be lost, so be sure to use this only for data you can afford to lose. Community review & commit are pending.
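As a sketch of what such a buffered write looks like on the wire: the parameter name below, batch=ok, is the one this feature eventually shipped under in CouchDB, but since the code was still pending review when this was written, treat the exact name as an assumption. Again, the helper only builds the request:

```python
import json
from urllib.parse import urlencode

def buffered_put(db, doc_id, doc):
    """Single-document PUT asking the server to buffer the write.
    The "batch=ok" parameter name is an assumption here; a buffered
    write would be acknowledged with 202 Accepted rather than
    201 Created, since the document is not yet on disk."""
    query = urlencode({"batch": "ok"})
    return ("PUT", "/%s/%s?%s" % (db, doc_id, query), json.dumps(doc))

method, path, body = buffered_put("logs", "evt-0001", {"event": "boot"})
print(method, path)   # PUT /logs/evt-0001?batch=ok
```

A logging client can fire these off without waiting for an fsync per event, which is exactly the use case above.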
I spent some time on CouchDBX. The first version of CouchDBX was hand-crafted, and it was non-trivial to update individual components, let alone all of them.
CouchDBX bundles CouchDB (of course), Erlang & Spidermonkey — all components required to run CouchDB as well as a small Cocoa application that lets you launch CouchDB with a double-click.
One other dependency, ICU (the International Components for Unicode), is not included. Including ICU would make the downloadable package weigh in at about 40 MB, where the current one is around 12 MB.
Luckily, ICU is bundled with Mac OS X. Unfortunately, Apple doesn’t ship the ICU header files with the installation. WebKit does, though, and with its copies of the headers it is possible to build CouchDB and link it against the built-in ICU library.
My autotools-fu, however, is too weak to get this set up with a stock CouchDB; I tried a few times in the past and always got stuck at some point. Eventually, while writing the script that does the downloading, building & packaging of all components, I added a “post-install phase” that re-links the necessary libraries against the built-in ICU library. This worked well.
We now have couchdbx-core-builder.sh, a script that can build any combination of Erlang & CouchDB that is known to work together. It builds and packages all components and then strips out everything that is not strictly needed to run CouchDB, including a great many of the Erlang standard libraries.
CouchDB’s Map/Reduce system is different from both the Google and Hadoop models. I’d say it is improved, but nevertheless, it is different. One common misconception among newcomers is that the reduce function does not actually have to reduce its input values. It does: it cannot simply collect them. The size of the reduced value should stay roughly constant, or grow at most proportionally to log N, where N is the number of input values per key.
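To make the constraint concrete, here is a Python sketch (CouchDB reduce functions are really JavaScript) of a well-behaved reduce next to one that merely collects its input:

```python
def good_reduce(keys, values, rereduce=False):
    # A proper reduce: the output stays a single number
    # no matter how many input values there are.
    return sum(values)

def bad_reduce(keys, values, rereduce=False):
    # An anti-pattern: "reducing" to a list that grows with the input,
    # so the output size is proportional to N.
    if rereduce:
        return [v for sub in values for v in sub]
    return list(values)

values = list(range(1000))
print(good_reduce(None, values))      # one number, regardless of N
print(len(bad_reduce(None, values)))  # grows linearly with N
```

The second function may even appear to work on small data sets, which is exactly why a built-in warning is useful.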
Chris and Paul worked on a patch that warns users when their reduce function returns too much data. Once committed, the warning will be on by default, but it can be disabled for the cases where users know what they are doing and have a scenario where they can get away with it. Usually, though, a faulty reduce function will make view requests crawl. Community review & commit are pending.
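Such a check can only be a heuristic: flag reduce output that is both non-trivial in size and not much smaller than its input. Here is a sketch of that idea; the constants below are invented for illustration and are not CouchDB’s actual thresholds:

```python
def reduce_output_too_large(input_size, output_size,
                            min_size=200, ratio=0.5):
    """Illustrative heuristic for the warning: flag a reduce whose
    output is bigger than some floor AND not much smaller than its
    input. The constants are made up, not CouchDB's real ones."""
    return output_size > min_size and output_size > input_size * ratio

print(reduce_output_too_large(10_000, 24))     # sum-like reduce: fine
print(reduce_output_too_large(10_000, 9_800))  # collecting reduce: flagged
```

The size floor keeps small views from triggering false positives, while the ratio catches reduce functions whose output scales with their input.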
It’s not as if we worked a hard 20 hours a day, though. We also got together to have a good time, playing Wii and flying a Havoc Heli. Damien’s got an AirZooka that we used to knock random things down around the house.
It was a perfect combination of geeking out and hanging out and we all can’t wait to do the next one :)
We hope to make CouchHack a distributed event that takes place all over the world, wherever CouchDB hackers happen to be.
I hope to set up a CouchHack in Berlin soonish; I’ll keep you updated :)
If you want to host your own CouchHack, feel free to add it to the wiki.