Hello, this is Jan Lehnardt and you're visiting my blog. Thanks for stopping by.
plok — It reads like a blog, but it sounds harder!
This is a follow-up piece to an earlier rant:
We are hitting a few walls with a CouchDB deployment and both Damien and I are a bit puzzled. This posting tries to attract someone with a clue to help us out. Our problems might result from not understanding the documentation correctly, but with evidently inaccurate material, we stand little chance.
Long story short: We’ve got it all sorted out.
Sam Ruby relayed a hint by “a Mozilla Developer”. Invoking Spidermonkey with the -b parameter and a value of 1000000, we are able to keep the memory footprint constant. We haven’t measured how this impacts performance, though.
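To make the invocation concrete, here is a minimal Python sketch that builds such a command line. Only the -b option and the value 1000000 come from the hint above; the binary name `js`, the script name `main.js`, and the helper `spidermonkey_command` are placeholders made up for illustration, and the position of the script after the options is an assumption about the shell.

```python
def spidermonkey_command(js_binary, script, branch_limit=1000000):
    """Build the argument list for a Spidermonkey shell run with -b."""
    if branch_limit <= 0:
        raise ValueError("branch_limit must be positive")
    # -b <value> is the option from the hint; the binary and script
    # names passed in by the caller are hypothetical placeholders.
    return [js_binary, "-b", str(branch_limit), script]


# Print the command line instead of running it, so the sketch works
# even on a machine without Spidermonkey installed.
print(" ".join(spidermonkey_command("js", "main.js")))
```

The list form can be handed to `subprocess.call` as-is if you actually want to start the interpreter.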
irc.freenode.org helped to clarify how heart is supposed to work. We interpreted the documentation as describing heart as a monitoring process that restarts the Erlang VM when it crashes. That is not the case. Since heart is started from the Erlang VM (it is a child process in the process hierarchy), it cannot start a new VM when the old one crashes, because the OS wipes out all child processes before they can do anything.
So what is heart good for then? Apparently, the Erlang VM can get stuck (tip o’ the hat to noss). I don’t know how often and under what circumstances that happens (I guess it is rare), but it can happen. Heart is designed to check the VM’s health every now and then and launch a utility programme that takes care of the application restart.
A side note: the minimum timeout that heart allows for the Erlang VM to not respond to health checks is 11 seconds. The heart man page clearly states the fact, but heart behaves unintuitively when you specify, say, 10 seconds because you failed to differentiate between < and <=. Instead of falling back to the lowest possible value (11), it assumes the default value of 60 seconds, which makes testers (me) think nothing happens at all. Now this is clearly a PEBKAC and RTFM-type of error, but to be frank, the fine manual is not very approachable and I decided to fall back to heart.c to see how things actually work.
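The timeout behaviour is easier to see in a few lines of code. This is a small Python model of what is described above, not a port of heart.c: the function name and the constants are made up for illustration, and any upper limit heart may enforce is left out.

```python
HEART_MIN_EXCLUSIVE = 10    # a timeout must be strictly greater than this
HEART_DEFAULT_TIMEOUT = 60  # seconds, used when the requested value is rejected


def effective_heart_timeout(requested):
    """Return the timeout that would actually be in effect."""
    if requested > HEART_MIN_EXCLUSIVE:
        return requested
    # A value that is too small is not rounded up to 11;
    # the default is used instead.
    return HEART_DEFAULT_TIMEOUT
```

With this model, asking for 11 seconds gives you 11, while asking for 10 silently gives you 60, which is exactly the trap described above.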
Noah Slater pimped the script that launches CouchDB so that, if you want it to, CouchDB gets restarted automatically in case the Erlang process dies. This is quite nice. Since CouchDB takes almost no time to restart, you have a nearly uninterrupted service. We also have heart configured so that, in case the Erlang VM gets stuck, it kills the VM process and nothing else. The launch script then detects that the process is gone and restarts it. This takes at least 11 seconds, as outlined above. If you need less, you need to hack
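The restart idea behind such a launch script can be sketched as a simple loop. This is not Noah's script, just an illustration in Python under a few assumptions: the helper `respawn`, its parameters, and the `couchdb` command in the usage comment are all hypothetical.

```python
import subprocess
import time


def respawn(command, max_restarts=None, delay=1.0, run=subprocess.call):
    """Run command and start it again whenever it exits.

    Returns the number of launches once max_restarts is exhausted;
    with max_restarts=None the loop runs forever.
    """
    launches = 0
    while True:
        run(command)  # blocks until the process is gone
        launches += 1
        if max_restarts is not None and launches > max_restarts:
            return launches
        time.sleep(delay)  # short pause before the next launch


# Hypothetical usage: keep a server process alive indefinitely.
# respawn(["couchdb"])
```

The loop does not care why the process went away, so it covers both a plain crash and the case where heart kills a stuck VM.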
Thanks to all who sent in suggestions and words of help.