This is a follow-up piece to an earlier rant:
We are hitting a few walls with a CouchDB deployment and both Damien and I are a bit puzzled. This posting tries to attract someone with a clue to help us out. Our problems might result from not understanding the documentation correctly, but with evidently inaccurate material, we stand little chance.
Long story short: We’ve got it all sorted out.
Memory Hogging Spidermonkey
Sam Ruby relayed a hint from “a Mozilla Developer”. By invoking Spidermonkey with the -b parameter and a value of 1000000, we are able to keep the memory footprint constant. We haven’t measured how this impacts performance, though.
Crashing Erlang VM
#erlang on irc.freenode.org helped to clarify how heart is supposed to work. We had interpreted the documentation as saying that heart is a monitoring process that restarts the Erlang VM when it crashes. That is not the case. Since heart is started from the Erlang VM (it is a child process in the process hierarchy), it cannot start a new VM when the old one crashes because the OS wipes out all child processes before they can do anything.
What is heart good for, then? Apparently, the Erlang VM can get stuck (tip o’ the hat to noss). I don’t know how often or under what circumstances that happens (I guess it is rare), but it can happen. Heart is designed to check the VM’s health every now and then and, if the VM stops responding, launch a utility programme that takes care of the application restart.
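To make the idea concrete, here is a minimal sketch of such a health-check watchdog in Python. It is not how heart.c is written; the class name, the injected clock values and the on_stuck callback are all made up for illustration. It only shows the principle: the monitored process reports in regularly, and a restart helper is triggered once it has been silent for longer than the timeout.

```python
class Watchdog:
    """Toy model of a heartbeat monitor (hypothetical, not heart's real code)."""

    def __init__(self, timeout, on_stuck):
        self.timeout = timeout      # seconds of silence we tolerate
        self.on_stuck = on_stuck    # callback standing in for the restart helper
        self.last_beat = 0.0        # time of the most recent heartbeat

    def beat(self, now):
        # Called by the monitored process to say "I am still alive".
        self.last_beat = now

    def check(self, now):
        # Called periodically by the monitor; fires the callback on silence.
        if now - self.last_beat > self.timeout:
            self.on_stuck()
            return True
        return False
```

Passing the current time in explicitly (instead of calling time.time() inside the class) keeps the sketch easy to try out by hand with made-up timestamps.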
A side note: the minimum timeout that heart allows for the Erlang VM to not respond to health checks is 11 seconds. The heart man page clearly states this, but heart behaves unintuitively when you specify, say, 10 seconds because you failed to differentiate between < and <=. Instead of falling back to the lowest possible value (11), it assumes the default value of 60 seconds, which makes testers (me) think that nothing happens at all. Now this is clearly a PEBKAC and RTFM-type of error, but to be frank, the fine manual is not very approachable and I decided to fall back to heart.c to see how things actually work.
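The fallback behaviour described above can be modelled in a few lines of Python. This is only a sketch of what I observed, not a translation of heart.c: the function name is hypothetical, and the upper bound of 65535 is an assumption taken from the range given in the heart man page.

```python
DEFAULT_TIMEOUT = 60   # seconds; what heart falls back to on an invalid value
MIN_TIMEOUT = 11       # smallest accepted value (the valid range is 10 < X)
MAX_TIMEOUT = 65535    # assumption: upper bound as given in the man page


def effective_heart_timeout(requested):
    """Return the timeout heart ends up using for a requested value."""
    if MIN_TIMEOUT <= requested <= MAX_TIMEOUT:
        return requested
    # Out-of-range values are not clamped to the nearest valid value;
    # the default is used instead.
    return DEFAULT_TIMEOUT


for value in (10, 11, 30):
    print(value, "->", effective_heart_timeout(value))
```

Asking for 10 seconds therefore gives you 60, not 11, which is exactly the trap I walked into.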
Automatically restarting CouchDB
Noah Slater pimped the script that launches CouchDB so that, if you want, CouchDB gets restarted automatically in case the Erlang process dies. This is quite nice. Since CouchDB takes almost no time to restart, you have a nearly uninterrupted service. We also have heart configured so that, in case the Erlang VM gets stuck, it kills the VM process and nothing else. The launch script then detects that the process is gone and restarts it. This takes at least 11 seconds, as outlined above. If you need less, you need to hack heart.c.
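The respawn idea itself is simple. Here is a rough Python sketch of such a loop; it is not Noah’s actual launch script, and the function name, the restart limit and the rule "stop on a clean exit" are assumptions made for the example.

```python
import subprocess
import time


def supervise(command, max_restarts=3, delay=0.0):
    """Run `command` and restart it whenever it dies.

    Returns the list of exit codes seen. Stops early on exit code 0,
    which this sketch treats as a deliberate shutdown (an assumption).
    """
    exit_codes = []
    for _ in range(max_restarts + 1):
        process = subprocess.Popen(command)
        code = process.wait()        # blocks until the process is gone
        exit_codes.append(code)
        if code == 0:
            break                    # clean exit: do not respawn
        time.sleep(delay)            # optional pause before restarting
    return exit_codes
```

A process killed by a signal shows up with a negative exit code in Python, so a VM killed by heart would be restarted by this loop just like one that crashed on its own.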
Thanks to all who sent in suggestions and words of help.