CouchDB on PHP Abstract or how I only want to store data

Friday, December 21. 2007

My first Podcast, yeah!

Earlier this week I recorded a short introduction to CouchDB for PHP Abstract.

If you would like to quote it, copy it or do anything else with it (after all, this is under a Creative Commons license), here’s the transcript:

CouchDB

CouchDB is a new database system that breaks with a lot of traditions. Prepare to be confused or even offended. While a lot is different from traditional data management systems, the core concepts should be familiar to you.

Try to forget, just for a moment, all you know about SQL, relations, replication and all simple and advanced techniques you use to solve your problems when it comes to data storage. Instead, remember the days when you were a beginning PHP developer (if you are a beginner, perfect). You probably set out and wrote a guestbook application for your homepage, just for fun. At a certain point, you were faced with saving your guestbook entries, so that a subsequent visitor would see them. And whether you read a book, an online tutorial or asked a friend or instructor, the probability that you were advised to store your data in files is quite high. So you merrily churned away dealing with files and all was well.

One day, however, after you write something highly controversial, guestbook entries come in by the lot. And then it happens: All you get to see on your homepage is some garbage instead of a healthy discussion. Apparently, as your resident PHP guru explains, two of your visitors tried to submit an entry at the same time. You sit down and get things repaired eventually, but you swear never to use files for storing data again and venture to find a solution.

You are told to use a database to solve your concurrency issues. And all of a sudden you are faced with set theory, relational algebra, database connections, SQL standards and differences and so on. But you hate files so much that you sit down to figure it all out thinking with anger: “I only want to store data.”

Or, as a professional, you are cheerful that you left those dark times far behind with your sophisticated Object-Relational-Mapping solution, only to see things falling apart or grinding to a halt when significant workload hits your application. You are then back to step one: You only want to store data. If you had other difficulties using or scaling databases, good: Keep listening.

Documents

CouchDB stores data. A single data-record, that is your database row or the result of a JOIN, is stored in what CouchDB calls a document. A document is really just a container for whatever you want to store in it.

Documents are JSON objects. JSON stands for JavaScript Object Notation and is a very lightweight format to create objects from a string. Its origins lie in the JavaScript language, obviously, but it is no longer bound to it. All major languages can convert native objects into JSON and vice versa.

What does that mean for you? Instead of thinking about which bits and pieces of your data go in which tables so that your JOINs will be fast, you just convert your data objects into JSON and put them into CouchDB. That is all there is to it. You just store data.
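To make the conversion concrete, here is a minimal sketch in Python (chosen only for brevity; PHP's json_encode and json_decode play the same role). The guestbook entry and its field names are made up for illustration and are not prescribed by CouchDB.

```python
import json

# Hypothetical guestbook entry; the field names are illustrative assumptions.
entry = {
    "author": "Jeff",
    "text": "Nice homepage!",
    "tags": ["greeting", "first-visit"],
}

# Native object -> JSON string. This string is what would be sent to the database.
document = json.dumps(entry)

# JSON string -> native object, as you would do after fetching the document.
restored = json.loads(document)
```

Nested lists and objects survive the round trip, which is why no table layout has to be designed up front.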

Documents have a few interesting characteristics. For example, they are versioned. If you change a document you really create a new version. And if you make a mistake, you can simply go back to the previous version, or any version. The versioning allows recent versions of a document to be read while a new one is being added. No locking occurs. Another characteristic is that documents are stored with ACID compliance. That means that your data is safe with CouchDB.
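The versioning idea can be sketched with a toy in-memory store. This is not how CouchDB is implemented; the class name, the method names and the simple counting revision numbers are all assumptions made for illustration. The point is only that an update appends a new version instead of overwriting the old one, so readers can keep using what is already there.

```python
class VersionedStore:
    """Toy model: every write appends a new version, nothing is overwritten."""

    def __init__(self):
        self._revisions = {}  # doc_id -> list of document bodies, oldest first

    def put(self, doc_id, body):
        """Store a new version and return its 1-based revision number."""
        revisions = self._revisions.setdefault(doc_id, [])
        revisions.append(dict(body))  # copy, so stored versions never change
        return len(revisions)

    def get(self, doc_id, rev=None):
        """Return the latest version, or a specific earlier one."""
        revisions = self._revisions[doc_id]
        if rev is None:
            return dict(revisions[-1])
        if not 1 <= rev <= len(revisions):
            raise KeyError(f"{doc_id} has no revision {rev}")
        return dict(revisions[rev - 1])


store = VersionedStore()
first = store.put("entry-1", {"text": "Helo world"})    # typo on purpose
second = store.put("entry-1", {"text": "Hello world"})  # corrected version
```

Here store.get("entry-1") returns the corrected text, while store.get("entry-1", rev=first) still returns the original one.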

Views

So we solved this problem. But “just storing data” is only adequate for the simplest applications. You usually want to be able to filter your data, that is to say: “Give me all the records that match a particular set of criteria.” That could be the top 10 postings on your blog or all the pictures that were uploaded to Flickr in the last hour, or all comments by Jeff. CouchDB lets you create a view to achieve this. You provide a view function that does the actual filtering. In this view function you can access all attributes and values in your document to decide if CouchDB should include a document in the view. That way you have very fine-grained control over which documents you are interested in for a particular view.
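The filtering idea can be mimicked in a few lines of Python. This is only an illustration of the concept, not CouchDB's view syntax or API: the helper build_view, the function comments_by_jeff and the "type" and "author" fields are all hypothetical.

```python
def comments_by_jeff(doc):
    """View function: decide whether a single document belongs in the view."""
    return doc.get("type") == "comment" and doc.get("author") == "Jeff"


def build_view(documents, view_function):
    """Apply the view function to every document and keep the matches."""
    return [doc for doc in documents if view_function(doc)]


# Made-up documents with differing shapes, as a document store allows.
documents = [
    {"type": "comment", "author": "Jeff", "text": "First!"},
    {"type": "comment", "author": "Anna", "text": "Nice post."},
    {"type": "posting", "author": "Jeff", "title": "Hello"},
    {"type": "comment", "author": "Jeff", "text": "Me again."},
]

jeffs_comments = build_view(documents, comments_by_jeff)
```

Changing the criteria later means writing another small function and passing it to build_view; the stored documents stay untouched.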

Now, your criteria to match records are likely to change over time. With traditional database systems you might need to restructure a lot of your tables and queries to get the desired result and performance. Often, you resist change because of the work involved, practically stopping valuable evolution of your application. In CouchDB you just write a new view function. That means a lot less headache and code changes when you are modifying your application.

REST API

All of CouchDB’s features are accessed through a REST API. That means documents and views are treated as resources that have a unique Uniform Resource Identifier (URI), just like each web page has its own unique URI. And just like web pages, documents and views are accessed via HTTP, the standard protocol of the World Wide Web. You might know about GET and POST HTTP requests already; they are used to fetch and store documents in CouchDB. But HTTP defines a few more request types. Of interest to CouchDB are PUT to create new resources and DELETE to get rid of them.
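The mapping from request types to actions can be sketched by building (not sending) the requests with Python's standard library. The server address, the database name "guestbook", the document id "entry-1" and the helper document_request are assumptions for illustration; a real server may also expect extra parameters, such as a revision identifier when changing or deleting a document.

```python
import json
from urllib.request import Request

BASE = "http://localhost:5984"  # assumed address of a local CouchDB
DATABASE = "guestbook"          # hypothetical database name


def document_request(method, doc_id=None, body=None):
    """Build an HTTP request for a document resource without sending it."""
    url = f"{BASE}/{DATABASE}" + (f"/{doc_id}" if doc_id else "")
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=method)


entry = {"author": "Jeff", "text": "Nice homepage!"}

fetch = document_request("GET", "entry-1")               # fetch a document
store = document_request("POST", body=entry)             # store a document
create = document_request("PUT", "entry-1", body=entry)  # create at a known URI
remove = document_request("DELETE", "entry-1")           # get rid of it
```

Passing any of these objects to urllib.request.urlopen would perform the actual request against a running server.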

Now, HTTP requests are pretty easy to do in PHP (or any language for that matter). PHP’s internal streams and file functions let you use all of CouchDB. At the same time, all major frameworks include HTTP libraries that let you use CouchDB while hiding all the dirty details of HTTP.

Replication

One nifty feature CouchDB is specifically designed for is replication. With replication you can have any number of machines keep an up-to-date snapshot of your data. The machines do not need to be physically near each other or have fast connections between them. In fact they do not even need a permanent connection. It is perfectly fine to take an instance of CouchDB offline, work with it, and then have it replicate all changes back and forth to and from its peers. If conflicts occur, CouchDB resolves them automatically and flags affected documents so you can revisit the automatic decision.
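The offline-then-sync workflow, including flagged conflicts, can be illustrated with a deliberately simplified toy model. This is not CouchDB's replication algorithm: the integer revision counters, the tie-breaking rule in pick_winner and all function names are assumptions chosen to keep the sketch short.

```python
import json


def pick_winner(body_a, body_b):
    """Pick one body by an arbitrary but deterministic rule so peers agree."""
    return max(body_a, body_b, key=lambda b: json.dumps(b, sort_keys=True))


def replicate(source, target, conflicts):
    """Copy newer documents from source to target and record conflicts."""
    for doc_id, (rev, body) in source.items():
        if doc_id not in target or target[doc_id][0] < rev:
            target[doc_id] = (rev, body)
        elif target[doc_id][0] == rev and target[doc_id][1] != body:
            # Both sides changed the same version: resolve and flag it.
            target[doc_id] = (rev, pick_winner(body, target[doc_id][1]))
            conflicts.add(doc_id)


def sync(a, b):
    """Replicate in both directions and return the flagged document ids."""
    conflicts = set()
    replicate(a, b, conflicts)
    replicate(b, a, conflicts)
    return conflicts


# Both peers start from the same state: doc_id -> (revision, body).
server = {"entry-1": (1, {"text": "hi"})}
laptop = dict(server)

# The laptop goes offline; both sides keep working independently.
laptop["entry-1"] = (2, {"text": "hi, edited offline"})
laptop["entry-2"] = (1, {"text": "written on the train"})
server["entry-1"] = (2, {"text": "hi, edited on server"})
server["entry-3"] = (1, {"text": "written by a visitor"})

flagged = sync(laptop, server)
```

After the sync both peers hold the same three documents, and flagged contains "entry-1" so the automatic choice can be reviewed later.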

CouchDB, with its replication system and by being built on top of the Erlang/OTP platform, takes full advantage of today’s multi-core and multi-machine setups. It is designed for concurrent and flexible access to a reliable data store.

CouchDB does not aim to replace relational databases and it does not aim to be the solution to every problem out there. It is just another tool worth considering.

Posted by Jan | Comments (2)

Comments

Thanks for the podcast+transcript :)

CouchDB sounds a lot like a version control system, doesn’t it?

About the license: CC by-nc-nd 3.0 leaves pretty much nothing in your "anything" though

#1 Hoàng Đức Hiếu (Homepage) on 2007-12-22 09:58 (Reply)

Heya Hoáng, version control is an easy application to do with CouchDB, but it can do a lot more. Check out my other posts about the topic in my archive

#1.1 Jan on 2007-12-22 10:11 (Reply)