restdb: A NoSQL JavaScript Database

I’ve thought for awhile how much fun it would be to make and/or hack on a JavaScript database. After seeing the implementation of an append-only B+ tree in JavaScript for a CouchDB clone I thought, what the heck, why not? So for my company’s hackday I thought it would be a great way for me to expand my knowledge and grow in new ways.

The Why and What

The real bottlenecks in distributed databases are network and storage, not the speed of the language. As long as the database is architected well, handles concurrency well, and is able to scale, then it should be perfectly fine to write it in any language. JavaScript and Node.js seem suited particularly well because of its ability to deal well with concurrency, IO, and many connections.

Using a dynamo-styled setup that can expand a ring of gossip-protocol-linked nodes should work well with node. Allowing socket-based connections as well as RESTful interfaces would work easily. Providing real-time websocket updates to web apps would allow for both a great web-interface into the health and load of the database as well as direct and instant updates to web apps relying on the data.

The format of this database is key-value. It stores JSON documents, but handles any other types of document too, from HTML to binary and images. They key for each document is a URI. This allows the organization of documents and related types (e.g. /tweets/1234 and /users/123/comments/1234). The database needs to be able to create custom indexes to sort documents in custom ways. And a mechanism needs to be in place to create and update those indexes and access them via a defined URI. Being inspired by CouchDB’s design documents, the indexes and other possible custom data handling (think document validation, triggers or callbacks, and refined ACLs/permissions) could be taken care of by creating special JSON documents called “apps”. These will be prefixed with an underscore to differentiate them from other documents.

The Parts

Rather than using a node B-tree I opted to use Google’s already stable leveldb. Node.js has a module with bindings for leveldb, so this provides a great start but it doesn’t support streams. This is a negative because the whole document is loaded into memory by the leveldb module. If it supported streams we could send the data along as we received it from the file system and keep memory low, especially relavent when storing large files such as pictures and video.

I’m using express for the REST server. This may be more than I need for this very specific use-case, but I know it well and can drop down to node’s HTTP stack sans-express if needed.

There is a gossip protocol module which is a start for our dynamo style ring, but there is a lot more work to be done on coordinating where on the ring documents are stored. I’m thinking it can work like Riak where you specify a number of vhosts for the ring and each vhost would be a separate database and separate node process. Leveldb only allows one process to be connected at a time, to prevent contention. Map-reduce queries would be spread around to each vhost. And the vhosts would be spread around the cluster/ring as it is grown. I’ve also included an mdns module, or more commonly known as zero-conf or bonjour, so that the nodes on a network can automatically detect each other and auto-join a ring possibly, depending on a configuration value.

I’ll use the msgpack module to serialize and store the JSON to leveldb. This will make it faster to pull the data off disk and into memory for indexing and map-reduce, and keeps the database smaller. The REST interface could support JSON and msgpack and then we could use the msgpack protocol in addition to the REST interface, allowing for connecting clients the choice to work in the smaller/faster binary format of msgpack.

The Result: restdb

I created and called this project restdb, since it is a database very much based on the REST protocol. I was only able to get the initial parts working today. This was mostly because I forgot to use –data-binary with curl for testing and spent a long time figuring out why my image wasn’t displaying in the browser. But you may install and run the database if you’ve got node.js installed by following the instructions in the readme.

You can do PUTs, GETs, and DELETEs. And it works well. The next steps would be to handle JSON correctly with msgpack. And to give support for the specialized app documents.

Each index created will be a separate database that stores the index field as the key and the document id as the value. This might cause problems as each vhost would need to have an additional database to its own for each index created. CouchDB deals with similar problems as it has a b-tree file for each database and for each index (view) in each database. It recommends increasing the ulimit on your machine. ElasticSearch has similar requirements. So I figure we’re not in too poor of company.

Future of restdb

It was a lot of fun to think about how I would build a database. It turned out to be able to be an application server as well as a database since using the app documents you could host complete web apps from the database. This is like the CouchApp, but I feel it supports it from the ground up rather than a bolt-on or pivot afterwards.

I don’t feel confident in my ability to write a production-ready, mature, or stable database. So I will continue to work on this whenever I get the itch to have some fun hacking, but it is not something I will commit any large amount of time to. However, if you are also interested in hacking JavaScript and want something meatier than a webapp to sink your teeth into, please fork and contribute to restdb. It has some great potential.