restdb: A NoSQL JavaScript Database

I’ve thought for a while about how much fun it would be to make and/or hack on a JavaScript database. After seeing an implementation of an append-only B+ tree in JavaScript for a CouchDB clone I thought, what the heck, why not? So for my company’s hackday I decided it would be a great way to expand my knowledge and grow in new ways.

The Why and What

The real bottlenecks in distributed databases are network and storage, not the speed of the language. As long as the database is architected well, handles concurrency well, and is able to scale, it should be perfectly fine to write it in any language. JavaScript and Node.js seem particularly well suited because of their ability to deal with concurrency, IO, and many connections.

Using a dynamo-styled setup that can expand a ring of gossip-protocol-linked nodes should work well with node. Allowing socket-based connections as well as RESTful interfaces would work easily. Providing real-time websocket updates to web apps would allow for both a great web-interface into the health and load of the database as well as direct and instant updates to web apps relying on the data.

The format of this database is key-value. It stores JSON documents, but handles any other type of document too, from HTML to binary and images. The key for each document is a URI. This allows the organization of documents and related types (e.g. /tweets/1234 and /users/123/comments/1234). The database needs to be able to create custom indexes to sort documents in custom ways, and a mechanism needs to be in place to create and update those indexes and access them via a defined URI. Inspired by CouchDB’s design documents, the indexes and other possible custom data handling (think document validation, triggers or callbacks, and refined ACLs/permissions) could be taken care of by creating special JSON documents called “apps”. These will be prefixed with an underscore to differentiate them from other documents.
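To make the URI-as-key idea concrete, here is a minimal in-memory sketch. It is not restdb’s actual code: the names UriStore and isAppDocument are made up for illustration, a Map stands in for leveldb, and I am assuming the underscore prefix goes on the last path segment of an app document’s URI.

```javascript
// Minimal in-memory sketch of a URI-keyed document store.
// UriStore and isAppDocument are hypothetical names, not restdb's API.
class UriStore {
  constructor() {
    this.docs = new Map(); // URI -> { contentType, body }
  }

  put(uri, body, contentType = 'application/json') {
    this.docs.set(uri, { contentType, body });
  }

  get(uri) {
    return this.docs.get(uri);
  }

  delete(uri) {
    return this.docs.delete(uri);
  }

  // List the URIs under a path prefix in sorted order, similar to how an
  // ordered key-value store iterates over a key range.
  list(prefix) {
    return [...this.docs.keys()].filter((uri) => uri.startsWith(prefix)).sort();
  }
}

// Assumption: an "app" document has a last path segment starting with "_".
function isAppDocument(uri) {
  const lastSegment = uri.split('/').filter(Boolean).pop() || '';
  return lastSegment.startsWith('_');
}

const store = new UriStore();
store.put('/tweets/1234', JSON.stringify({ text: 'hello' }));
store.put('/users/123/comments/1234', JSON.stringify({ text: 'nice' }));
store.put('/users/123/avatar', Buffer.from([0x89, 0x50]), 'image/png');
store.put('/tweets/_by_date', JSON.stringify({ index: 'created_at' }));

console.log(store.list('/users/123/'));
console.log(isAppDocument('/tweets/_by_date'));
```

Because the keys sort as plain strings, everything that belongs to one user ends up next to each other, which is what makes the path-style URIs useful for organization.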

The Parts

Rather than using a node B-tree I opted to use Google’s already stable leveldb. Node.js has a module with bindings for leveldb, so this provides a great start, but it doesn’t support streams. This is a negative because the whole document is loaded into memory by the leveldb module. If it supported streams we could send the data along as we received it from the file system and keep memory low, which is especially relevant when storing large files such as pictures and video.

I’m using express for the REST server. This may be more than I need for this very specific use-case, but I know it well and can drop down to node’s HTTP stack sans-express if needed.

There is a gossip protocol module which is a start for our dynamo-style ring, but there is a lot more work to be done on coordinating where on the ring documents are stored. I’m thinking it can work like Riak, where you specify a number of vhosts for the ring and each vhost would be a separate database and a separate node process. Leveldb only allows one process to be connected at a time, to prevent contention. Map-reduce queries would be spread around to each vhost, and the vhosts would be spread around the cluster/ring as it grows. I’ve also included an mdns module (more commonly known as zero-conf or bonjour) so that the nodes on a network can automatically detect each other and possibly auto-join a ring, depending on a configuration value.
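Here is a rough sketch of how documents could be placed on such a ring. None of this is implemented in restdb yet; hashKey, buildRing, and locate are hypothetical names, and MD5 plus round-robin assignment are simply the easiest choices for illustration. The point is that a key always hashes to the same vhost, and only the vhost-to-node mapping changes as the cluster grows.

```javascript
const { createHash } = require('crypto');

// Hash a document URI to an unsigned 32-bit integer.
function hashKey(key) {
  return createHash('md5').update(key).digest().readUInt32BE(0);
}

// Spread a fixed number of vhosts across the known nodes, round-robin.
function buildRing(nodes, vhostCount) {
  const ring = [];
  for (let i = 0; i < vhostCount; i++) {
    ring.push({ vhost: i, node: nodes[i % nodes.length] });
  }
  return ring;
}

// Find the vhost (and the node currently hosting it) for a document URI.
function locate(ring, key) {
  return ring[hashKey(key) % ring.length];
}

// Node names are made up for the example.
const smallRing = buildRing(['node-a', 'node-b'], 8);
const grownRing = buildRing(['node-a', 'node-b', 'node-c'], 8);

console.log(locate(smallRing, '/tweets/1234'));
console.log(locate(grownRing, '/tweets/1234'));
```

The vhost for /tweets/1234 is the same in both rings, so growing the cluster means moving whole vhost databases between machines rather than rehashing every document.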

I’ll use the msgpack module to serialize and store the JSON to leveldb. This will make it faster to pull the data off disk and into memory for indexing and map-reduce, and it keeps the database smaller. The REST interface could support both JSON and msgpack, and we could then offer the msgpack protocol in addition to the REST interface, giving connecting clients the choice to work in the smaller/faster binary format of msgpack.

The Result: restdb

I created this project and called it restdb, since it is a database very much based on the REST protocol. I was only able to get the initial parts working today. This was mostly because I forgot to use --data-binary with curl for testing and spent a long time figuring out why my image wasn’t displaying in the browser. But you can install and run the database if you’ve got node.js installed by following the instructions in the readme.

You can do PUTs, GETs, and DELETEs, and it works well. The next steps would be to handle JSON correctly with msgpack and to add support for the specialized app documents.
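For a feel of what those three verbs involve, here is a minimal sketch using node’s plain HTTP stack instead of express. It is not the restdb source: the handle function, the in-memory Map standing in for leveldb, and the status codes chosen are all my assumptions for the example.

```javascript
const http = require('http');

// In-memory stand-in for the leveldb-backed storage: URI -> { type, body }.
const documents = new Map();

// Pure request handler so the routing logic can run without a socket.
function handle(method, uri, body, contentType) {
  switch (method) {
    case 'PUT': {
      const existed = documents.has(uri);
      documents.set(uri, { type: contentType || 'application/octet-stream', body });
      return { status: existed ? 204 : 201, headers: {}, body: '' };
    }
    case 'GET': {
      const doc = documents.get(uri);
      if (!doc) return { status: 404, headers: {}, body: '' };
      return { status: 200, headers: { 'Content-Type': doc.type }, body: doc.body };
    }
    case 'DELETE':
      return { status: documents.delete(uri) ? 204 : 404, headers: {}, body: '' };
    default:
      return { status: 405, headers: { Allow: 'GET, PUT, DELETE' }, body: '' };
  }
}

// Wire the handler to node's HTTP stack. The body is collected as raw bytes
// so binary uploads (images, video) are stored unmodified.
function createServer() {
  return http.createServer((req, res) => {
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => {
      const result = handle(req.method, req.url, Buffer.concat(chunks), req.headers['content-type']);
      res.writeHead(result.status, result.headers);
      res.end(result.body);
    });
  });
}

// createServer().listen(3000); // the port is an arbitrary choice
```

With the last line uncommented, something like curl -X PUT -H "Content-Type: image/png" --data-binary @photo.png http://localhost:3000/images/photo should store the file byte for byte (the file name and URL are placeholders). Leaving out --data-binary is exactly the mistake that cost me so much time.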

Each index created will be a separate database that stores the index field as the key and the document id as the value. This might cause problems, as each vhost would need an additional database of its own for each index created. CouchDB deals with similar problems, as it has a b-tree file for each database and for each index (view) in each database. It recommends increasing the ulimit on your machine. ElasticSearch has similar requirements. So I figure we’re not in bad company.
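A small sketch of what one of these index databases might look like, with a Map standing in for the separate leveldb database. FieldIndex is a hypothetical name, and appending the document id to the key is my own assumption so that two documents with the same field value don’t overwrite each other.

```javascript
// Sketch of a secondary index kept as its own key-value "database".
class FieldIndex {
  constructor(field) {
    this.field = field;
    this.entries = new Map(); // index key -> document id
  }

  // Assumption: field value and document id are joined with a NUL
  // separator so that duplicate field values stay distinct.
  keyFor(value, docId) {
    return `${value}\u0000${docId}`;
  }

  add(docId, doc) {
    if (doc[this.field] === undefined) return; // nothing to index
    this.entries.set(this.keyFor(doc[this.field], docId), docId);
  }

  remove(docId, doc) {
    this.entries.delete(this.keyFor(doc[this.field], docId));
  }

  // Document ids ordered by the indexed field (plain string comparison).
  sortedIds() {
    return [...this.entries.keys()].sort().map((key) => this.entries.get(key));
  }
}

const byAuthor = new FieldIndex('author');
byAuthor.add('/tweets/1', { author: 'carol', text: 'third' });
byAuthor.add('/tweets/2', { author: 'alice', text: 'first' });
byAuthor.add('/tweets/3', { author: 'bob', text: 'second' });

console.log(byAuthor.sortedIds());
```

Since keys compare as strings, numeric fields would need zero-padding or some other sortable encoding before they go into the index.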

Future of restdb

It was a lot of fun to think about how I would build a database. It turned out that it could be an application server as well as a database, since with the app documents you could host complete web apps from the database. This is like a CouchApp, but I feel it is supported from the ground up rather than as a bolt-on or pivot afterwards.

I don’t feel confident in my ability to write a production-ready, mature, or stable database. So I will continue to work on this whenever I get the itch to have some fun hacking, but it is not something I will commit any large amount of time to. However, if you are also interested in hacking JavaScript and want something meatier than a webapp to sink your teeth into, please fork and contribute to restdb. It has some great potential.

My experience with Node.js

I love node.js, but I didn’t always.

Initial Impression

When I first read about node.js I thought, “you’ve got to be kidding me. Javascript is slow and user-interface-oriented. The only reason you’d use it on the server is because you don’t know any other languages.” And I moved on.

Recommendation from a Friend

The next time I heard about it, my friend Derek Andriesian said he was using node for some of his applications. I knew Derek had experience in PHP, Ruby, and other languages, so I asked a little more why he would go with Javascript. Apparently Derek liked the event model it used. He said it was pretty cool controlling the flow of the request as it came in and went through the system. He also said to use express.js, an HTTP server for node. I still didn’t touch it.

Javascript Inundation

At my day job I started doing Javascript, client side, a lot. It became my daily grind. But I had been working in ActionScript for so long, not having proper class syntax (my opinion), packages, and imports was a major stumbling block. I had managed to get a very large jquery file, and even after breaking it apart into multiple files organized by app section, it was very hard to make changes to anything, and to share data across the sections. I needed more organization. I separated out much of the data code and more of the jquery code, but then I had a problem of having more than 20 Javascript files to deal with, then 30, then more, and ensuring they were in the correct order, and updating my build script whenever I added on, and tweaking the script when I got the new file in the wrong location. Dependency management would be really nice at this point. I found CommonJS, a standards body for dealing with more complex Javascript. They had a standard around “modules” for Javascript. I implemented it and built a build script that traversed the require statements and pulled everything into one script. Things were much better.
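For the curious, the heart of that kind of build script can be sketched in a few lines. This is a reconstruction of the idea rather than my actual script: the sources object is a made-up stand-in for files on disk, and findRequires and loadOrder are names I picked for the example.

```javascript
// Sketch of a build step that orders CommonJS modules by their require() calls.
// `sources` is a hypothetical in-memory stand-in for files on disk.
const sources = {
  './app': "var users = require('./users'); var view = require('./view');",
  './users': "var ajax = require('./ajax');",
  './view': "var ajax = require('./ajax');",
  './ajax': 'exports.get = function () {};',
};

// Collect the module names passed to require() as string literals.
function findRequires(code) {
  const names = [];
  const pattern = /require\(\s*['"]([^'"]+)['"]\s*\)/g;
  let match;
  while ((match = pattern.exec(code)) !== null) names.push(match[1]);
  return names;
}

// Depth-first traversal from the entry module; dependencies come first.
function loadOrder(entry, files) {
  const order = [];
  const seen = new Set();
  (function visit(name) {
    if (seen.has(name)) return; // already handled (also stops cycles)
    seen.add(name);
    if (!(name in files)) throw new Error('Missing module: ' + name);
    findRequires(files[name]).forEach((dep) => visit(dep));
    order.push(name);
  })(entry);
  return order;
}

console.log(loadOrder('./app', sources));
// prints [ './ajax', './users', './view', './app' ]
```

With the order worked out automatically, adding a new file no longer meant touching the build script at all.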

“Getting” Node.js

After that, I looked into node more. I knew how modules worked from my client-side journey. And I “got” what node.js is all about. Node is about handling many connections at once, without dealing with threads. Node is about simplicity for simple servers, and flexibility for complex ones. In node, I can return the results of a web request to the client and then continue to run some code (such as sending an email or logging information that isn’t important to the result). I can use function closures to hold data for connections when using websockets or regular sockets. Node isn’t just Javascript on the server for people who don’t know other languages. It’s a different paradigm for programming server-side applications. And it’s actually pretty fast, for Javascript, because it uses some of the most advanced run-time technology around with Google’s V8 engine.
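Here is a small sketch of the “respond first, keep working afterwards” pattern together with closure state. The makeHandler function and the audit log array are made up for the example; in a real server the returned handler would be passed to http.createServer.

```javascript
// Sketch: answer the client first, then do non-critical work afterwards.
function makeHandler(auditLog) {
  let requestCount = 0; // closure state shared by every request to this handler

  return function handler(req, res) {
    requestCount += 1;
    const id = requestCount; // captured by the deferred callback below

    // The response is finished here, so the client is not kept waiting.
    res.end('request ' + id + ' done');

    // Runs after the current call stack has finished.
    setImmediate(() => auditLog.push('handled ' + req.url + ' as #' + id));
  };
}

// Usage with node's HTTP stack (the port is an arbitrary choice):
// const http = require('http');
// http.createServer(makeHandler([])).listen(3000);
```

The setImmediate call only makes the ordering explicit. Any code placed after res.end() would still run once the response has been handed off.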

It’s Just Fun

It’s just been fun to develop in node. There are a ton of libraries available and more coming all the time. Creating REST APIs is a breeze with express.js. There is a lot of excitement and activity around node (some may call it hype, but whatever it is, it’s furthering the platform). There are some who are vehemently against node, but I haven’t read anything from them that was both accurate and a good reason for not using node. If you’re a poor programmer, you’ll write poor programs in node same as in PHP. It isn’t a magic bullet. But if you can write code, node is a lot of fun. I also feel my code is easier to change and read, but I’ve also been writing heavy Javascript for 2 years now, so take it with a grain of salt.

Getting Started

There are plenty of guides and tutorials on getting started. Be sure to check out npm, the node module package manager (like ruby’s gems), if you don’t come across it in the basic guides. After that you can check out express.js, a great web server for building apps and REST APIs; node.js cluster, which launches several processes across your processor cores to maximize performance; a module that gives you websockets (or alternatives for browsers that don’t support them) for realtime data streaming; Redis, a key-value store which node devs seem to like a lot; and mongodb, a document database which also seems to be very popular with node.js users. CouchDB and Riak are also two databases that work well with node because of their Javascript orientation. And elasticsearch is a nice search service that works well via a REST API.