You were saying something about 'best intentions'?

Well, hotrepart is my most recent attempt at doing something interesting outside of work, and it took me down some interesting avenues.

I didn't figure that I'd be patching PL/Proxy, learning C, pointer arithletic and memory management along the way. It was interesting working in a bare text editor, but also much, much slower. It took me maybe 40 hours of screen-time and 20 hours of no-screen thinking, spread across about 3 months, to get a 1.1Kloc patch out the door.

Once I did this, I then realised that my plan to scale Postgres to the moon was fundamentally flawed: queries that had to be run across the whole dataset (RUN ON ALL in PL/Proxy parlance) would eventually saturate the cluster without a decent replication system. A decent replication system (that can be automatically installed and reconfigured on-the-fly at runtime) is of course the one thing Postgres lacks right now.

During this three months I also read up more on datacenter-scale applications and Google's concept of the Warehouse Scale Computer, which altered my thinking somewhat.


At this point I should explain, to myself if nobody else, why I started this whole experiment. The thinking was that the algorithms behind DHTs and distributed key-value stores like Dynamo should in theory be implementable using RDBMS installs as building blocks. What you lose in performance overhead you gain in query language expressivity. Going further, with a decent programmable proxy it should be possible to route DB requests just like DHTs route pull requests etc. Further still, a "proxy layer" on top of the DB layer should self-heal and use a Paxos algorithm to route around failures and update the routing table.

One of the properties of the hotrepart system as-is is that it is immediately consistent. In theory, the system proposed above would trade that off for eventual consistency, with a lag equal to the replication delay, but gaining partition-tolerance, on the assumption that the replication also had a hot-failover component. Hello, there, Brewer's conjecture.

Needless to say this would be a ridiculous amount of work to implement, even it it worked in theory. Were this to be made real, however, would such a system scale to the numbers obtained with Hadoop? That is the question.

One way to answer it would be to construct a simulation cluster processing requests and then to torture the cluster in various ways (kill hosts, partition the network) and watch what happens. We shall see.

The end result is that hotrepart is stalled for the moment, pending ideas on the direction in which to take it foward. As with Gradient, even if nothing comes of the project, it's still out there in the open. Four years after I stopped work on Gradient it still proved useful to other people because of the ideas alone, and combining XMPP and hypertext is something everyone's doing now, so I don't necessarily think I've wasted my time on this so far.

No comments: