2008/12/22

Amazon redux

Y'know, despite all my bloviation over the wonderfulness of AWS, I would be remiss if I failed to mention that Amazon have taken a dent in my estimation recently, thanks to this story by the Times of London documenting work practices that appear to be, if not illegal, then at the very least inhumane. I would be surprised if threatening staff with dismissal for more than six days of illness were acceptable behaviour under UK employment law.

Even though working on AWS would be a dream job, I'm writing this anyway because I don't believe in candy-coating the nature of 'the beast'. Just like individual people, groups of people have facets of their collective nature that should be lauded or condemned, and hopefully redeemed.

Not sure the next batch of presents will come from Amazon.co.uk, though.

Update, 1 Feb. 2009: The Sunday Times badly misrepresented Google's energy efficiency a while ago, which also dented my estimation of their credibility, so who knows about the above? Maybe it's an equally manufactured controversy.

Notes on a small proxy, or: "PL/Proxy: lessons learnt, hints and tips"

PL/Proxy is the PostgreSQL proxy that Skype released as part of their open-source program. It is in some ways less flexible than MySQL Proxy, since it focuses purely on partitioning and load-balancing. Even so, it has been used in production in high-availability settings and is definitely production-quality code.

I installed it on a Linux image and found the following:

When installing from source:
  • On Debian-based systems (including Ubuntu), if you don't have pg_config, install the postgresql-server-dev-8.X package.
  • Make sure flex & bison are installed first.
  • Make sure you have the latest version of PL/Proxy. Earlier versions rely on flex & bison being defined in pg_config, which is not always the case.

For the simple example, putting the passwords in the connection strings is a quick-and-dirty way to get up and running.

If anything changes in your get_cluster_partitions function, you should also increment the value returned by get_cluster_version. PL/Proxy caches the results of get_cluster_partitions so it doesn't have to call it on every invocation, and it only refreshes that cache when the version changes.
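To make the caching behaviour concrete, here is a toy Python model of it. It assumes only what the paragraph above describes: the partition list is re-read when, and only when, the reported version changes. The class and method names are hypothetical stand-ins for the SQL configuration functions, not PL/Proxy internals.

```python
# Toy model of version-keyed caching. Nothing here is PL/Proxy code;
# all names are hypothetical stand-ins for the SQL config functions.

class ClusterConfig:
    """Stands in for get_cluster_partitions / get_cluster_version."""

    def __init__(self, partitions, version=1):
        self.partitions = list(partitions)
        self.version = version

    def get_cluster_version(self):
        return self.version

    def get_cluster_partitions(self):
        return list(self.partitions)


class PartitionCache:
    """Re-reads the partition list only when the reported version changes."""

    def __init__(self, config):
        self.config = config
        self.cached_version = None
        self.cached_partitions = None

    def partitions(self):
        version = self.config.get_cluster_version()
        if version != self.cached_version:
            # First call, or the version was bumped: refresh the cache.
            self.cached_partitions = self.config.get_cluster_partitions()
            self.cached_version = version
        return self.cached_partitions


config = ClusterConfig(["dbname=part00", "dbname=part01"])
cache = PartitionCache(config)
print(cache.partitions())   # ['dbname=part00', 'dbname=part01']

# Change the partition list but forget to bump the version: stale result.
config.partitions = ["dbname=part00", "dbname=part01_new"]
print(cache.partitions())   # ['dbname=part00', 'dbname=part01']

# Bump the version and the change is picked up.
config.version += 1
print(cache.partitions())   # ['dbname=part00', 'dbname=part01_new']
```

The middle call is the trap the paragraph warns about: the edit is invisible until the version number moves.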

While the documentation says "the number of connections for a given cluster must be a power of 2", don't forget that 1 = 2^0, so a single connection is also allowed.

You can proxy to a proxy, but there are two things to take into account:

1) Your reads can't use SELECTs inside your proxy functions, i.e. no "cluster A run on B select C", which is bad style anyway. Instead, your target databases must have functions whose signatures are identical to those of the functions on the proxy.

2) It's not much use if both proxies run a stock PL/Proxy setup that uses hashtext with identical partition-choice logic. Why? Consider this example scenario:

P1 -+-> part00
    +-> P2 -+-> part01
            +-> part02

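...nothing will ever be written to part01, because P1 partitions the same way as P2: every row that P1 forwards to P2 was selected by the same low-order hash bit that P2 then uses, so P2 sends all of them to part02.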
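Here is a small Python simulation of that scenario. It assumes both proxies pick a partition by masking the lowest bits of the hash, as described in the FAQ explanation further down this post. zlib.crc32 is a stand-in for PostgreSQL's hashtext(), and the key names are made up. Any hash function shows the same effect.

```python
import zlib

def stand_in_hash(key: str) -> int:
    # Stand-in for PostgreSQL's hashtext(); the exact hash doesn't matter here.
    return zlib.crc32(key.encode("utf-8"))

def pick(h: int, partitions: list) -> str:
    # Lowest-bits selection; assumes len(partitions) is a power of 2.
    return partitions[h & (len(partitions) - 1)]

P1 = ["part00", "P2"]
P2 = ["part01", "part02"]

counts = {"part00": 0, "part01": 0, "part02": 0}
for i in range(10_000):
    h = stand_in_hash(f"user{i}")
    target = pick(h, P1)
    if target == "P2":
        # P2 re-applies exactly the same hash and the same bit mask.
        target = pick(h, P2)
    counts[target] += 1

print(counts)  # part01 stays at 0
```

Every key that reaches P2 has its lowest bit set, so P2 always chooses index 1. One way around this, purely as an illustration, is to have the inner proxy look at different bits (for example `(h >> 1) & 1`) or hash a different key.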

According to the todo list, PL/Proxy currently loads entire result sets into memory, which limits how large a result set you can sensibly return.

CONNECT doesn't take function arguments, so the target database can't be chosen at call time. This can be emulated with one-database clusters, but it's still a limitation.

"Tagged partitions" are not explained very clearly in the docs. Neither is the reason the number of partitions must be a power of 2, but the PL/Proxy FAQ covers this. PL/Proxy uses the lowest N bits of the hash to decide which partition to run on, and N bits can address exactly 2^N values, hence the limitation.
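A short Python sketch of the lowest-N-bits rule, assuming the selection is simply `hash & (count - 1)`. That mask only reaches every partition when the count is a power of 2. With 3 partitions, for instance, the mask would be `0b10`, so partition 1 would never be chosen. The function names are hypothetical.

```python
def is_power_of_two(n: int) -> bool:
    # True for 1, 2, 4, 8, ...; note that 1 == 2**0 qualifies.
    return n > 0 and (n & (n - 1)) == 0

def partition_for(h: int, n_partitions: int) -> int:
    if not is_power_of_two(n_partitions):
        raise ValueError("partition count must be a power of 2")
    # Keep only the lowest N bits, where n_partitions == 2**N.
    return h & (n_partitions - 1)

h = 0b1011_0110  # an arbitrary example hash value (182)
for n in (1, 2, 4, 8):
    print(n, partition_for(h, n))
# 1 0
# 2 0
# 4 2
# 8 6
```

For non-negative hashes this is the same as `h % n`, but a bit mask is cheaper and needs no division, which is presumably why the restriction is worth it.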

At the time of writing, the FAQ isn't linked from the Skype project homepage, nor is it in CVS or in the release, but it answers a lot of questions. The other good source of answers is the plproxy-users mailing list, where most of the simple questions have already been asked and clearly answered by the project lead.