2008/12/22

Notes on a small proxy, or: "PL/Proxy: lessons learnt, hints and tips"

PL/Proxy is the SQL proxy that Skype released as a part of their open source program. Although in some ways less flexible than MySQLProxy, focused as it is purely on partitioning and load-balancing, it's been used in production under HA circumstances and is definitely production quality code.

I installed it on a Linux image and found the following:

When installing from source:
  • On Debian-based systems (including Ubuntu), if you don't have pg_config, you need to install the package postgresql-server-dev-8.X
  • Make sure flex & bison are installed first.
  • Make sure you have the latest version. Previous versions rely on flex & bison being defined in pg_config, which is not always the case.
For the simple example, putting the passwords in the connection strings is a quick and dirty way to get up and running.

If anything changes in your get_cluster_partitions function, then you should also increment the value returned by get_cluster_version. PL/Proxy caches the results of get_cluster_partitions on a per-version basis to avoid calling it on every invocation, so a change made without a version bump won't be picked up.
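The caching behaviour described above can be modelled in a few lines of Python. This is only a sketch of the idea, not PL/Proxy's actual implementation: the ClusterCache class, the state dictionary and the connection strings are all made up for illustration.

```python
class ClusterCache:
    """Toy model of a partition list cached per cluster version (hypothetical)."""

    def __init__(self, get_version, get_partitions):
        self._get_version = get_version
        self._get_partitions = get_partitions
        self._cache = {}  # cluster name -> (version, partition list)

    def partitions(self, cluster):
        # The cheap version lookup happens on every call.
        version = self._get_version(cluster)
        cached = self._cache.get(cluster)
        if cached is None or cached[0] != version:
            # The partition list is only re-read when the version differs.
            cached = (version, list(self._get_partitions(cluster)))
            self._cache[cluster] = cached
        return cached[1]


# Stand-ins for get_cluster_version / get_cluster_partitions.
state = {"version": 1, "partitions": ["dbname=part00", "dbname=part01"]}
cache = ClusterCache(lambda c: state["version"], lambda c: state["partitions"])

first = cache.partitions("usercluster")

# Change the partitions but forget to bump the version: the old list is served.
state["partitions"] = ["dbname=part00", "dbname=part02"]
stale = cache.partitions("usercluster")

# Bump the version: the new list is picked up.
state["version"] = 2
fresh = cache.partitions("usercluster")
```

Here stale still holds the original partition list, and only fresh reflects the change.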

While the documentation says "the number of connections for a given cluster must be a power of 2", don't forget that 1 connection is also possible (1 = 2^0).

You can proxy to a proxy, but there are two things to take into account:

1) Your reads can't use SELECTs inside your proxy functions, i.e. no "cluster A run on B select C", which is bad style anyway. Instead, your target databases have to have functions with a method signature identical to that of the function on the proxy.

2) It's not much use if you're using a stock PL/Proxy install that uses hashtext with identical cluster-choice logic on both proxies. Why? In this example scenario,

P1 -> part00,
+ -> P2 -> part01, part02

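...nothing will ever be written to part01, because P1 partitions the same way as P2. Every key that P1 sends on to P2 has a hash that selects P1's second slot, and P2, looking at the same hash in the same way, always selects its own second slot, part02.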
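A small Python simulation makes the effect in this scenario visible. It assumes the slot order shown in the diagram and that each proxy picks a slot from the lowest bit of the same hash; the integers 0 to 15 stand in for hashtext results. The route_shifted variant, where P2 looks at the next bit up, is a hypothetical workaround for comparison only, not something a stock PL/Proxy install does.

```python
from collections import Counter

# Slot order taken from the diagram: P1 -> [part00, P2], P2 -> [part01, part02].
P1_TARGETS = ["part00", "P2"]
P2_TARGETS = ["part01", "part02"]


def route_same_logic(hash_value: int) -> str:
    """Both proxies mask the same low bit of the same hash."""
    target = P1_TARGETS[hash_value & 1]
    if target != "P2":
        return target
    # P2 repeats P1's choice; the low bit is already known to be 1 here.
    return P2_TARGETS[hash_value & 1]


def route_shifted(hash_value: int) -> str:
    """Hypothetical variant: P2 decides on the next bit up instead."""
    target = P1_TARGETS[hash_value & 1]
    if target != "P2":
        return target
    return P2_TARGETS[(hash_value >> 1) & 1]


hashes = range(16)  # stand-in for hash values of 16 different keys
same = Counter(route_same_logic(h) for h in hashes)
shifted = Counter(route_shifted(h) for h in hashes)

print("same logic:", dict(same))
print("shifted   :", dict(shifted))
```

With identical logic, part01 never appears in the counts; once P2 decides on a different bit, both of its partitions receive rows.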

According to the todo list, PL/Proxy currently loads entire result sets into memory, which puts an obvious limit on how large a response can be.

CONNECT doesn't take function arguments. This can be emulated by having one-DB clusters, but still.

"Tagged partitions" are not very clear in the docs, and neither is the reason for requiring the number of partitions to be a power of 2, but the PL/Proxy FAQ explains this. PL/Proxy uses the lowest N bits of the hash to decide which partition to run on, and N bits can take exactly 2^N distinct values, hence the limitation.
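As a rough illustration of the lowest-N-bits rule, here is a minimal Python sketch. The function names are invented for this example and the code is not taken from PL/Proxy; it only shows why a power-of-2 partition count makes the choice a simple bit mask, and why a count of 1 (zero bits) still works.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ... (a single bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def pick_partition(hash_value: int, partition_count: int) -> int:
    """Select a partition index from the lowest N bits of the hash,
    where partition_count == 2**N."""
    if not is_power_of_two(partition_count):
        raise ValueError("partition count must be a power of 2")
    # For a count of 2**N, (count - 1) is a mask with the lowest N bits set.
    return hash_value & (partition_count - 1)


print(pick_partition(0b1011, 4))  # lowest two bits are 11 -> index 3
print(pick_partition(12345, 1))   # zero bits used -> always index 0
```

With a count such as 3, no bit mask covers exactly the indexes 0 to 2, which is why the sketch rejects it.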

At the time of writing, the above FAQ isn't linked to from the Skype project homepage, nor is it in CVS or in the release, but it answers a lot of questions. The other place where a lot of questions are answered is the plproxy-users mailing list, where people have already asked most of the simple questions and had them clearly answered by the project lead.
