2010/05/19

Jevons paradox, Moore's law and utility computing

TL;DR: A quirk of economics may explain why computers have always seemed too slow, and could indicate that a utility computing boom will take place in the near future.

First: Jevons paradox...

...is defined as follows on Wikipedia:
The proposition that technological progress that increases the efficiency with which a resource is used, tends to increase (rather than decrease) the rate of consumption of that resource.
This was first observed by Jevons himself with coal:
Watt's innovations made coal a more cost effective power source, leading to the increased use of the steam engine in a wide range of industries. This in turn increased total coal consumption, even as the amount of coal required for any particular application fell.
I think that we're experiencing this effect with CPU time and storage. Specifically, if we recast Moore's law in terms of increased efficiency of CPU instruction processing per dollar, then Jevons paradox explains why software generally never seems to get any faster, and why we always seem to be running out of storage space.
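
To make the condition behind this concrete, here is a minimal sketch of the standard constant-elasticity reading of Jevons paradox. Everything in it is an illustrative assumption rather than a measurement: the function names, the demand curve, the doubling of efficiency each period, and the elasticity values are all made up for the example. 'Efficiency' stands for useful work per unit of resource (for example, instructions per dollar), and total resource use only rises with efficiency when demand for the work is elastic (elasticity greater than 1).

```python
def work_demanded(efficiency, elasticity, resource_cost=1.0, scale=1.0):
    """Units of useful work demanded, assuming constant-elasticity demand.

    The effective price of one unit of work falls as efficiency rises.
    """
    price_per_unit_of_work = resource_cost / efficiency
    return scale * price_per_unit_of_work ** (-elasticity)


def resource_consumed(efficiency, elasticity, resource_cost=1.0, scale=1.0):
    """Resource needed to deliver the demanded work at this efficiency."""
    return work_demanded(efficiency, elasticity, resource_cost, scale) / efficiency


def simulate(elasticity, periods=5, growth=2.0):
    """Return (efficiency, work, resource) tuples as efficiency grows each period.

    growth=2.0 is a hypothetical Moore's-law-style doubling per period.
    """
    rows = []
    efficiency = 1.0
    for _ in range(periods):
        rows.append((
            efficiency,
            work_demanded(efficiency, elasticity),
            resource_consumed(efficiency, elasticity),
        ))
        efficiency *= growth
    return rows


if __name__ == "__main__":
    # Elasticities below, at, and above 1 (assumed values, for illustration only).
    for elasticity in (0.5, 1.0, 1.5):
        print(f"elasticity = {elasticity}")
        for efficiency, work, resource in simulate(elasticity):
            print(f"  efficiency={efficiency:5.1f}  work={work:8.2f}  resource={resource:6.2f}")
```

In this toy model the amount of work done rises in every case, but resource consumption only rises when the elasticity is above 1. Whether real demand for CPU time and storage behaves that way is exactly the empirical question.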

This is backed up by anecdotal evidence and folk knowledge. Consider generally accepted adages such as Wirth's law, "Software is getting slower more rapidly than hardware becomes faster", or the variations of Parkinson's law, such as "Data expands to fill the space available for storage", and Nathan Myhrvold's 'first law of software', to paraphrase: "Software is a gas: it always expands to fit whatever container it is stored in."

To provide more empirical evidence, one would have to survey data such as "historical average time taken by a user to perform common operation X", or "average % disk space free since 1990 on home/office PCs". I'd be interested in links to any studies along these lines, if anyone has examples.

Second: the commoditisation of CPU time and storage

Generally, clients do not seem to be getting 'thicker', or pushing boundaries in terms of CPU power and storage. Meanwhile, many companies are spending a lot of money on new datacenters, and the concept of "Big Data" is gaining ground. Server-side, we've moved to multi-core and we're learning more about running large clusters of cheap hardware.

Right now we're at the inflection point in a movement towards large datacenters that use economies of scale and good engineering to attain high efficiencies. When this infrastructure is exposed at a low level, its use is metered in ways similar to other utilities such as electricity and water. EC2 is the canonical example.

I believe that CPU time and storage will eventually become true commodities, traded on open markets like coal, oil or, in some regions of the world, electricity. The barriers to this happening today are many: the lack of standard APIs and bandwidth- or network-related lock-in are two worth mentioning. However, you can see foreshadowing of this in Amazon's spot instances feature, which uses a model close to how real commodities are priced. (Aside: Jonathan Schwartz was posting about this back in 2005.)

In an open market, building a 'compute power station' becomes an investment whose return on capital would be linked to the price of CPU time and storage in that market. The laws of supply and demand would govern this price as they do any other. For example, one can imagine that just as CO2 emissions dip during an economic downturn, CPU time would also be cheaper when less work is being done.
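
As a rough illustration of how supply and demand could set such a price, here is a toy uniform-price auction. It is a sketch under my own assumptions, not a description of how Amazon's spot instances or any real commodity exchange actually clear: the function name `clear_spot_market`, the `reserve_price` parameter, the bid format, and all the numbers are hypothetical. Bidders name the most they will pay per unit of CPU time, capacity goes to the highest bids first, and everyone who wins pays the lowest accepted bid.

```python
def clear_spot_market(bids, capacity, reserve_price=0.0):
    """Clear a toy uniform-price market for units of CPU time.

    bids: list of (bidder, max_price, units) tuples, bidder names assumed unique.
    capacity: total units the 'compute power station' can supply.
    Returns (clearing_price, allocations), with allocations as {bidder: units}.
    """
    # Ignore bids below the operator's floor, then serve the highest bids first.
    eligible = [b for b in bids if b[1] >= reserve_price]
    ordered = sorted(eligible, key=lambda b: b[1], reverse=True)

    allocations = {}
    remaining = capacity
    clearing_price = reserve_price
    for bidder, max_price, units in ordered:
        if remaining <= 0:
            break
        granted = min(units, remaining)
        allocations[bidder] = granted
        remaining -= granted
        clearing_price = max_price  # lowest accepted bid so far

    # Spare capacity after serving every bid: the price drops to the floor.
    if remaining > 0:
        clearing_price = reserve_price
    return clearing_price, allocations


if __name__ == "__main__":
    bids = [("a", 0.10, 4), ("b", 0.08, 4), ("c", 0.05, 4)]
    for capacity in (6, 12, 20):
        price, allocations = clear_spot_market(bids, capacity)
        print(capacity, price, allocations)
```

In this toy model, adding capacity or removing bidders (a downturn, say) pushes the clearing price down, which is the behaviour the paragraph above imagines.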

In addition to this, if Moore's law continues to hold, newer facilities would host ever-faster, ever-more-efficient hardware. This would normally push the price of CPU time inexorably downwards, and make investing in any datacenter a bad idea. As a counterpoint to this, we can see that today's hosting market is relatively liquid on a smaller scale, and people still build normal datacenters. Applying Jevons paradox, however, goes further, indicating that as efficiency increases, demand will also increase. Software expands to fill the space available.

Third: looking back

I think a closer look at recent history will help to shed light on the coming market in utility computing. Two subjects in particular might be useful to study.

Coal was, in Britain, a major fuel of the last industrial revolution. From 1700 to 1960, production volume increased exponentially [PDF, page 34] and the 'consumer' price fell in real terms by 40%, mainly due to decreased taxes and transportation costs. At the same time, however, production prices rose by 20%. In his book The Coal Question, Jevons posed a question about coal that we may find familiar: for how long can supply continue to increase exponentially?

However, the parallels only go so far. Coal mining technology only progressed iteratively, with nothing like Moore's law behind it - coal production did peak eventually. The mines were controlled by a cartel, the "Grand Allies", who kept production prices relatively stable by limiting over-production. Today we have anti-trust laws to prevent that from happening.

Lastly, the cost structure of the market was different: production costs were never more than 50% of the consumer price, whereas the only cost between the producer and the consumer of CPU time is bandwidth. Bandwidth is getting cheaper all the time, although maybe not at a rate sufficient for it to remain practically negligible throughout an exponential increase in demand for utility computing.

Electricity, as Jon Schwartz noted in the blog posts linked to above, and as Michael Manos noted here, started out as the product of small, specialised plants and generators in the basements of large buildings, before transforming rapidly into the current grid system, complete with spot prices etc. Giants like GE were created in the process.

As with electricity, it makes sense to use CPU power as close to you as possible. For electricity there are engineering limits concerning long distance power transmission; on networks, latency increases the further away you go. There are additional human constraints. Many businesses prefer to deal with suppliers in their own jurisdictions for tax reasons and for easier access to legal recourse. Plenty of valuable data may not leave its country of origin. For example, a medical institution may not be permitted to transfer its patient data abroad.

For me, this indicates that there would be room at a national level for growth in utility computing. Regional players may spring up, differentiating themselves simply through their jurisdiction or proximity to population centers. To enable rapid global build-out, players could set up franchise operations, albeit with startup costs and knowledge-bases a world away from the typical retail applications of the business model.

Just as with the datacenter construction business, building out the fledgling electricity grid was capital intensive. Thomas Edison's company was allied with the richest and most powerful financier of the time, J.P. Morgan, and grew to become General Electric. In contrast, George Westinghouse, who built his electricity company on credit and arguably managed it better, didn't have Wall St. on his side and so lost control of his company in an economic crisis.

Finally: the questions this leaves us with

It's interesting to note that two of the companies that are currently ahead in utility computing - Google and Microsoft - have sizable reserves measured in billions. With that kind of cash, external capital isn't necessary as it was to GE and Westinghouse. But neither of them, nor any other player, seems to be building datacenters at a rate comparable to that of power station construction during the electrification of America. Are their current rates going to take off in future? If so, how will they finance it?

Should investors pour big money into building utility datacenters? How entrenched is the traditional hosting business, and will someone eat its lunch by diving into this? Clay Shirky compared a normal web-hosting environment to one run by AT&T - will a traditional hosting business have the same reaction as AT&T did to him?

A related question is the size of the first-mover advantage - are Google, Amazon, Microsoft and Rackspace certain to dominate this business? I think this depends on how much lock-in they can create. Will the market start demanding standard APIs and fighting against lock-in, and if so, when? Looking at the adoption curves of other technologies, like the Web, should help to answer this question. Right now the de-facto standard APIs such as those of EC2 and Rackspace can be easily cloned, but will this change?

I'm throwing these questions out here because I don't have the answers. But the biggest conclusion that I think can be tentatively drawn from applying Jevons paradox to the 'resources' of CPU time and storage space is that, in the coming era of utility computing, there may soon be a business case for building high-efficiency datacenters and exposing them to the world at spot prices from day one.