2010/12/13

Idea: scripting for the rest of us.

Rhino has had continuations for a long time now. JavaScript is the language of the web. However, long-running executions are generally only discussed in the context of business processes, whereas in fact anything can be a long-running process.

For example, against an API for a hypothetical group todo list/document review tool (getActions, followupActions, review, distribute), a regular committee or meeting can be run like this:
var minutes = null;
while ( ! disbanded) {
    // carry over any unfinished actions from the previous meeting
    var oldActions = followupActions(minutes == null ? null : minutes.actions);
    var newActions = getActions(participants);
    minutes = {actions: newActions.concat(oldActions)}; // concat, not join: join would produce a string
    var accepted = false;
    do {
        accepted = review(minutes);
    } while ( ! accepted);
    distribute(minutes);
    sleep(1 * month);
}

This could be for a school board, church council, or whatever. Also, you can use this format to (e.g.) organise rotas, timetables, casual chess tournaments, games of hide-and-seek or anything else.

In practice the API would be far more general than the one posited above, and the four methods above would themselves be functions.
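
As a sketch of what the host side might look like using Rhino's continuation support (a minimal, hypothetical example - a real scheduler would persist the captured continuation and resume it when the next meeting rolls around):

import org.mozilla.javascript.BaseFunction;
import org.mozilla.javascript.Context;
import org.mozilla.javascript.ContinuationPending;
import org.mozilla.javascript.Script;
import org.mozilla.javascript.Scriptable;

public class LongRunningScript {
    public static void main(String[] args) {
        Context cx = Context.enter();
        try {
            cx.setOptimizationLevel(-1); // continuations require interpreted mode
            Scriptable scope = cx.initStandardObjects();

            // A host-provided sleep() that suspends the script instead of blocking.
            scope.put("sleep", scope, new BaseFunction() {
                @Override
                public Object call(Context cx2, Scriptable s, Scriptable thisObj,
                                   Object[] fnArgs) {
                    ContinuationPending pending = cx2.captureContinuation();
                    pending.setApplicationState(fnArgs[0]); // remember the requested delay
                    throw pending;
                }
            });

            Script script = cx.compileString("sleep(1000); 'done'", "demo", 1, null);
            try {
                cx.executeScriptWithContinuations(script, scope);
            } catch (ContinuationPending pending) {
                // Here the continuation could be serialised to disk; much later,
                // deserialise it and pick up exactly where the script left off.
                Object result = cx.resumeContinuation(
                        pending.getContinuation(), scope, null);
                System.out.println(result); // prints "done"
            }
        } finally {
            Context.exit();
        }
    }
}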

I think that you could use this mechanism to build games, interesting projects or social activities on the Facebook API, for example. You could publish scripts, and scripts of scripts, or libraries of functions, oh my! All kinds of interesting things. Anyway, I made some mockups of a potential iPhone application using this tool, since it's probably the closest I'll ever get to actually building something like this. This is what would happen when you load a script (the user is responding to an invitation and has just clicked on "view source" to see what s/he is being invited to participate in):

And this is what happens when running a script (likewise the user has clicked/expanded "you are here"):

Would be interesting, no?

2010/12/11

Idea: decentralised P2P website mirroring

The recent case of Wikileaks being booted off AWS provoked the following thought: if lots of people started mirroring Wikileaks on EC2, Amazon would be forced into playing whack-a-mole to stop it. Game over: the revolt of a user-base, the inevitable collateral damage, etc. lead to bad PR and (hopefully) reform and vertebrates. Amazon's nice GUI, combined with an introductory offer that makes a micro instance basically free for a year, means that putting up a how-to page or YouTube video showing every college kid with a credit card how to do it can't be too difficult.

My question is, what next? How do you turn a swarm of small, transient mirrors into something findable and load-balanced to deal with the (potentially) huge demand of serving Wikileaks traffic?

Therein lies an interesting problem. First and foremost, how do you resolve a stable domain name to one of a set of highly dynamic addresses? Round-robin DNS load-balancing with a very short TTL is one obvious approach, but that raises the question of how one bootstraps and then maintains the set of A records listing the mirrors. The stable domain could delegate (via NS records) to DNS servers that are themselves part of the mirroring swarm, if each node acted as both a DNS and an HTTP server. Each node would return a randomised list of the nodes in its topological neighborhood.
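
To sketch the node-side logic (names and sizes here are purely illustrative; the DNS wire handling itself is omitted):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: each mirror keeps a list of the peers it currently knows about
// and answers DNS queries for the stable domain with a small randomised
// subset of them, served with a very short TTL.
public class PeerPicker {
    static final int ANSWERS_PER_QUERY = 5; // A records per DNS response
    static final int TTL_SECONDS = 30;      // keep resolver caches short-lived

    private final List<String> knownPeers = new ArrayList<String>();

    public synchronized void addPeer(String ipAddress) {
        if (!knownPeers.contains(ipAddress)) {
            knownPeers.add(ipAddress);
        }
    }

    public synchronized void removePeer(String ipAddress) {
        knownPeers.remove(ipAddress); // e.g. when a mirror stops responding
    }

    // Returns up to ANSWERS_PER_QUERY peer addresses in random order.
    public synchronized List<String> pickAnswers() {
        List<String> shuffled = new ArrayList<String>(knownPeers);
        Collections.shuffle(shuffled);
        return shuffled.subList(0, Math.min(ANSWERS_PER_QUERY, shuffled.size()));
    }
}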

Apart from that, I'm not much further along in my thinking. Posadis could be used to implement a simple DNS server - the sample code can practically be copy-pasted by anyone who knows a bit of C++. The P2P network should be easy to do - any DHT should be usable, provided it has an API call to get the list of nodes, since the goal is not really to store information but just to maintain a single cluster of machines that know about each other. But the bootstrapping problem remains.

2010/12/08

Thoughts on WikiLeaks

  1. We should be more outraged at the malfeasance revealed than at the manner in which we learnt of it. That we aren't speaks to our cynicism and the low-to-nonexistent standards to which we hold government.

  2. It was EveryDNS, not EasyDNS.

  3. Much as I love Amazon AWS (see many, many previous posts), their justification for booting out WL is disappointing, because (a) the cause (Lieberman's call) and the effect are there for all to see, and (b) they didn't boot WL off six months ago over the Iraq and Afghanistan war logs, which were also violations of the TOS. Besides which, they have a business interest in a robust First Amendment - they sell books, including dangerous and subversive ones containing state secrets! This isn't hard, people!

  4. As Tim Bray said, the spinelessness of the IT industry in general is depressing when one considers that we should have "freedom of speech" in our DNA.

  5. The following ideas are not contradictory:
  • Cablegate is not necessarily a good thing
  • The way in which various governments and their officials have responded to it is an attack on the freedom of information and on the free press.
  • Julian Assange can be (a) a scumbag rapist, (b) justifiably paranoid, (c) a raging egotist and (d) doing really important work, all at the same time.
Good thinkers on the subject include Clay Shirky and Glenn Greenwald.

2010/11/25

'War Is a Force That Gives Us Meaning' by Chris Hedges

This book is about war, but not about guns or tanks or tactics or strategy. It laments the nature of war, which is murder. It describes the effect of war, the moral and physical destruction and degradation of the victims caught up in it - the soldiers, civilians, and witnesses such as the author. It is an eloquent, upsetting and disturbing book. Although it seems to have been written out of a deep sense of despair with humanity, it does in the final chapter sound a solitary note of hope - in our capacity for love, and for love to triumph over hate.

2010/11/17

Mobile telcos need to evolve.

It's commonly understood that mobile network operators are afraid of being turned into dumb pipes. I think that they should reject this fear and embrace their fate, and furthermore, that the first network to do this would get the jump on the competition and clean up. How?

Firstly, networks should become completely device-agnostic. Phones and their financing should be separated from the networks. Long-term contracts and the financing deals on phones basically function as an installment scheme for the phone itself, so unbundle this from the contract. The retail outlets could then sell unlocked phones for any network. In some countries (e.g. USA) this may cause headaches for your competitors, but that's partly the point.

Secondly, charge customers for what they use. Charge per kilobyte. Make as little distinction as possible between voice, GPRS, SMS, or MMS traffic. Pass on charges from traffic to other mobile operators directly, and itemize the charges separately on the statements.

In doing this, the operator transforms into a bulk supplier of bandwidth as a commodity.

An operator could do the sums as follows: the operating costs of the network, plus payments for infrastructure, plus the funds needed for planned expansion, give a monthly figure.

My completely uninformed bet is that this figure, divided by the total kilobytes carried across the network per month and marked up with a 50% margin, would still work out much, much cheaper than current call rates for most mobile users. Pulling the floor out of the mobile market like this would make customers migrate in droves.
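
A back-of-the-envelope sketch of that sum, with every number invented purely for illustration:

public class BandwidthPricing {
    public static void main(String[] args) {
        // Every figure below is invented purely for illustration.
        double monthlyCostEur  = 15e6;  // opex + infrastructure + planned expansion
        double subscribers     = 5e6;
        double kbPerSubscriber = 50e3;  // ~50 MB/month each, all traffic types

        double totalKb    = subscribers * kbPerSubscriber;
        double pricePerKb = (monthlyCostEur * 1.5) / totalKb; // 50% margin

        // A GSM voice call is very roughly 100 KB per minute of airtime.
        double pricePerVoiceMinute = pricePerKb * 100;
        System.out.printf("EUR/KB: %.6f, EUR per voice-minute: %.4f%n",
                pricePerKb, pricePerVoiceMinute);
    }
}

With these made-up inputs a voice minute comes out at around one euro cent, an order of magnitude below typical call rates - which is the bet, though real inputs could move the result a long way in either direction.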

Doing this would also trigger the mother of all price wars. However, as the first to adopt the model, the operator gets the jump on the competition and should, adequately managed, remain a couple of iterations ahead of the rest.

The benefit in doing this is not just to the company that grabs market share, but to society in general. Mobile devices no longer need to be phones but could also be AR, telemetry, or whatever. Device manufacturers and developers should be able to roll out new products and services as easily as they can on the Internet, and the mobile network would become the infrastructural foundation for the next evolutionary step in the information revolution, a leap in the level of interconnectedness, and the fabled "Internet of things". But first they need to get out of the way of their own business model.

2010/06/05

On synthetic genomics aka 'artificial life', and dire predictions.

By now most people have read about 'Artificial Life', i.e. the transplant of a DNA sequence manufactured from a digital blueprint into a bacterial cell, which then reproduced as the species encoded by the synthetic genome, therefore qualifying as a life-form whose DNA was essentially programmed into it in its entirety by humans. The genome was that of a different species of bacterium, plus watermarks. The paper, 'Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome' [PDF], was published in the journal Science.

Several interesting links on the subject. Firstly, 'Age of Excessions Interlude: Biology, or the Drugs Win the Drug War' offers a useful perspective on how to view the advance and its potential (non-legal) uses. Personally, I think that if this tech becomes user-friendly enough to end the War on Some Drugs, many, many other life-changing things will result besides, things that may make the WoSD and its fate a footnote in our long-term history, assuming we get that far.

You may wonder whether drug addicts could successfully replicate procedures created at great expense in a dedicated institute funded to the tune of millions. Well, this post on garage biotech in Silicon Valley is an interesting read for its insight into the possibilities of small-scale laboratories. In addition, it's worth pointing out that one of the biggest challenges faced by the Venter Institute was creating complete and error-free strands of DNA. Short sequences of DNA can be mail-ordered today. For example, in 2002 scientists mail-ordered the DNA of the poliovirus (as short fragments, later transcribed into its RNA). Back then it took two years to assemble approximately 8kb of RNA. However, prices for sequencing and synthesizing DNA have dropped exponentially, following a trend similar to Moore's law.

Also worth bringing up again in this context is 'Why the future doesn't need us', a seminal essay by Bill Joy, formerly of Sun Microsystems, examining whether or not humanity's unstoppable and exponentially increasing empowerment of individuals with information will eventually lead to catastrophe.

(Ironically, if his insights are correct, and 'info-weapons' some day enable WMD on a scale heretofore unseen, then I speculate that working 'DRM' applied to such weaponisable information could form part of a system that would stand between us and disaster.)

I was discussing this last Sunday with Bethan when I was reminded of something Brad Hicks wrote on the similar subject of economic bubbles:
The single most reliable way to predict a bubble is when the business press, passing along what mainstream economists are telling them, say that the reason you can believe that we're not in a bubble is that "new fundamentals are emerging," or in other words, "the old rules don't apply any more because (fill in the blank)."
He has a point. Likewise, if you ask someone why humanity should be considered more responsible now than (say) seventy years ago, the answer that "things are different now" is a sign that we've learnt nothing. Human nature does have some depressing constants just as it has uplifting ones, and misplaced confidence and lack of humility are some of the former.

The story of technology and hubris is one found ingrained in western culture. Fired brick is one of humanity's earliest technological advances, and it had a transformative impact on society: moving away from stone as a construction material freed early cities from dependency on local quarries and permitted a higher rate of settlement expansion.

With this in mind, the other thing I heard three weeks ago was a talk on the story of the Tower of Babel. Make of this what you will.

2010/05/19

Jevons paradox, Moore's law and utility computing

TL;DR: A quirk of economics may explain why computers have always seemed too slow, and could indicate that a utility computing boom will take place in the near future.

First: Jevons paradox...

...is defined as follows on Wikipedia:
The proposition that technological progress that increases the efficiency with which a resource is used, tends to increase (rather than decrease) the rate of consumption of that resource.
This was first observed by Jevons himself with coal:
Watt's innovations made coal a more cost effective power source, leading to the increased use of the steam engine in a wide range of industries. This in turn increased total coal consumption, even as the amount of coal required for any particular application fell.
I think that we're experiencing this effect with CPU time and storage. Specifically, if we recast Moore's law in terms of increased efficiency of CPU instruction processing per dollar, then Jevons paradox explains why software generally never seems to get any faster, and why we always seem to be running out of storage space.
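
As a toy model of that claim (all numbers invented; real demand elasticity is anyone's guess): if the cost per unit of computation halves every two years but demand responds to the falling price with an elasticity greater than one, total spend rises even as unit cost collapses:

public class JevonsToyModel {
    public static void main(String[] args) {
        double unitCost = 1.0;   // cost per unit of computation (arbitrary units)
        double demand = 1.0;     // units of computation consumed per period
        double elasticity = 1.3; // invented; >1 means consumption outruns efficiency
        for (int year = 0; year <= 10; year += 2) {
            System.out.printf("year %2d: unit cost %.4f, demand %9.1f, total spend %.2f%n",
                    year, unitCost, demand, unitCost * demand);
            unitCost /= 2;                       // Moore: efficiency doubles
            demand *= Math.pow(2.0, elasticity); // Jevons: demand grows faster
        }
    }
}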

This is backed up by anecdotal evidence and folk knowledge. Consider generally accepted adages such as Wirth's law, "Software is getting slower more rapidly than hardware becomes faster", the variations of Parkinson's law such as "Data expands to fill the space available for storage", and Nathan Myhrvold's 'first law of software', to paraphrase: "Software is a gas: it always expands to fit whatever container it is stored in."

To provide more empirical proof, one would have to be able to take a survey of data like "historical average time taken by a user to perform common operation X", or "average % disk space free since 1990 on home/office PCs". I'd be interested in any links to studies similar to this, if anyone has any examples.

Second: the commoditisation of CPU time and storage

Generally, clients do not seem to be getting 'thicker', or pushing boundaries in terms of CPU power and storage. Meanwhile, many companies are spending a lot of money on new datacenters, and the concept of "Big Data" is gaining ground. Server-side, we've moved to multi-core and we're learning more about running large clusters of cheap hardware.

Right now we're at the inflection point in a movement towards large datacenters that use economies of scale and good engineering to attain high efficiencies. When this infrastructure is exposed at a low level, its use is metered in ways similar to other utilities such as electricity and water. EC2 is the canonical example.

I believe that CPU time and storage will eventually become a true commodity, traded on open markets, like coal, oil, or in some regions of the world, electricity. The barriers to this happening today are many: lack of standard APIs and bandwidth- or network-related lock-in are two worth mentioning. However, you can see foreshadowing of this in Amazon's spot instances feature, which uses a model close to how real commodities are priced. (Aside: Jonathan Schwartz was posting about this back in 2005.)

In an open market, building a 'compute power station' becomes an investment whose return on capital would be linked to the price of CPU time & storage in that market. The laws of supply and demand would govern this price as they do any other. For example, one can imagine that just as CO2 emissions dip during an economic downturn, CPU time would also be cheaper as less work is done.

In addition to this, if Moore's law continues to hold, newer facilities would host ever-faster, ever-more-efficient hardware. This would normally push the price of CPU time inexorably downwards, and make investing in any datacenter a bad idea. As a counter-point to this, we can see that today's hosting market is relatively liquid on a smaller scale, and people still build normal datacenters. Applying Jevons paradox, however, goes further, indicating that as efficiency increases, demand will also increase. Software expands to fill the space available.

Third: looking back

I think a closer look at recent history will help to shed light on the coming market in utility computing. Two subjects in particular might be useful to study.

Coal was, in Britain, a major fuel of the last industrial revolution. From 1700 to 1960, production volume increased exponentially [PDF, page 34] and the 'consumer' price fell in real terms by 40%, mainly due to decreased taxes and transportation costs. At the same time, however, production prices rose by 20%. In his book The Coal Question, Jevons posed a question about coal that we may find familiar: for how long can supply continue to increase exponentially?

However, the parallels only go so far. Coal mining technology only progressed iteratively, with nothing like Moore's law behind it - coal production did peak eventually. The mines were controlled by a cartel, the "Grand Allies", who kept production prices relatively stable by limiting over-production. Today we have anti-trust laws to prevent that from happening.

Lastly, the cost structure of the market was different: production costs were never more than 50% of the consumer price, whereas the only cost between the producer and the consumer of CPU time is bandwidth. Bandwidth is getting cheaper all the time, although maybe not at a rate sufficient for it to remain practically negligible throughout an exponential increase in demand for utility computing.

Electricity, as Jon Schwartz noted in the blog posts linked to above, and as Michael Manos noted here, started out as the product of small, specialised plants and generators in the basements of large buildings, before transforming rapidly into the current grid system, complete with spot prices etc. Giants like GE were created in the process.

As with electricity, it makes sense to use CPU power as close to you as possible. For electricity there are engineering limits concerning long distance power transmission; on networks, latency increases the further away you go. There are additional human constraints. Many businesses prefer to deal with suppliers in their own jurisdictions for tax reasons and for easier access to legal recourse. Plenty of valuable data may not leave its country of origin. For example, a medical institution may not be permitted to transfer its patient data abroad.

For me, this indicates that there would be room at a national level for growth in utility computing. Regional players may spring up, differentiating themselves simply through their jurisdiction or proximity to population centers. To enable rapid global build-out, players could set up franchise operations, albeit with startup costs and knowledge-bases a world away from the typical retail applications of the business model.

Just as with the datacenter construction business, building out the fledgling electricity grid was capital intensive. Thomas Edison's company was allied with the richest and most powerful financier of the time, J.P. Morgan, and grew to become General Electric. In contrast, George Westinghouse, who built his electricity company on credit and arguably managed it better, didn't have Wall St. on his side and so lost control of his company in an economic crisis.

Finally: the questions this leaves us with

It's interesting to note that two of the companies that are currently ahead in utility computing - Google and Microsoft - have sizable reserves measured in billions. With that kind of cash, external capital isn't necessary as it was for GE and Westinghouse. But neither of them, nor any other player, seems to be building datacenters at a rate comparable to that of power station construction during the electrification of America. Are their current rates going to take off in future? If so, how will they finance it?

Should investors pour big money into building utility datacenters? How entrenched is the traditional hosting business, and will someone eat their lunch by diving into this? Clay Shirky compared a normal web-hosting environment to one run by AT&T - will a traditional hosting business have the same reaction as AT&T did to him?

A related question is the size of the first-mover advantage - are Google, Amazon, Microsoft and Rackspace certain to dominate this business? I think this depends on how much lock-in they can create. Will the market start demanding standard APIs and fighting against lock-in, and if so, when? Looking at the adoption curves of other technologies, like the Web, should help to answer this question. Right now the de-facto standard APIs such as those of EC2 and Rackspace can be easily cloned, but will this change?

I'm throwing these questions out here because I don't have the answers. But the biggest conclusion that I think can be tentatively drawn from applying Jevons paradox to the 'resources' of CPU time and storage space is that in the coming era of utility computing, there may soon be a business case for building high-efficiency datacenters and exposing them to the world at spot prices from day one.

2010/04/18

YAGNI and the boring stuff

YAGNI is an acronym meaning "You Ain't Gonna Need It", and behind it lies the principle in software that code should not be written before it is needed: the temptation to prematurely generalise and abstract, to write a framework for the problem you're trying to solve, should be resisted.

It also focuses you on the task at hand, which is to directly solve the problem. By focusing on the shortest path between point A and point B, the result should be small, lean, and direct.

However, in several projects I've worked on there comes a point where certain aspects of a codebase not directly related to its functionality begin to demand attention. These issues can be ignored up until the point at which inaction damages the codebase as a whole. I find these things are common across projects, and so they could be classed as "You're Gonna Need It". The examples that immediately come to mind in Java projects are:

  • A logging setup, i.e. configuration of log files for errors and debug info.
  • Likewise, a sprinkling of informative logging statements, without which it becomes difficult to tell what went wrong outside of a debugger.
  • An exception hierarchy. You can throw new RuntimeException for a while, but eventually that becomes unmanageable and you need to distinguish in catch clauses between errors from inside and outside your own codebase.
  • Interfaces, for the classic reason of hiding the implementation, but also because they enable Proxy-based AOP, which is useful for a bunch of things you ain't gonna need yet but one day will, such as benchmarking and security interceptors.
  • A base test case / test harness that lets you pull in different parts of the application and test them individually.

This stuff is all boring but necessary. When living without them starts to affect efficiency and quality, that is the point at which to stop working on functionality, get this stuff right, and then go back to working on the important things.
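
For what it's worth, a minimal sketch of what the logging and exception-hierarchy items might look like at the start (names are illustrative; java.util.logging used for brevity):

import java.util.logging.Level;
import java.util.logging.Logger;

// Base of the hierarchy: every failure raised by our own code extends this,
// so catch clauses can tell our errors apart from third-party ones.
class AppException extends RuntimeException {
    AppException(String message, Throwable cause) { super(message, cause); }
}

// Add narrower subtypes only as the need to distinguish them arises.
class ConfigException extends AppException {
    ConfigException(String message, Throwable cause) { super(message, cause); }
}

class Startup {
    private static final Logger LOG = Logger.getLogger(Startup.class.getName());

    void start() {
        try {
            loadConfig();
        } catch (AppException e) {
            LOG.log(Level.SEVERE, "failure in our own code", e); // informative logging
            throw e;
        } catch (RuntimeException e) {
            LOG.log(Level.SEVERE, "failure outside our codebase", e);
            throw new AppException("third-party failure during startup", e);
        }
    }

    private void loadConfig() { /* application-specific */ }
}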

2010/04/16

And now for messed up UK politics: The Digital Economy Act and other IP

I want to bring together some articles I've read recently, which some of you may be familiar with.

In Why Content is a Public Good, an economist-turned-coder explains why at an economic level, charging for digital content works against the nature of the digital medium. Note that this doesn't mean that it's right or wrong to do so - morality doesn't enter into it. It's just the nature of the beast. Markets are not intrinsically moral in any sense of the term, they're more akin to natural phenomena.

So it is with the nature of digital information and its redistributability. Disintermediation is what's slowly destroying the business models of the music, movie, book, and software industries. And as for "information wants to be free", the above post links to two good posts on the subject, one by Charlie Stross and one by Cory Doctorow, which explain why the phrase is more slogan than argument.

In the Hacker News discussion of this article, this comment by fexl proposes a "recursive auction" method of distributing digital goods that seems interesting. Googling on the keywords he provides, along with the examples provided in the article itself, shows that new and interesting ways of selling information can be had. Suppose Wikileaks auctioned off its scoops like this? How much would CNN pay to have the first copy of the video and its associated materials? How much would the New York Times pay to have the second?

The second post on "information wants to be free" above is by Cory Doctorow, a UK-resident Canadian. I like the guy for his eloquence in defending digital civil liberties and his understanding of online culture. He's also an author, and he credibly claims to make more money because he publishes his books online for free in addition to hard-copy. He wrote this article in the Guardian about the Digital Economy Act, and all the various (bad) things it contains.

There may be a little hyperbole in there but the essence of his article is correct. The bill was shoved through parliament with insufficient debate and is exceedingly sucky. Sometimes it seems to me that keeping a half-open eye on any legislative process, be it in the UK (the Digital Economy Act), the USA (HCR), or the EU (software patents), is not just like being in a sausage factory...
“Laws, like sausages, cease to inspire respect in proportion as we know how they are made.”
... but more like being in a medieval abattoir. But I digress.

The third article concerns patents, and discusses a recent study that suggests that the entire system of patents is counter-productive to inducing or encouraging invention, and to 'social welfare' in general. Not just software or genetic patents, but all patents.

The article is a good summary of the paper, which is available for free. In short, most inventors (a) create in order to scratch an itch, not for money, (b) build on previous work, which is easier to do if it isn't patented, and (c) prefer to share ideas freely with other inventors rather than be forced to keep things secret so they can make money.

Now I'm just going to throw some ideas out here. I like the whole open source mentality, and I don't like share-cropping, and although I have an iPhone I'm not interested in developing for a closed, restrictive-if-not-oppressive environment. It's ironic that I left the world of Microsoft to get away from that, when Microsoft is the company that won domination of the desktop market by being more open than Apple. Hopefully, now Google will do the same.

However, one thing worries me about this: when Microsoft and Apple went to war, Microsoft was swiftly gaining several 'legs' to its stool (or eyes for its group, if you play go at all). Google has one: search. Kill Google's search revenue and everything Google does - mobile, maps, Chrome, ChromeOS, etc. - suddenly goes *poof* and disappears. If the American federal government is a large insurance company with an army, then Google is an advertising agency with a well-run IT department that's allowed to get creative with cute little side-projects.

But I'll leave you with a final question, the same one I would have posed to RMS when he gave his FOSDEM '05 talk: faced with all of the above, how does one effectively turn on, tune in and drop out - i.e. combine elements of civil disobedience and counter-cultural openness to combat the seemingly inevitable shift to Orwellian 'idea management'? I have no idea, but one thing I know is that it's not by downloading as many MP3s as possible. Here's a link to one of my own posts - "open source music" - and the conclusion from Charlie Stross's article:

"So it follows that if you want information to be free you are taking on an obligation to make information, and give it freedom. An obligation to work to better the lot of humanity, not to merely sponge off the labour of others.

Next time you hear someone invoke "information wants to be free" as a justification for demanding free-as-in-no-payment-expected content, ask them: precisely what content have you released for free lately?"

The timing could not be better. Or worse, depending on your POV.

So, GOP Senate minority leader Mitch McConnell flies into New York to coordinate messaging with Wall Street in opposition to the proposed financial regulatory reform intended to bring an end to the "Too Big To Fail" era, and comes back with misleading and dishonest arguments that pretend to oppose helping TBTF banks while actually supporting them. And yes, it is that obvious.

Then, just as the incredulity and blow-back start to mount, the SEC charges Goldman Sachs, the great vampire squid itself, with CDO- and RMBS-related fraud. Ouch! So now he's stuck in a corner with the banks, the biggest of which is being hauled into court on fraud charges. The Dems will suddenly find it much easier to paint the GOP as being in bed with the banks, and this puts heavy, populist pressure on the GOP senators to break ranks and thus the filibuster, allowing the legislation to go through. I'm a pessimist on this, however: I think Mitch can still rely on the cast-iron party discipline the GOP has built up to stop the bill going forward.

2010/02/15

The advantages of programming on a small netbook.

Most software developers prefer to work on "fast" computers, and I am normally one of them. Many of my tools work best with lots of memory, and some common tasks become pleasantly instantaneous on a nice, powerful computer.

At work, for example, our team's project is 250,000-500,000 lines of code modularised into many sub-projects. Writing an extra five thousand lines of code would be considered a "minor" change. Working with this project on a slow computer would be a nightmare.

However, my tiny little netbook, a rather un-sexy and previous-generation EeePC 901SD, is now my main development machine for my free-time-only side-project, which is also written in Java and uses the technologies documented in my previous blog posts. The software is not a toy: it's written to the same quality standards as my work project, if not higher; with no time constraints or deadlines, I can be as perfectionist and nit-picky as I want. My previous side-project made heavy use of PostgreSQL, for which I wrote hundreds of lines of C, again on a similar netbook.

The surprise? It's been a great experience. As a very fast typist I found the keyboard cramped at first, but it quickly became comfortable. I use a lightweight IDE (NetBeans) with a reduced font size, and screen real-estate is not an issue. I use Linux (Ubuntu), which means I have a decent command prompt and a unix-like system at my finger-tips.

More importantly, my coding habits have changed. I no longer waste time thinking in front of the computer screen. I design the code in my head while walking to work in the morning, shopping for groceries, or doing the dishes. If I had a screen in front of me, the temptation would be to do something, to try out the idea, instead of reflecting on what really needs to be done, or how to do it.

I can code in a crowded bar, or in 20 minutes on a tram surrounded by commuters, because when I'm sitting in front of a keyboard, I already know what I'm going to do. The rest is pure typing.

The result: I would guess that I've done no more than 30 man-days of development. Less than half of it with an Internet connection. The time spent has been spread over the past four months in chunks of no more than four hours, and mostly shorter than one. But it has all been valuable learning or steady, measurable progress.

This requires a lot more patience than usual from me. Time for my side-project comes at its own pace and cannot be hurried, and I'm not intending to sacrifice anything to make faster progress.

I also learnt this: when I demand to work with only the very best tools in every situation, it is misplaced pride, maybe arrogance. It limits me and chokes faith in the abilities I've been blessed with. We sometimes write on small pieces of paper, so why can't we sometimes program on small computers?

2010/02/06

EC2 and CXF: Serialising objects in JAX-WS

The second problem I had with CXF (or, more correctly, JAXB) was in trying to serialise JAXB objects such as CreateVolumeType into XML using a copy of JAXBuddy from Typica.

This failed with the error message: "Unable to marshal type XXX as an element because it is missing an @XmlRootElement annotation". Searching for this error message led me to this blog post by Kohsuke Kawaguchi, and so I copied the sample configuration from the comment into my binding file. This didn't work, and the error message remained the same - the configuration didn't take effect.

Googling again I found this CXF bug on serialisation and "simple" mode. The information in this issue, combined with the wsdl-to-java documentation, gave me what I needed to correct the binding file.

As a humorous side-effect this also changed a lot of class names and broke a lot of code, but the new class-names are an improvement, so there we go.

My updated sample binding file is on pastebin here. Also, my original post about CXF is messed up in some way and I'll get round to fixing it soon.
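
Finally, for reference: the usual programmatic workaround for the missing @XmlRootElement, independent of binding files, is to wrap the instance in a JAXBElement yourself. A sketch - the QName here is my assumption; the real one should come from the generated ObjectFactory:

import java.io.StringWriter;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.Marshaller;
import javax.xml.namespace.QName;

public class MarshalWithoutRootElement {
    public static String marshal(CreateVolumeType request) throws Exception {
        JAXBContext ctx = JAXBContext.newInstance(CreateVolumeType.class);
        Marshaller m = ctx.createMarshaller();
        m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        // Supply the element name ourselves, since the generated class has no
        // @XmlRootElement annotation. Namespace and local name below are
        // assumptions - check your WSDL/schema.
        JAXBElement<CreateVolumeType> root = new JAXBElement<CreateVolumeType>(
                new QName("http://ec2.amazonaws.com/doc/2009-11-30/", "CreateVolume"),
                CreateVolumeType.class, request);
        StringWriter out = new StringWriter();
        m.marshal(root, out);
        return out.toString();
    }
}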

2010/02/04

EC2 and CXF / JAX-WS: Configuring WSDL endpoints & service URL

So, it's been a couple of weeks in real-time and about ten hours in "code"-time since I last posted on EC2 & JAX-WS, and I've discovered two more things since then. The second is in the next blog post.

Firstly, Amazon SQS requires that you make SOAP 1.1 calls to a unique URL per queue. This means that to use a given queue, a specific port has to be configured with the new location in JAX-WS. This took a little digging, but it is clearly mentioned in the CXF documentation. The relevant code is this:

String queueURL = "http://sqs.amazonaws.com/blablabla";

MessageQueue q = new MessageQueue(); // the service object can be re-used, apparently
MessageQueuePortType p = q.getMessageQueueHttpsPort(); // one port per specific queue

// initialise the port to use WS-Security as documented below

BindingProvider provider = (BindingProvider) p;
provider.getRequestContext().put(
        BindingProvider.ENDPOINT_ADDRESS_PROPERTY, queueURL);

...and that's it.

2010/01/28

EC2 Spot instances, latency

There's been an interesting conversation sputtering along about internal network latency on EC2 and its possible relation to the newly-introduced spot instances feature of EC2. Spot instances represent the next step towards a truly commoditised computing platform, which in turn will hopefully be a key part of our future societal information infrastructure (a.k.a. infostructure).

But back to the nuts and bolts:

  • Jan 12 2010: Alan Williamson's first post on the latency problem.
  • Jan 13 2010: Alan Williamson again, "Amazon EC2 Latency: The Pretty Graphs".
  • Jan 15 2010: seldo.com wrote "Are spot instances killing the performance of Amazon EC2?", making the first connection with spot instances.

At this point, it would be useful and informative to take the EC2 spot price history from one of these sites (cloudexchange.org seems the best), and correlate it with the network latency experienced by CloudKick and Alan Williamson. I've asked for the figures behind the graphs - we shall see!
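
If the figures do turn up, the correlation itself is the easy part - e.g. a plain Pearson coefficient over the two series (hourly spot price vs. measured latency, aligned by timestamp; data below is a hypothetical placeholder):

// Pearson correlation of two equal-length, time-aligned series.
public class Correlation {
    public static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx  += x[i];        sy  += y[i];
            sxx += x[i] * x[i]; syy += y[i] * y[i];
            sxy += x[i] * y[i];
        }
        double cov  = sxy - sx * sy / n;
        double varX = sxx - sx * sx / n;
        double varY = syy - sy * sy / n;
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Placeholder numbers, to be replaced with the real figures.
        double[] spotPrice = {0.031, 0.032, 0.038, 0.040, 0.035};
        double[] latencyMs = {1.2, 1.4, 7.5, 9.1, 3.3};
        System.out.println(pearson(spotPrice, latencyMs));
    }
}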