Easy Questions Are Boring

2010/11/25

'War Is a Force That Gives Us Meaning' by Chris Hedges

This book is about war, but not about guns or tanks or tactics or strategy. It laments the nature of war, which is murder. It describes the effect of war, the moral and physical destruction and degradation of the victims caught up in it - the soldiers, civilians, and witnesses such as the author. It is an eloquent, upsetting and disturbing book. Although it seems to have been written out of a deep sense of despair with humanity, it does in the final chapter sound a solitary note of hope - in our capacity for love, and for love to triumph over hate.

2010/11/17

Mobile telcos need to evolve.

It's commonly understood that mobile network operators are afraid being turned into dumb pipes. I think that they should reject this fear and embrace their fate, and furthermore, that the first network to do this would get the jump on the competition and clean up. How?

Firstly, networks should become completely device-agnostic. Phones and their financing should be separated from the networks. Long-term contracts and the financing deals on phones basically function as an installment scheme for the phone itself, so unbundle this from the contract. The retail outlets could then sell unlocked phones for any network. In some countries (e.g. USA) this may cause headaches for your competitors, but that's partly the point.

Secondly, charge customers for what they use. Charge per kilobyte. Make as little distinction as possible between voice, GPRS, SMS, or MMS traffic. Pass on charges from traffic to other mobile operators directly, and itemize the charges separately on the statements.

In doing this, the operator transforms into a bulk supplier of bandwidth as a commodity.

An operator could do the sums as follows: the operating costs of the network, plus payments for infrastructure, plus needed funds for planned expansion, gives a monthly figure.

My completely uninformed bet is that this figure, divided by the total kilobytes per user per month, plus a 50% margin would still be much, much lower than the current call rates for most mobile users. By pulling the floor out of the mobile market, customers would migrate in droves.

Doing this would also trigger the mother of all price wars. However, as the first to adopt the model, the operator gets the jump on the competition and should, adequately managed, remain a couple of iterations ahead of the rest.

The benefit in doing this is not just to the company that grabs market share, but to society in general. Mobile devices no longer need to be phones but could also be AR, telemetry, or whatever. Device manufacturers and developers should be able to roll out new products and services as easily as they can on the Internet, and the mobile network would become the infrastructural foundation for the next evolutionary step in the information revolution, a leap in the level of interconnectedness, and the fabled "Internet of things". But first they need to get out of the way of their own business model.

2010/06/05

On synthetic genomics aka 'artificial life', and dire predictions.

By now most people have read about 'Artificial Life', i.e. the transplant of a DNA sequence manufactured based on a digital blueprint into a yeast cell, which then reproduces as a yeast cell was, therefore qualifying as a life-form whose DNA was essentially programmed into it in its entirety by humans. The DNA was that of a different breed of yeast cell, plus markers. The paper, 'Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome[PDF]', was published in the journal Science.

Several interesting links on the subject: firstly, this: 'Age of Excessions Interlude: Biology, or the Drugs Win the Drug War', offers a useful perspective on how to view the advance, and its potential (non-legal) uses. Personally I think that if this tech becomes user-friendly enough to end the War on Some Drugs, many many other life-changing things will result besides that may make the WoSD and its fate a foot-note in our long-term history, assuming we get that far.

You may wonder at whether drug addicts would successfully replicate procedures created at great expense in a dedicated institute funded to the tune of millions. Well, this post on garage biotech in Silicon Valley is an interesting read for its insight into the possibilities of small-scale laboratories. In addition to this, it's worth pointing out that one of the biggest challenges faced by the Venter Institute was creating complete and error-free strands of DNA. Short sequences of DNA can be mail-ordered today. For example, in 2002 scientists mail-ordered the DNA (transcoded RNA) of the poliovirus. Back then it took two years to assemble approximately 8kb of RNA. However, prices for sequencing and synthesizing DNA have dropped exponentially, following a trend similar to Moore's law.

Also worth bringing up again in this context is 'Why the future doesn't need us', a seminal essay by Bill Joy, formerly of Sun Microsystems, examining whether or not humanity's unstoppable and exponentially increasing empowerment of individuals with information will eventually lead to catastrophe.

(Ironically, if his insights are correct, and 'info-weapons' some day enable WMD on a scale heretofore unseen, then I speculate that working 'DRM' applied to such weaponisable information could form part of a system that would stand between us and disaster.)

I was discussing this last Sunday with Bethan when I was reminded of something Brad Hicks wrote on the similar subject of economic bubbles:

The single most reliable way to predict a bubble is when the business press, passing along what mainstream economists are telling them, say that the reason you can believe that we're not in a bubble is that "new fundamentals are emerging," or in other words, "the old rules don't apply any more because (fill in the blank)."

He has a point. Likewise, if you ask someone why humanity should be considered more responsible now than (say) seventy years ago, the answer that "things are different now" is a sign that we've learnt nothing. Human nature does have some depressing constants just as it has uplifting ones, and misplaced confidence and lack of humility are some of the former.

The story of technology and hubris is one that can be found engrained in western culture. Fired brick is one of humanitys earliest technological advances, and had a transformative impact on society: moving away from stone as a construction material freed early cities from dependency on local quarries and permitted a higher rate of settlement expansion.

With this in mind, the other thing I heard three weeks ago was a talk on the story of the Tower of Babel. Make of this what you will.

2010/05/19

Jevons paradox, Moore's law and utility computing

TL;DR: A quirk of economics may explain why computers have always seemed too slow, and could indicate that a utility computing boom will take place in the near future.

First: Jevons paradox...

...is defined as follows on Wikipedia:

The proposition that technological progress that increases the efficiency with which a resource is used, tends to increase (rather than decrease) the rate of consumption of that resource.

This was first observed by Jevons himself with coal:

Watt's innovations made coal a more cost effective power source, leading to the increased use of the steam engine in a wide range of industries. This in turn increased total coal consumption, even as the amount of coal required for any particular application fell.

I think that we're experiencing this effect with CPU time and storage. Specifically, if we recast Moore's law in terms of increased efficiency of CPU instruction processing per dollar, then Jevons paradox explains why software generally never seems to get any faster, and why we always seem to be running out of storage space.

This is backed up by anecdotal evidence and folk knowledge. Consider generally accepted adages such as Wirth's law, "Software is getting slower more rapidly than hardware becomes faster", or the variations of Parkinsons law, such as "Data expands to fill the space available for storage", and Nathan Myhrvold's 'first law of software', to paraphrase: "Software is a gas: it always expands to fit whatever container it is stored in."

To provide more empirical proof, one would have to be able to take a survey of data like "historical average time taken by a user to perform common operation X", or "average % disk space free since 1990 on home/office PCs". I'd be interested in any links to studies similar to this, if anyone has any examples.

Second: the commoditisation of CPU time and storage

Generally, clients do not seem to be getting 'thicker', or pushing boundaries in terms of CPU power and storage. Mean-while, many companies are spending a lot of money on new datacenters, and the concept of "Big Data" is gaining ground. Server-side, we've moved to multi-core and we're learning more about running large clusters of cheap hardware.

Right now we're at the inflection point in a movement towards large datacenters that use economies of scale and good engineering to attain high efficiencies. When this infrastructure is exposed at a low level, its use is metered in ways similar to other utilities such as electricity and water. EC2 is the canonical example.

I believe that CPU time and storage will eventually become a true commodity, traded on open markets, like coal, oil, or in some regions of the world, electricity. The barriers to this happening today are many: lack of standard APIs and bandwidth- or network-related lock-in are two worth mentioning. However, you can see foreshadowing of this in Amazon's spot instances feature, which uses a model close to how real commodities are priced. (Aside: Jonathan Schwartz was posting about this back in 2005.)

In an open market, building a 'compute power station' becomes an investment whose return on capital would be linked to the price of CPU time & storage in that market. The laws of supply and demand would govern this price as it does any other. For example, one can imagine that just as CO2 emissions dip during an economic downturn, CPU time would be also cheaper as less work is done.

In addition to this, if Moore's law continues to hold, newer facilities would host ever-faster, ever-more-efficient hardware. This would normally push the price of CPU time inexorably downwards, and make investing in any datacenter a bad idea. As a counter-point to this, we can see that todays hosting market is relatively liquid on a smaller scale, and people still build normal datacenters. Applying Jevons paradox, however, goes further, indicating that as efficiency increases, demand will also increase. Software expands to fill the space available.

Third: looking back

I think a closer look at recent history will help to shed light on the coming market in utility computing. Two subjects in particular might be useful to study.

Coal was, in Britain, a major fuel of the last industrial revolution. From 1700 to 1960, production volume increased exponentially [PDF, page 34] and the 'consumer' price fell in real terms by 40%, mainly due to decreased taxes and transportation costs. At the same time, however, production prices rose by 20%. In his book The Coal Question, Jevons posed a question about coal that we may find familiar: for how long can supply continue to increase exponentially?

However, the parallels only go so far. Coal mining technology only progressed iteratively, with nothing like Moore's law behind it - coal production did peak eventually. The mines were controlled by a cartel, the "Grand Allies", who kept production prices relatively stable by limiting over-production. Today we have anti-trust laws to prevent that from happening.

Lastly, the cost structure of the market was different: production costs were never more than 50% of the consumer price, whereas the only cost between the producer and the consumer of CPU time is bandwidth. Bandwidth is getting cheaper all the time, although maybe not at a rate sufficient for it to remain practically negligible throughout an exponential increase in demand for utility computing.

Electricity, as Jon Schwartz noted in the blog posts linked to above, and as Michael Manos noted here, started out as the product of small, specialised plants and generators in the basements of large buildings, before transforming rapidly into the current grid system, complete with spot prices etc. Giants like GE were created in the process.

As with electricity, it makes sense to use CPU power as close to you as possible. For electricity there are engineering limits concerning long distance power transmission; on networks, latency increases the further away you go. There are additional human constraints. Many businesses prefer to deal with suppliers in their own jurisdictions for tax reasons and for easier access to legal recourse. Plenty of valuable data may not leave its country of origin. For example, a medical institution may not be permitted to transfer its patient data abroad.

For me, this indicates that there would be room at a national level for growth in utility computing. Regional players may spring up, differentiating themselves simply through their jurisdiction or proximity to population centers. To enable rapid global build-out, players could set up franchise operations, albeit with startup costs and knowledge-bases a world away from the typical retail applications of the business model.

Just as with the datacenter construction business, building out the fledgling electricity grid was capital intensive. Thomas Edison's company was allied with the richest and most powerful financier of the time, J.P. Morgan, and grew to become General Electric. In contrast, George Westinghouse, who built his electricity company on credit and arguably managed it better, didn't have Wall St. on his side and so lost control of his company in an economic crisis.

Finally: the questions this leaves us with

It's interesting to note that two of the companies that are currently ahead in utility computing - Google and Microsoft - have sizable reserves measured in billions. With that kind of cash, external capital isn't necessary as it was to GE and Westinghouse. But neither of them, nor any other player, seem to be building datacenters at a rate comparable to that of power station construction during the electrifying of America. Are their current rates going to take off in future? If so, how will they finance it?

Should investors pour big money into building utility datacenters? How entrenched is the traditional hosting business, and will someone eat their lunch by diving into this? Clay Shirky compared a normal web-hosting environment to one run by AT&T - will a traditional hosting business have the same reaction as AT&T did to him?

A related question is the size of the first-mover advantage - are Google, Amazon, Microsoft and Rackspace certain to dominate this business? I think this depends on how much lock-in they can create. Will the market start demanding standard APIs and fighting against lock-in, and if so, when? Looking at the adoption curves of other technologies, like the Web, should help to answer this question. Right now the de-facto standard APIs such as those of EC2 and Rackspace can be easily cloned, but will this change?

I'm throwing these questions out here because I don't have the answers. But the biggest conclusion that I think can be tentatively drawn from applying Jevons paradox to the 'resources' of CPU time and storage space, is that in the coming era of utility computing, there may soon be a business case to be made for building high-efficiency datacenters and exposing them to the world at spot prices from day one.

2010/04/18

YAGNI and the boring stuff

YAGNI is an acyonym meaning "You Ain't Gonna Need It", and behind this lies the principle in software that code should not be written before it is needed: the temptation to prematurely generalise and abstract, to write a framework for the problem you're trying to solve, should be resisted.

It also focuses you on the task at hand, which is to directly solve the problem. By focusing on the shortest path between point A and point B, the result should be small, lean, and direct.

However, in several projects I've worked on there comes a point where certain aspects of a codebase not directly related to its functionality begin to demand attention. These issues can be ignored up until the point at which inaction damages the codebase as a whole. I find these things are common across projects, and so they could be classed as "You're Gonna Need It". The examples that immediately come to mind in Java projects are:

A logging setup, i.e. configuration of log files for errors and debug info.
Likewise, a sprinkling of informative logging statements, without which it becomes difficult to tell what went wrong outside of a debugger.
An exception hierarchy. You can throw new RuntimeException for a while, but eventually it becomes unmanageable and you need to distinguish in catch() clauses between errors inside and outside of your own codebase
Interfaces, for the classic reason of hiding the implementation, but also because it enables Proxy-based AOP which is useful for a bunch of stuff 'YAGN' but may one day be 'YGNI', such as benchmarking and security interceptors.
A base test case / test harness that lets you pull in different parts of the application and test them individually.

This stuff is all boring but necessary. When living without them starts to affect efficiency and quality, that is the point at which to stop working on functionality, get this stuff right, and then go back to working on the important things.

2010/04/16

And now for messed up UK politics: The Digital Economy Act and other IP

I want to bring together some articles I've read recently, which some of you may be familar with.

In Why Content is a Public Good, an economist-turned-coder explains why at an economic level, charging for digital content works against the nature of the digital medium. Note that this doesn't mean that it's right or wrong to do so - morality doesn't enter into it. It's just the nature of the beast. Markets are not intrinsically moral in any sense of the term, they're more akin to natural phenomena.

So it is with the nature of digital information and its redistributability. Disintermediation is what's slowly destroying the business models of the music, movie, book, and software industries. And as for "information wants to be free", the above post links to two good posts on the subject, one by Charlie Stross and one by Cory Doctorow, which explains the sloganity of the phrase.

In the Hacker News discussion of the this article, this comment by fexl proposes a "recursive auction" method of distributing digital goods that seems interesting. Googling on the keywords he provides, along with the examples provided in the article itself, shows that new and interesting ways of selling information can be had. Suppose Wikileaks auctioned of it's scoops like this? How much would CNN pay to have the first copy of the video and it's associated materials? How luch would the New York Times pay to have the second?

The second post on "information wants to be free" above by Cory Doctorow, a UK-resident Canadian. I like the guy for his eloquence in defending digital civil liberties and his understanding of online culture. He's also an author, and he credibly claims to make more money because he publishes his books online for free in addition to hard-copy. He wrote this article in the Guardian about the Digital Economy Act, and all the various (bad) things it contains.

There may be a little hyperbole in there but the essence of his article is correct. The bill was shoved through parliament with insufficient debate and is exceedingly sucky. Sometimes it seems to me that keeping a half-open eye on any leglislative process, be it in the the UK (the Digital Economy Act), USA (HCR), or the EU (software patents), is not just like being in a sausage factory...

“Laws, like sausages, cease to inspire respect in proportion as we know how they are made."

... but more like being in a medieval abattoir. But I digress.

The third article concerns patents, and discusses a recent study that suggests that the entire system of patents is counter-productive to inducing or encouraging invention, and to 'social welfare' in general. Not just software or genetic patents, but all patents.

The article is a good summary of the paper, which is available for free. In short, most inventors (a) create in order to scratch an itch for themselves, not for money, (b) build on previous work which is easier to do if it isn't patented, and (c) prefer to freely share ideas with other inventors, rather be forced than keep things secret so they can make money.

Now I'm just going to throw some ideas out here. I like the whole open source mentality, and I don't like share-cropping, and although I have an iPhone I'm not interested in developing for a closed, restrictive-if-not-oppresive environment. It's ironic that I left the world of Microsoft to get away from that, when Microsoft is the company that won domination of the desktop market by being more open than Apple. Hopefully, now Google will do the same.

However, one thing worries me about this: when Microsoft and Apple went to war, Microsoft was swiftly gaining several 'legs' to its stool (or eyes to its group, if you play go at all). Google has one: search. Kill Google's search revenue and everything Google does - mobile, maps, Chrome, CromeOS, etc. - suddenly goes *poof* and disappears. Just as if the American federal government is a large insurance company with an army, then Google is an advertising agency with a well-run IT department that's allowed to get creative with cute little side-projects.

But I'll leave you with this final question, which is the same one that I would have posed to RMS when he gave his Fostem '05 talk, which is this: faced with all of the above, how does one effectively turn on, tune in and drop out - i.e. combine elements of civil disobedience and counter-cultural openness to combat the seemingly inevitable shift to Orwellian 'idea management'? I have no idea, but one thing I know is that it's not by downloading as many MP3's as possible. I'll leave you with a link to one of my own posts - "open source music", and the conclusion from Charlie Stross's article:

"So it follows that if you want information to be free you are taking on an obligation to make information, and give it freedom. An obligation to work to better the lot of humanity, not to merely sponge off the labour of others.

Next time you hear someone invoke "information wants to be free" as a justification for demanding free-as-in-no-payment-expected content, ask them: precisely what content have you released for free lately?"

The timing could not be better. Or worse, depending on your POV.

So, GOP Senate minority leader Mitch McConnell flies into New York to coordinate messaging with Wall Street in opposition to the proposed financial regulatory reform intended bring an end to the "Too Big To Fail" era, comes back with misleading and dishonest arguments that pretend to be against helping TBTF banks while actually supporting them. And yes, it is that obvious.

Then, just as the incredulity and blow-back starts to mount, the SEC charges Goldman Sachs, the great vampire squid itself, with CDO- and RMBS-related fraud. Ouch! So now he's stuck in a corner with the banks, the biggest of which is being hauled into court on fraud charges. The Dems will suddenly find it much easier to paint the GOP as being in bed with the banks, and this puts heavy, populist pressure on the GOP senators to break ranks and therefore the fillibuster, allowing the leglislation to go through. I'm a pessimist on this, however: I think Mitch can still rely on the cast-iron party discipline the GOP has built up to stop the bill going forwards.

2010/02/15

The advantages of programming on a small netbook.

Most software developers prefer to work on "fast" computers, and I am normally one of them. Many of my tools work best with lots of memory, and some common tasks become pleasantly instantaneous on a nice, powerful computer.

At work, for example, our teams project is 250-500,000 lines of code modularised into many sub-projects. Writing an extra five thousand lines of code would be considered a "minor" change. Working with this project on a slow computer would be a nightmare.

However, my tiny little netbook, a rather un-sexy and previous-generation EeePC 901SD, is now my main development machine for my free-time-only side-project, which is also written in Java and uses the technologies documented in my previous blog posts. The software is not a toy: it's written to the same quality standards as my work project, if not higher; with no time constraints or deadlines, I can be as perfectionist and nit-picky as I want. Before I started this project, my previous project made heavy use of PostgreSQL, and I wrote hundreds of lines of C, again on a similar netbook.

The surprise? It's been a great experience. Being a very fast typist, the keyboard initially seemed cramped, but quickly became comfortable. I use a lightweight IDE (Netbeans) with a reduced font size, and screen real-estate is not an issue. I use Linux (Ubuntu), which means I have a decent command prompt and unix-like system at my finger-tips.

More importantly, my coding habits have changed. I no longer waste time thinking in front of the computer screen. I design the code in my head while walking to work in the morning, shopping for groceries, or doing the dishes. If I had a screen in front of me, the temptation would be to do something, to try out the idea, instead of reflecting on what really needs to be done, or how to do it.

I can code in a crowded bar, or in 20 minutes on a tram surrounded by commuters, because when I'm sitting in front of a keyboard, I already know what I'm going to do. The rest is pure typing.

The result: I would guess that I've done no more than 30 man-days of development. Less than half of it with an Internet connection. The time spent has been spread over the past four months in chunks of no more than four hours, and mostly shorter than one. But it has all been valuable learning or steady, measurable progress.

This requires a lot more patience than usual from me. Time for my side-project comes at its own pace and cannot be hurried, and I'm not intending to sacrifice anything to make faster progress.

I also learnt this: when I demand to work with only the very best tools in every situation, this is misplaced pride, maybe arrogance. It limits myself and chokes faith in the abilities I've been blessed with. We sometimes write on small pieces of paper, so why can't we sometimes program on small computers?

2010/02/06

EC2 and CXF: Serialising objects in JAX-WS

The second problem I had with CXF (or, more correctly, JAXB) was in trying to serialise JAXB objects such as CreateVolumeType into XML using a copy of JAXBuddy from Typica.

This failed with the error message: "Unable to marshal type XXX as an element because it is missing an @XmlRootElement annotation". Searching for this error message led me to this blog post by Kohsuke Kawaguchi, and so I copied the sample configuration from the comment into my binding file. This didn't work, and the error message remained the same - the configuration didn't take effect.

Googling again I found this CXF bug on serialisation and "simple" mode. The information in this issue, combined with the information in the wsdl-to-java documentation gave me the information I needed to correct the binding file.

As a humorous side-effect this also changed a lot of class names and broke a lot of code, but the new class-names are an improvement, so there we go.

My updated sample binding file is on pastebin here. Also, my original post about CXF is messed up in some way and I'll get round to fixing it soon.

2010/02/04

EC2 and CXF / JAX-WS: Configuring WSDL endpoints & service URL

So, it's been a couple of weeks in real-time and about ten hours in "code"-time since I last posted on EC2 & JAX-WS, since which I've discovered two extra things. The second is in the next blog post.

Firstly, Amazon SQS requires that you make SOAP 1.1 calls to a unique URl per queue. This means that to use a given queue, a specific port has to be configured with the new location in JAX-WS. This took a little digging, but it is clearly mentioned in the CXF documentation. The relevant code is this:

String queueURL = "http://sqs.amazonaws.com/blablabla";

MessageQueue q = new MessageQueue(); //can re-use this apparently

MessageQueuePortType p = q.getMessageQueueHttpsPort(); //this is for a specific queue

//initialise the port to use WS-Security as documented below

BindingProvider provider = (BindingProvider)p;
provider.getRequestContext().put(BindingProvider.ENDPOINT_ADDRESS_PROPERTY, queueURL);

...and that's it.

2010/01/28

EC2 Spot instances, latency

There's been an interesting conversation sputtering along about internal network latency on EC2 and its possible relation to the newly-introduced spot instances feature of EC2. Spot instances represent the next step towards a truly commoditised computing platform, which in turn will hopefully be a key part of our future societal information infrastructure (a.k.a. infostructure).

But back to the nuts and bolts:

Jan 12 2010:

Alan Williamson wrote "Has Amazon EC2 become over subscribed?"
CloudKick: Visual evidence of Amazon EC2 network issues

Jan 13 2010, Alan Williamson again: "Amazon EC2 Latency: The Pretty Graphs".

Then on Jan 15 2010: seldo.com wrote "Are spot instances killing the performance of Amazon EC2?", and made the first connection with spot instances.

At this point, it would be useful and informative to take the EC2 spot price history from one of these sites (cloudexchange.org seems the best), and correlate it with the network latency experienced by CloudKick and Alan Williamson. I've asked for the figures behind the graphs - we shall see!

2009/12/27

Building a WS-Security enabled SOAP client in Maven2 to the EC2 WSDL using JAX-WS / CXF & WSS4J: tips & tricks

Generating a Java client from the Amazon EC2 WSDL that correctly used WS-Security is not completely simple. This blog post from Glen Mazza contains pretty much all the info you need, but as usual there are many things to trip up over along the way. So, without further ado, my contribution.

My setup: I was using Maven2 to construct a JAR file. Running "mvn generate-sources", then, downloads the WSDL and uses it to generate the EC2 object model in src/main/java.

Blogger doesn't like me quoting XML, so I've put my sample POM at pastebin, here. Inside the cxf-codegen-plugin plugin XML you'll see two specific options, "autoNameResolution", which is needed to prevent naming conflicts with the WSDL, and a link to the JXB binding file for JAXWS, which is needed to generate the correct method signatures

Once this is done, then the security credentials need to be configured. There are some pecularities:

As laid out in this tutorial for the Amazon product advertising API, the X.509 certificate and the private key need to be converted into a pkcs12 -format file before they're usable in Java. This is done using OpenSSL:

openssl pkcs12 -export -name amaws -out aws.pkcs12 -in cert-BLABLABLA.pem -inkey pk-BLABLABLA.pem

At this point, I should admit that I spent hours scratching my head because the generated client (see below) gave me the error "java.io.IOException: DER length more than 4 bytes" when trying to read the PKCS12 file. So I switched to the Java Keystore format by using this command (JDK6 format):

keytool -v -importkeystore -srckeystore aws.pkcs12 -srcstoretype pkcs12 -srcalias amaws -srcstorepass password -deststoretype jks -deststorepass password -destkeystore keystore.jks

...and then received the error "java.io.IOException: Invalid keystore format" instead. At this point I googled a bit, and discovered two ways to verify the integrity of keystores, via openSSL and the Java keytool:

#for pkcs12
openssl pkcs12 -in aws.pkcs12 -info

#for keystore
keytool -v -list -storetype jks -keystore keystore.jks

Both the keystore and pkcs12 file were valid. Then, I realised that I'd put the files in src/test/resources which was being put through a filter before landing in "target". The filter was doing something to the files, so of course they couldn't be read properly. Duh me. I put the key material in a dedicated folder with no filtering and this problem was fixed.

My next problem was the exception "java.io.IOException: exception decrypting data - java.security.InvalidKeyException: Illegal key size". This was solved by downloading the "Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files". Simple!

At this point the request was being sent to Amazon! Which then returned a new error message, "Security Header Element is missing the timestamp element". This was because the request didn't have a timestamp. So, I changed the action to TIMESTAMP+SIGNATURE (as seen in the below code sample), at which point I got a new error message: "Timestamp must be signed". This I fixed by setting a custom SIGNATURE_PARTS property also as below.

Finally, once this was all done, and everything was signed, Amazon gave me back the message "AWS was not able to authenticate the request: access credentials are missing". This is exactly the same error that you get when nothing is signed at all, which needless to say is somewhat ambiguous.

At this point I decided that I'd really like to see what was being sent over the wire. The WSDL specifies the port address with an HTTPS URL. However, I had saved the WSDL locally, and changing the URL to HTTP made the result inspectable with the inestimable Wireshark. Despite the request being sent in HTTP, not HTTPS, it was still executed. According to the docs, this should not be!

Anyway, once I was looking at the bytes, I saw that the certificate was only being referred to, not included as specified in the AWS SOAP documents, in this case for SDB. This was fixed by setting the SIG_KEY_ID (key identifier type) property to "DirectReference", which includes the certificate in the request.

...and then it worked. Oh Frabjous Day, Callooh, Callay! The final testcase code that I used is more or less as follows:

package net.ex337.postgrec2.test;

import com.amazonaws.ec2.doc._2009_10_31.AmazonEC2;
import com.amazonaws.ec2.doc._2009_10_31.AmazonEC2PortType;
import com.amazonaws.ec2.doc._2009_10_31.DescribeInstancesType;
import junit.framework.TestCase;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.UnsupportedCallbackException;
import org.apache.cxf.endpoint.Client;
import org.apache.cxf.frontend.ClientProxy;
import org.apache.cxf.ws.security.wss4j.WSS4JOutInterceptor;
import org.apache.ws.security.WSPasswordCallback;
import org.apache.ws.security.handler.WSHandlerConstants;

/**
*
* @author Ian
*
*/
public class Testcase_CXF_EC2 extends TestCase {

public void test_01_DescribeInstances() throws Exception {

AmazonEC2PortType port = new AmazonEC2().getAmazonEC2Port();

Client client = ClientProxy.getClient(port);
org.apache.cxf.endpoint.Endpoint cxfEndpoint = client.getEndpoint();

Map outProps = new HashMap();

//the order is important, apparently. Both must be present.
outProps.put(WSHandlerConstants.ACTION, WSHandlerConstants.TIMESTAMP+" "+WSHandlerConstants.SIGNATURE);
//this is the configuration that signs both the body and the timestamp
outProps.put(WSHandlerConstants.SIGNATURE_PARTS,
"{Element}{http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd}Timestamp;"+
"{}{http://schemas.xmlsoap.org/soap/envelope/}Body");

//alias, password & properties file for actual signature.
outProps.put(WSHandlerConstants.USER, "amaws");
outProps.put(WSHandlerConstants.PW_CALLBACK_CLASS, PasswordCallBackHandler.class.getName());
outProps.put(WSHandlerConstants.SIG_PROP_FILE, "client_sign.properties");

//necessary to include the certificate in the request
outProps.put(WSHandlerConstants.SIG_KEY_ID, "DirectReference");

cxfEndpoint.getOutInterceptors().add(new WSS4JOutInterceptor(new HashMap(outProps)));

//sample request.

DescribeInstancesType r = new DescribeInstancesType();

System.out.println(port.describeInstances(r));
}

//simple callback handler with the password.
public static class PasswordCallBackHandler implements CallbackHandler {
private Map passwords = new HashMap();

public PasswordCallBackHandler() {
passwords.put("amaws", "password");
}

@Override
public void handle(Callback[] callbacks) throws IOException, UnsupportedCallbackException {
for (int i = 0; i < pc =" (WSPasswordCallback)callbacks[i];" pass =" passwords.get(pc.getIdentifer());"

provider="org.apache.ws.security.components.crypto.Merlin" type="pkcs12" password="password" alias="amaws" file="aws.pkcs12" href="http://s3.amazonaws.com/ec2-downloads/ec2.wsdl">http://s3.amazonaws.com/ec2-downloads/ec2.wsdl.

[I think I mangled somethjing here, will fix it soon]

At this, the method signatures of the generated port abruptly changed to something other, because I forgot to change the wsdlLocation in the JXB binding file. Once I fixed this, it worked again.

Some thoughts:

1) Were I publishing a library for general use in accessing AWS, I would probably not use the direct "symlink" above that always points to the latest version of the WSDL. Instead, I would link deliberately to each version, and in that way always generate ports for each version of the WSDL, this ensuring backwards compatibility.

2) Secondly, I find it inelegant to have to specify the WSDL location in two places (the POM and the binding file), and so I'd like to try and pass the binding file through a filter, using a ${variable} in both places referring to a property in the POM.

3) I find it likewise confusing that the password for the keystore is used in two places, firstly in client_sign.properties and secondly in the CallbackHandler that is invoked from within the bowels of the WSS4JOutInterceptor. In the code above, this is obviously duplicated data, however in the final 'production' version of this code I expect to have the data centralised & the code prettified around it.

2009/12/20

Using CXF instead of Axis for Java from WSDL: better results faster.

In the footsteps of the same guy, Glen Mazza, I linked to at the bottom of the previous post, who did the same thing (SOAP client / server using WS-Security and X.509 certificates using CXF), I switched from Axis2 to CXF, and had immediately better results:

The documentation and maven plugin instructions is current, and accurate.
The plugin works.
All the right JAR files are in repos.
The code generation worked fine, with some JAX-WS binding stuff added into the mix.

Which leads me to ask, why are there two projects at Apache doing essentially identical things, right down to the usage patterns for the tools they provide? (A: CXF nee XFire is from Codehaus). Anyway, I don't have to write a HOW-TO for this stuff, the docs are there and they're useful.

I have yet to look at CXF support for WS-Security, but it seems simpler from the get-go than the equivalent stuff in Axis2, ~~despite insisting on Java's proprietary keystore~~, hum, I didn't read this howto clearly enough - files are supported. We shall see!

2009/12/19

Creating Java code from a WSDL using Apache Axis2: maven2 and the command line.

Four years ago, creating Java code from WSDL was difficult and annoying. Today, I'm trying to generate a client for the EC2 WSDL, that ideally would download the latest version and rebuild the API when I type "mvn clean install".

I've given up. The Axis2 Maven2 plugin does not seem to work correctly, so I've resorted to using the command-line tool, which does work. My command was:

wsdl2java.bat -d jaxbri -o -S . -R . --noBuildXML --noMessageReceiver -uri http://s3.amazonaws.com/ec2-downloads/2009-10-31.ec2.wsdl

I used the JAXBRI output format because XMLBeans is basically dormant. Unfortunately the JAXBRI compiler generates sources in an "src" subdirectory, which can't be changed via command-line options, so some manual copy-and-pasting is required.

Secondly, the generated classes then depend on axis. So this needs to be added to the POM:

(Blogger broke my XML):

org.apache.axis2
axis2
1.5.1

Thirdly, there's an undeclared dependency in Axis2-generated code on Axiom, so this also needs to be added:

org.apache.ws.commons.axiom
axiom
1.2.5

(The latest version is 1.2.8, but this doesn't seem to be in repos yet.)

Following which, attempting to run this:

AmazonEC2Stub stub = new AmazonEC2Stub("http://eu-west-1.ec2.amazonaws.com");
DescribeRegionsType t = new DescribeRegionsType();
System.out.println(stub.describeRegions(t).getRegionInfo().getItem());

...rendered many different ClassNotFoundExceptionS which were, one after another, which I attempted to solve shotgun-style by adding each new dependency to the POM as it cropped up. This was an abject failure - I stopped at org.apache.axis2.transport.local.LocalTransportSender, which apparently is only available in Axis2 v. 1.2 (I'm using 1.5.1). So instead I deleted all the Axis2-related stuff from my POM and just added the JARs from the 1.5.1 downloded ZIP file straight to the Eclipse project.

This worked, and gave me the error message that I was looking for, to whit: "AWS was not able to authenticate the request: access credentials are missing". From here, I would just need to get the Rampart/WS-Security/WSS4J stuff working properly with the Amazon X.509 certificate, and then I should be home free. We shall see.

Further light reading can be found on this article on IBM developer works and this article on SOAP clients with Axis.

Update 2009/12/20: The work has been done (I should have googled first!), and it is herculean, as you can see by reading this impressive tutorial for creating an Axis2 SOAP Client for the Amazon Product Advertising API.

2009/12/04

On the subject of being too harsh a critic

One of the great things about OSS is its transparency. This is also a great response to critics of any particular project, which is "instead of talking so much, why don't you shut up and help?".

I'm guilty of forgetting that there are real people behind most projects. With commercial software this happens a lot more and is disguised as "I'm a paying customer and I expect good service", but there's no excuse, honestly, for criticism that isn't phrased constructively and considerately when the product itself is free.

Fnar fnar fnar.

Easy Questions Are Boring

2010/11/25

'War Is a Force That Gives Us Meaning' by Chris Hedges

2010/11/17

Mobile telcos need to evolve.

2010/06/05

On synthetic genomics aka 'artificial life', and dire predictions.

2010/05/19

Jevons paradox, Moore's law and utility computing

2010/04/18

YAGNI and the boring stuff

2010/04/16

And now for messed up UK politics: The Digital Economy Act and other IP

The timing could not be better. Or worse, depending on your POV.

2010/02/15

The advantages of programming on a small netbook.

2010/02/06

EC2 and CXF: Serialising objects in JAX-WS

2010/02/04

EC2 and CXF / JAX-WS: Configuring WSDL endpoints & service URL

2010/01/28

EC2 Spot instances, latency

2009/12/27

Building a WS-Security enabled SOAP client in Maven2 to the EC2 WSDL using JAX-WS / CXF & WSS4J: tips & tricks

2009/12/20

Using CXF instead of Axis for Java from WSDL: better results faster.

2009/12/19

Creating Java code from a WSDL using Apache Axis2: maven2 and the command line.

2009/12/04

On the subject of being too harsh a critic

What I've read recently

Blog Archive

2010/11/25

2010/11/17

2010/06/05

2010/05/19

2010/04/18

2010/04/16

2010/02/15

2010/02/06

2010/02/04

2010/01/28

2009/12/27

2009/12/20

2009/12/19

2009/12/04

Subscribe To

What I've read recently

Blog Archive