
2010/12/08

Thoughts on WikiLeaks

  1. We should be more outraged at the malfeasance revealed than at the manner in which we learnt of it. That we aren't speaks to our cynicism and the low-to-nonexistent standards to which we hold government.

  2. It was EveryDNS, not EasyDNS.

  3. Much as I love Amazon AWS (see many, many previous posts), their justification for booting out WL is disappointing, because (a) the cause (Lieberman's call) and the effect are there for all to see, and (b) they didn't boot WL off six months ago over the Iraq or Afghanistan war logs, which were also violations of the TOS. Besides which, they have a business interest in a robust first amendment - they sell books, including dangerous and subversive ones containing state secrets! This isn't hard, people!

  4. As Tim Bray said, the spinelessness of the IT industry in general is depressing when one considers that we should have "freedom of speech" in our DNA.

  5. The following ideas are not contradictory:
  • Cablegate is not necessarily a good thing
  • The way in which various governments and their officials have responded to it is an attack on the freedom of information and on the free press.
  • Julian Assange can be (a) a scumbag rapist, (b) justifiably paranoid, (c) a raging egotist and (d) doing really important work, all at the same time.
Good thinkers on the subject include Clay Shirky and Glenn Greenwald.

2009/12/27

Building a WS-Security enabled SOAP client in Maven2 to the EC2 WSDL using JAX-WS / CXF & WSS4J: tips & tricks

Generating a Java client from the Amazon EC2 WSDL that correctly uses WS-Security is not completely straightforward. This blog post from Glen Mazza contains pretty much all the info you need, but as usual there are many things to trip over along the way. So, without further ado, my contribution.

My setup: I was using Maven2 to construct a JAR file. Running "mvn generate-sources", then, downloads the WSDL and uses it to generate the EC2 object model in src/main/java.

Blogger doesn't like me quoting XML, so I've put my sample POM at pastebin, here. Inside the cxf-codegen-plugin plugin XML you'll see two specific options: "autoNameResolution", which is needed to prevent naming conflicts within the WSDL, and a link to the JAXB binding file for JAX-WS, which is needed to generate the correct method signatures.

Once this is done, the security credentials need to be configured. There are some peculiarities:

As laid out in this tutorial for the Amazon product advertising API, the X.509 certificate and the private key need to be converted into a PKCS12-format file before they're usable in Java. This is done using OpenSSL:
openssl pkcs12 -export -name amaws -out aws.pkcs12 -in cert-BLABLABLA.pem -inkey pk-BLABLABLA.pem
At this point, I should admit that I spent hours scratching my head because the generated client (see below) gave me the error "java.io.IOException: DER length more than 4 bytes" when trying to read the PKCS12 file. So I switched to the Java Keystore format by using this command (JDK6 format):
keytool -v -importkeystore -srckeystore aws.pkcs12 -srcstoretype pkcs12 -srcalias amaws -srcstorepass password -deststoretype jks -deststorepass password -destkeystore keystore.jks
...and then received the error "java.io.IOException: Invalid keystore format" instead. At this point I googled a bit, and discovered two ways to verify the integrity of keystores, via OpenSSL and the Java keytool:
#for pkcs12
openssl pkcs12 -in aws.pkcs12 -info

#for keystore
keytool -v -list -storetype jks -keystore keystore.jks
Both the keystore and the pkcs12 file were valid. Then I realised that I'd put the files in src/test/resources, which was being put through a filter before landing in "target". Maven's resource filtering was mangling the binary files, so of course they couldn't be read properly. Duh me. I put the key material in a dedicated folder with no filtering and this problem was fixed.
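
Incidentally, a quick way to check what Java itself sees (as opposed to what OpenSSL and keytool see) is to load the keystore by hand. A minimal sketch; the path here assumes the filtered copy lands in target/test-classes, so adjust for your own layout:

import java.io.FileInputStream;
import java.security.KeyStore;

public class KeystoreSanityCheck {
    public static void main(String[] args) throws Exception {
        // Load the PKCS12 file the same way the WS-Security layer will.
        // If filtering has mangled the file, this fails with the same
        // "DER length" / "Invalid keystore format" errors as above.
        KeyStore ks = KeyStore.getInstance("PKCS12");
        ks.load(new FileInputStream("target/test-classes/aws.pkcs12"),
                "password".toCharArray());
        System.out.println("contains alias 'amaws': " + ks.containsAlias("amaws"));
    }
}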

My next problem was the exception "java.io.IOException: exception decrypting data - java.security.InvalidKeyException: Illegal key size". This was solved by downloading the "Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files". Simple!
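
If you're not sure whether the unlimited-strength policy files actually got installed into the right JRE, a one-liner against javax.crypto will tell you:

import javax.crypto.Cipher;

public class JceStrengthCheck {
    public static void main(String[] args) throws Exception {
        // Prints 128 with the default policy files, and
        // Integer.MAX_VALUE once the unlimited-strength files are in place.
        System.out.println("max AES key length: " + Cipher.getMaxAllowedKeyLength("AES"));
    }
}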

At this point the request was being sent to Amazon! Which then returned a new error message: "Security Header Element is missing the timestamp element". This was because the request didn't have a timestamp. So, I changed the action to TIMESTAMP+SIGNATURE (as seen in the code sample below), at which point I got a new error message: "Timestamp must be signed". This I fixed by setting a custom SIGNATURE_PARTS property, also as shown below.

Finally, once this was all done, and everything was signed, Amazon gave me back the message "AWS was not able to authenticate the request: access credentials are missing". This is exactly the same error that you get when nothing is signed at all, which needless to say is somewhat ambiguous.

At this point I decided that I'd really like to see what was being sent over the wire. The WSDL specifies the port address with an HTTPS URL. However, I had saved the WSDL locally, and changing the URL to HTTP made the result inspectable with the inestimable Wireshark. Despite the request being sent over HTTP, not HTTPS, it was still executed. According to the docs, this should not be!
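
(For the record, CXF can also log the messages itself, which avoids fiddling with the WSDL. Something along these lines, using CXF's stock logging interceptors, should dump the request and response without leaving Java, though I went the Wireshark route myself:)

import org.apache.cxf.endpoint.Client;
import org.apache.cxf.frontend.ClientProxy;
import org.apache.cxf.interceptor.LoggingInInterceptor;
import org.apache.cxf.interceptor.LoggingOutInterceptor;

public class WireLogging {
    // Call this on the port before invoking any operation; requests and
    // responses are then dumped via CXF's logging interceptors.
    public static void enable(Object port) {
        Client client = ClientProxy.getClient(port);
        client.getOutInterceptors().add(new LoggingOutInterceptor());
        client.getInInterceptors().add(new LoggingInInterceptor());
    }
}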

Anyway, once I was looking at the bytes, I saw that the certificate was only being referred to, not included as specified in the AWS SOAP documentation (in this case for SDB). This was fixed by setting the SIG_KEY_ID (key identifier type) property to "DirectReference", which includes the certificate in the request.

...and then it worked. Oh Frabjous Day, Callooh, Callay! The final testcase code that I used is more or less as follows:

package net.ex337.postgrec2.test;

import com.amazonaws.ec2.doc._2009_10_31.AmazonEC2;
import com.amazonaws.ec2.doc._2009_10_31.AmazonEC2PortType;
import com.amazonaws.ec2.doc._2009_10_31.DescribeInstancesType;
import junit.framework.TestCase;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.UnsupportedCallbackException;
import org.apache.cxf.endpoint.Client;
import org.apache.cxf.frontend.ClientProxy;
import org.apache.cxf.ws.security.wss4j.WSS4JOutInterceptor;
import org.apache.ws.security.WSPasswordCallback;
import org.apache.ws.security.handler.WSHandlerConstants;

/**
*
* @author Ian
*
*/
public class Testcase_CXF_EC2 extends TestCase {

public void test_01_DescribeInstances() throws Exception {

AmazonEC2PortType port = new AmazonEC2().getAmazonEC2Port();

Client client = ClientProxy.getClient(port);
org.apache.cxf.endpoint.Endpoint cxfEndpoint = client.getEndpoint();

Map outProps = new HashMap();

//the order is important, apparently. Both must be present.
outProps.put(WSHandlerConstants.ACTION, WSHandlerConstants.TIMESTAMP+" "+WSHandlerConstants.SIGNATURE);
//this is the configuration that signs both the body and the timestamp
outProps.put(WSHandlerConstants.SIGNATURE_PARTS,
"{Element}{http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd}Timestamp;"+
"{}{http://schemas.xmlsoap.org/soap/envelope/}Body");

//alias, password & properties file for actual signature.
outProps.put(WSHandlerConstants.USER, "amaws");
outProps.put(WSHandlerConstants.PW_CALLBACK_CLASS, PasswordCallBackHandler.class.getName());
outProps.put(WSHandlerConstants.SIG_PROP_FILE, "client_sign.properties");

//necessary to include the certificate in the request
outProps.put(WSHandlerConstants.SIG_KEY_ID, "DirectReference");

cxfEndpoint.getOutInterceptors().add(new WSS4JOutInterceptor(new HashMap(outProps)));

//sample request.

DescribeInstancesType r = new DescribeInstancesType();

System.out.println(port.describeInstances(r));
}

//simple callback handler with the password.
public static class PasswordCallBackHandler implements CallbackHandler {
private Map passwords = new HashMap();

public PasswordCallBackHandler() {
passwords.put("amaws", "password");
}

@Override
public void handle(Callback[] callbacks) throws IOException, UnsupportedCallbackException {
for (int i = 0; i < callbacks.length; i++) {
WSPasswordCallback pc = (WSPasswordCallback) callbacks[i];
String pass = (String) passwords.get(pc.getIdentifier());
if (pass != null) {
pc.setPassword(pass);
}
}
}
}
}

The client_sign.properties file referenced above contains the WSS4J crypto configuration pointing at the PKCS12 file:

org.apache.ws.security.crypto.provider=org.apache.ws.security.components.crypto.Merlin
org.apache.ws.security.crypto.merlin.keystore.type=pkcs12
org.apache.ws.security.crypto.merlin.keystore.password=password
org.apache.ws.security.crypto.merlin.keystore.alias=amaws
org.apache.ws.security.crypto.merlin.file=aws.pkcs12

The WSDL used is http://s3.amazonaws.com/ec2-downloads/ec2.wsdl.

Some time later, the method signatures of the generated port abruptly changed, because the WSDL at that URL is a "symlink" to the latest version (see below), it had been updated, and I had forgotten to change the wsdlLocation in the JAXB binding file. Once I fixed this, it worked again.

Some thoughts:

1) Were I publishing a library for general use in accessing AWS, I would probably not use the direct "symlink" above, which always points to the latest version of the WSDL. Instead, I would link deliberately to each version, and in that way generate ports for every version of the WSDL, thus ensuring backwards compatibility.

2) I find it inelegant to have to specify the WSDL location in two places (the POM and the binding file), so I'd like to try passing the binding file through a filter, with a ${variable} in both places referring to a single property in the POM.

3) I likewise find it confusing that the password for the keystore is used in two places: first in client_sign.properties, and second in the CallbackHandler that is invoked from within the bowels of the WSS4JOutInterceptor. In the code above this is obviously duplicated data; in the final 'production' version of this code I expect to centralise the data and prettify the code around it, along the lines of the sketch below.
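
For instance, WSS4J lets you pass a callback handler instance (PW_CALLBACK_REF) instead of a class name (PW_CALLBACK_CLASS), so the password could be read once from external configuration and handed to both the handler and a generated properties file. A rough, untested sketch of the handler half; where the password actually comes from is left open:

import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import org.apache.ws.security.WSPasswordCallback;
import org.apache.ws.security.handler.WSHandlerConstants;

public class CentralisedPassword {
    // "password" is a stand-in for the single external copy
    // (a properties file, system property, etc.).
    public static void register(java.util.Map outProps, final String password) {
        CallbackHandler handler = new CallbackHandler() {
            public void handle(Callback[] callbacks) {
                for (Callback c : callbacks) {
                    ((WSPasswordCallback) c).setPassword(password);
                }
            }
        };
        outProps.put(WSHandlerConstants.PW_CALLBACK_REF, handler);
    }
}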

2009/11/14

Using the EC2 API: console output blank, connection refused, socket timeout, etc.

Hello there. As usual, there's a world of difference between the conceptual usage of an API and its real-world, practical use. In my adventures I'm stubbornly, block-headedly not interested in using EC2 via anything except the API, i.e. no command-line tools or management console (except for debugging), and so I intend to be able to create my images etc. in a test harness. For reasons to be enumerated anon. Anyway, the following stuff may be useful to people writing code that uses the EC2 API for the first time:
When booting an instance, it cannot be assumed that once it is "running", SSH will be serving on port 22, nor that the console output is there. So if you want to SSH into your instance, first poll the instance state, and once it's "running", poll on the console output. Once it's there, it's complete, so you can retrieve the fingerprints and go on from there (a sketch follows below).
This is good to know, of course, if one wants to sling instances around, but I find it slightly incongruous that I'm being charged for about a minute of time on a machine I can't access yet. Of course, the instant (har) I start blocking a slot on one of Amazon's servers, it's chargeable, so I guess it makes sense from their perspective.
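
Here's a sketch of the console-output half of that dance, using a port generated from the EC2 WSDL as in my earlier post. Caveat: the class and getter names come from the JAXB-generated model as I remember them, so treat them as approximate:

import com.amazonaws.ec2.doc._2009_10_31.AmazonEC2PortType;
import com.amazonaws.ec2.doc._2009_10_31.GetConsoleOutputResponseType;
import com.amazonaws.ec2.doc._2009_10_31.GetConsoleOutputType;

public class ConsolePoller {
    // Assumes the instance state has already been polled via
    // describeInstances() until it reached "running". Once the console
    // output is non-empty it is complete, so the SSH host key
    // fingerprints can be parsed out of the (Base64-encoded) result.
    public static String waitForConsole(AmazonEC2PortType port, String instanceId)
            throws InterruptedException {
        GetConsoleOutputType req = new GetConsoleOutputType();
        req.setInstanceId(instanceId);
        while (true) {
            GetConsoleOutputResponseType resp = port.getConsoleOutput(req);
            if (resp.getOutput() != null && resp.getOutput().length() > 0) {
                return resp.getOutput();
            }
            Thread.sleep(5000); // console output typically takes a minute or so
        }
    }
}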

2009/10/16

EC2: now having actually played with it a *little*...

So, in place of my previous bloviation on the subject, unfettered by the weight of experience, a couple of somewhat-more-tempered comments follow:
  • Using the command-line tools is slow. They're shifting gigs of data around at the touch of a button, but hey, it's a UX thing.
  • In terms of actual tools, the options seem to be:
    • Go with the command-line tools and a bunch of bash scripts
    • Go with a (generally) half-baked third-party API, with its own idiosyncrasies built in, and the traditional lack of documentation OSS projects feel they can get away with.
    • (My inevitable option) download the WSDLs & use something to generate your own API in whatever language. Regenerate it whenever the API changes.
  • This choice is especially acute since I'm not intending, ultimately, to have to do anything by hand - so programming things properly to start with seems like the only sensible option.
  • Consistent I/O on EBS is apparently not an option. This is something I think Amazon should fix tout de suite, because the likes of RackSpace (maybe) and NewServers (h.t. etbe) seem to be stomping all over the EBS I/O figures. In a different context, James Hamilton says "it makes no sense to allow a lower cost component impose constraints on the optimization of a higher cost component", and assuming that the servers are the expensive part, this is what (IMHO) may make using RDBMSs on EC2 a bit of a PIA long-term.

2008/09/24

State of play in EC2-based database hosting

Oracle recently blinked and decided to support their DB and some other stuff on EC2. Reading the actual terms, though, assuming I've understood them correctly, they haven't actually done anything other than map EC2 virtual cores onto CPU sockets and let the normal rates apply. That's IT. What this means is that running 100 Oracle servers for one hour still costs 100 times more (in licensing) than running one Oracle server for 100 hours.

That's not cloud licensing. Cloud computing works on the premise that whether you use one, 10, or 100 CPUs, you pay per CPU-hour, no more no less. That's what DevPay does. The only problem is that in this scenario, software is a commodity, which I imagine doesn't sit too well with Oracle.

Virtualisation has been around for decades, but only once FLOSS commoditized the server operating system 'ecosystem' did it become possible to do things on the scale that Amazon are doing. Back in 2005 I had a Linux VM with the inestimable Bytemark, and it was plain for all with eyes to see that virtualisation was going to pull the floor out from under the server hosting market once players had found the right way to leverage the economies of scale. Right now, that's being done behind closed doors by the big players, but Amazon are the first to have thrown open the doors to the unwashed masses, and that's why I like them so much.

(To repeat, I don't own stock - maybe I should :-)

To get back to the point: how do databases fare on Amazon EC2? EBS has only been around for a couple of weeks, and before that, DB hosting on EC2 meant risking everything on a block device that could go *poof* at any moment, which wasn't exactly pleasant.

This is something which will remain up in the air until someone with serious [PostGre/My]SQL-fu takes some AMIs, configures them just so, and benchmarks them. We know, right now, that on a small instance, disk throughput tops out at roughly 100 MB/s on a three-volume RAID 0 setup. I'm interested in seeing the speeds for EC2 and EBS on larger instances.

Moving on from pure throughput, how do PostgreSQL & MySQL stack up on these setups? Do their respective caching mechanisms etc. work with or against this strange new environment? Enquiring minds want to know!

2008/08/30

Thoughts on Amazon EC2/EBS and the "cloud computing" bandwagon

Amazon EBS is something that I've been waiting for ever since EC2 was announced. Dare Obasanjo rightly pegged it as the final piece of the puzzle.

Right now, "cloud computing" is the buzzword of the moment. This doesn't help create clarity when discussing exactly what it is. So, my definition is:
  • an API to dynamically start, stop & manage instances
  • per-CPU-hour and per-GB billing
This is the epitome of "computing as a resource". Right now, plenty of people are offering cloud-computing-backed solutions, such as Google AppEngine, Joyent and Rackspace/Mosso. Other people have different parts of the puzzle - for example, 3Tera seem to have imaging sorted out. I'm sure I've missed other companies.

But nobody has what Amazon has. Want 500 servers for an hour? There's no place else to go but Amazon. So, despite all the hype about cloud computing, right now there's only one real market player that has a developed, mature product, and that is Amazon. Nobody else even comes close, and that includes Google.

(And no, I don't own shares.)

I was reading an article about how MS are basically building their new datacenters around shipping containers full of server kit that were constructed directly by manufacturers in China or thereabouts. I can almost guess they've built a standard umbilical cable & docking mechanism for the containers for power, bandwidth & airco. Roboticise the docking/undocking, then all you'd have to do is have a small control center and a foreman to operate the gantry.

To get all hand-wavey, sci-fi and "thereof I cannot speak with clue" for a minute: assuming the containers are airtight, and given that they'll never be open to humans until it's time to scrap/recycle them, why not look at using CO2 as a coolant instead of standard air conditioning? Automatic fire suppression comes free. And given that CO2 is still a gas at -70°C, why not overclock your CPUs to increase your ROI? (Of course, the energy you spend on cooling and the reduced lifespan of your CPUs due to overclocking pull the other way.)

If you scrub the CO2 from the atmosphere, and dispose of it safely, you may even make your datacenter carbon neutral and reap tax credits as an additional benefit... (coming soon to a cognizant country near you.)

Fna fna fna.