Building APIs & object models for the rest of the web

OK, so XSD defines data structures, and the data structures <-> object debate is an old tarpit I'm avoiding for as long as possible. Neither is it news that schema-compliant structures can be produced by applying XSLT to XML.

However, why not produce objects by applying ECMAScript to HTML pages? One function defines a class and scrapes a document to extract the instance data from a web page, a little like Greasemonkey. Document hyperlinks become typeless references. Document forms define operations.
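To make the idea concrete, here is a rough sketch of what such a scraping function might look like. Everything in it is an assumption for illustration: the name `scrapePage`, the sample markup, and the choice to use regular expressions on an HTML string rather than a real DOM (which a browser script would use instead). It returns a plain object rather than a proper class, and each "operation" only describes the request a form would make instead of sending it.

```javascript
// Hypothetical sketch: scrapePage and the sample markup are illustrative only.
// Regexes stand in for DOM access so the example runs without a browser.

// Read one double-quoted attribute value out of a tag's attribute string.
function attr(attrs, key) {
  const m = new RegExp('(?:^|\\s)' + key + '="([^"]*)"', 'i').exec(attrs);
  return m ? m[1] : null;
}

// Strip any nested tags and surrounding whitespace from a fragment.
function text(fragment) {
  return fragment.replace(/<[^>]*>/g, '').trim();
}

function scrapePage(html) {
  // Instance data: here just the first <h1>, treated as the object's title.
  const h1 = /<h1[^>]*>([\s\S]*?)<\/h1>/i.exec(html);
  const title = h1 ? text(h1[1]) : null;

  // Hyperlinks become typeless references, keyed by their link text.
  const links = {};
  for (const m of html.matchAll(/<a\s([^>]*)>([\s\S]*?)<\/a>/gi)) {
    const href = attr(m[1], 'href');
    if (href !== null) links[text(m[2])] = href;
  }

  // Forms become operations: functions that take field values and
  // return a description of the request the form would submit.
  const operations = {};
  for (const m of html.matchAll(/<form\s([^>]*)>([\s\S]*?)<\/form>/gi)) {
    const action = attr(m[1], 'action') || '';
    const method = (attr(m[1], 'method') || 'get').toUpperCase();
    const name = attr(m[1], 'name') || action;
    const params = [...m[2].matchAll(/<input\s([^>]*)>/gi)]
      .map((input) => attr(input[1], 'name'))
      .filter((p) => p !== null);
    operations[name] = (args = {}) => ({
      method,
      action,
      fields: Object.fromEntries(params.map((p) => [p, args[p]])),
    });
  }

  return { title, links, operations };
}

// Made-up sample page, loosely shaped like an encyclopedia article.
const html = `
  <h1>Cat</h1>
  <p>The cat is a small <a href="/wiki/Mammal">mammal</a>.</p>
  <form name="search" action="/w/index.php" method="get">
    <input type="text" name="search">
  </form>`;

const page = scrapePage(html);
console.log(page.title);
console.log(page.links);
console.log(page.operations.search({ search: 'dog' }));
```

A real version would need one such function per site (or per family of similarly structured sites), which is where the Greasemonkey comparison comes in.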

The rationale behind this is that a lot of websites (e.g. Wikipedia) are broadly similar in markup structure, to the extent that screen-scraping them is really, really easy (ignoring the API that obviates the need for that in this case and others).

The use case is anything that needs objects provided by 'web services': definitions from dictionary.com, search results from Google, and so on. In time such a system could theoretically provide models and APIs for large portions of the web.

Of course, it's much easier in a scripting language that supports dynamic type construction. Also, someone else has definitely thought of this before.
