XML Still Sucks
An interesting article was recently mentioned on Slashdot, regarding the relative suckiness of XML. It pointed to an article by Tim Bray, one of the co-creators of XML. It was mostly about how XML is too hard for programmers to use effectively. It was pretty good... However, it was followed up by another article arguing the converse - that despite XML's fundamental suckiness, the benefits clearly outweigh the costs. Following is a point-by-point retort to his arguments.

And, if you're interested, you may take a look at a pet project of mine which is an alternative to XML. It's still a work in progress, but I feel that I have no right to bash XML unless I offer an alternative.

XML Has Internationalization Pretty Well Nailed

Excuse me?! Obviously Tim Bray has never had to set up a server that has to parse SOAP requests from both Japanese and Korean clients. This kind of statement can only be made if you believe that the Unicode Group has come even remotely close to solving the various problems involved in unifying all the thousands of native encodings into one single universal set. They fell short by about a factor of ten, and then claimed the problem was solved. Not much of a solution if you ask me. But that is a rant for another day.

Most importantly, the only thing XML did to 'nail' the internationalization problem was to require a header at the top specifying the encoding. Huh... by that logic both MIME and HTML had 'nailed' the problem well beforehand. And don't forget that the encoding attribute is an OPTIONAL parameter. Not to mention that some applications, NotePad in particular, try to always save in the native OS file encoding whenever they can. If you hand-edit an XML file that claims its encoding is UTF-8 in such an application, your beautiful XML file is now quite broken: the header still says UTF-8, but the bytes no longer are.

The internationalization problem will never be 'nailed' until each file contains a magic header string which will be encoded uniquely for every supported file encoding.
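The broken-by-hand-editing case is easy to reproduce. Below is a minimal sketch in Python (my choice of language, not something the original article uses). The document is made up, and `cp1252` merely stands in for "the native OS file encoding"; a different legacy encoding would fail in much the same way.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the declaration claims UTF-8.
text = '<?xml version="1.0" encoding="UTF-8"?>\n<name>caf\u00e9</name>'

# Saved correctly: the bytes really are UTF-8.
good_bytes = text.encode("utf-8")

# Saved by an editor that silently uses a native encoding
# (cp1252 is an assumption here) while the header still says UTF-8.
bad_bytes = text.encode("cp1252")


def try_parse(data: bytes) -> str:
    """Return the root element's text, or a short error description."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError as exc:
        return f"parse error: {exc}"


good_result = try_parse(good_bytes)
bad_result = try_parse(bad_bytes)

print(good_result)
print(bad_result)
```

The first parse succeeds. The second fails because the single byte written for the accented letter is not valid UTF-8, even though nothing visible in the file changed.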
My alternative to XML addresses this issue further.

XML Can Represent Pretty Well Anything

Arguably, any Turing-complete computer language can parse any kind of file to represent any kind of data structure. The question is not whether XML can represent any data structure, but whether it can represent the majority of common data structures in a manner that is efficient, and easily parsed by both machines and humans. No. The four most common data structures are probably arrays, tables, hashtables, and maps. XML does hashtables and maps fairly well, because it is very hierarchical in nature. However, arrays and most database tables look extremely contrived in an XML file. They are phenomenally verbose, and very hard to read.

A simple text file can do arrays and tables extremely well, and can do a decent job on hashtables with name-value pairs. Maps can be represented with a combination of name-value pairs and tables. Some will claim that it's difficult to see relationships in trees and maps in a text file, and easier to see them in XML. I agree entirely! However, if you have an XML file with any large amount of nesting, you'll need some kind of tool to parse it to see the structure. If you are already taking this step, why not use a similar tool for your text file? Like grep?

Not to mention every single example he gives, EVERY SINGLE ONE, can be even more easily represented by any variety of tab-delimited text file formats. And text files can be parsed with a simple shell script or awk, without the need for a multi-megabyte XML parsing library.

XML Forces Syntax-Level Interoperability

Just because two applications both know XML is absolutely no guarantee that they will be able to understand each other. XML doesn't make anything more interoperable - it just shifts the effort of making two applications communicate to a different layer. Instead of text files being the ubiquitous bottom rung, now they are XML files.
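As a rough illustration of "both know XML, neither understands the other", here is a minimal Python sketch. Both feeds, their element names, and the `read_price_feed_a` helper are hypothetical, invented only for this example.

```python
import xml.etree.ElementTree as ET

# Two made-up feeds describing the same thing. Both are well-formed XML.
FEED_A = "<quote><symbol>XYZ</symbol><price>12.50</price></quote>"
FEED_B = '<StockQuote ticker="XYZ"><Last currency="USD">12.50</Last></StockQuote>'


def read_price_feed_a(xml_text: str):
    """A client written against feed A's vocabulary (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # findtext returns None when there is no <price> child element.
    return root.findtext("price")


price_from_a = read_price_feed_a(FEED_A)
price_from_b = read_price_feed_a(FEED_B)

print(price_from_a)
print(price_from_b)
```

The parser accepts both documents without complaint, but the client only gets a price out of the feed it was written for. The agreement that matters is on names and structure, and XML by itself does not supply that.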
XML is better than binary data, but what does it really have over a formatted text file? When SOAP was new, about a dozen businesses offered SOAP streams of stock quotes. None of them were compatible with each other. Why not? The terminology of the stock market is a hundred years old, and if everybody is using SOAP, why didn't things 'just work'?

Because they solved the wrong problem. The easy problem is getting people to agree that one format is a good idea. Due to its hype, XML seems a good choice for a universal data format. But the hard part is getting them to agree to a schema for their data. Obviously, getting everybody to agree to XML is a step, but it's no better than comma-delimited text files. And the hard part is still to come. Especially if they are forced to use the W3C XML Schemas kludge to try to describe their XML.

XML Supports Constructive Finger-Pointing

Perhaps, but no more so than any other kind of validated data. Remember that in order to make a robust, flexible, and secure system, you will have to do data validation ANYWAY. SOAP schemas in WSDLs, along with DTDs, can give hints on how the data should look, but even if it's totally valid XML, that doesn't mean the data is valid. Dates can be invalid. Parameters may be too long. People may be trying to attack you with denial-of-service tricks. A good program will have to validate this stuff AGAIN anyway. If it does so, then the finger will still be pointed at the culprit of the bad data.

XML Confers Longevity

Perhaps, but at what cost? True, it is a huge problem to save important data in a proprietary binary format. XML does give you ONE option - you can save it to an extremely verbose text-based format, which can then be reread by future applications. However, let's face it... XML and its associated transformation language XSLT are so verbose and complex that you're not much better off. It's probably much easier to save your data to an open format, such as rich text, PostScript, or TeX.
Need I remind people of why SGML never caught on? Well, what if you don't need tons of professional typesetting and formatting? What if your data just has a few images, tables, and text? Great! Use HTML and CSS. Now you never need the intermediate transformation layer at all.

Complaint: XML is Verbose

Very true... however his retort is not the whole story. If bandwidth is a problem, you can compress XML, but how does this help the poor user who has to parse 100k XML files all day long, looking for a specific node? How does this help the poor guy who has to write those files by hand?

Complaint: XML Does What S-Expressions and CSV Already Could

Again, the best argument he has for why XML is better than comma-separated text files is internationalization: XML has it nailed, CSV didn't even try. As I said before, XML has it far from nailed, and all it did was add a header. Is this the best argument the co-creator of XML has to get people to stop using comma-separated lists? That's quite interesting...

Complaint: XML Has Both Elements and Attributes, Why?

Now this REALLY irks me about XML. I have never seen a single cogent explanation of why something should be an attribute instead of an element. Most people seem to make something an attribute to make XML less verbose... in which case, if it gets too long, why not just make it an element now? Because XML treats them differently. why why why why why why?! Every best practice for elements vs attributes I've seen seems to be violated by every HTML page in existence. No help there. I'd have so much more respect for these people if they could just admit that it was an arbitrary and bad decision, but it's too late to change, so live with it. We're hackers. We're good at working with bad technology.

Complaint: Mixed Content Sucks

The issue here is that it's very handy to be able to nest data inline with other kinds of data.
His example is very apt - how else can you represent a hyperlink inside a paragraph without mixed content?

Complaint: XML is Both a Tree and a Sequence

I agree fully with his response. Data is messy. Live with it.

Complaint: There Are Ugly Complex Standards Built on XML

Again, I agree fully with the complaint, and his response. Just because a bunch of overly academic ivory tower weirdos built something unwieldy out of XML doesn't necessarily mean that XML itself is bad. Remember, XML doesn't kill people, the W3C kills people ;)