XML.com: XML From the Inside Out

Simplified Markup Language: Your Responses

December 01, 1999

Last week we published an article, Simpletons vs. DocHeads, by Robert La Quey, introducing the move to create a Simplified Markup Language (SML). The proponents of SML want to lose the document-centric baggage of XML to create a simpler, more data-centric language.

We invited you to share your views in our forums, and you replied with some intelligent and cogent responses. The article also sparked a further thread of discussion on the XML-dev mailing list, particularly about the omission of attributes from SML.

In this article, I'll present some of the main points of view raised in the feedback we received.

Reactions to XML complexity

There have been varied reactions to the problem of the complexity of some parts of XML. Some developers simply don't use what they don't need -- Mark Shepherd shared an experience with XML development of the sort which prompted the formation of the SML idea:

For the last year and a half we have been using a subset of XML (no attributes, no processing instructions, etc.) for sending structured messages in a web-based client/server application. It does only one thing, but it does it pretty well and we have found many uses for our little parser, which in C++ is only a few hundred lines of code.
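To give a feel for how small a parser for such a subset can be, here is a minimal Python sketch. It is not Mark Shepherd's parser, which was written in C++ and is not reproduced here. The function name `parse`, the tuple representation of elements, and the decision to reject empty-element tags along with attributes and processing instructions are all assumptions made for illustration.

```python
import re

# One token is an opening tag, a closing tag, or a run of character data.
# Attributes, processing instructions, comments and <empty/> tags do not
# match, so they are reported as unsupported (an assumption of this sketch).
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*>|([^<]+)")


def parse(text):
    """Parse an element-only document into nested (name, children) tuples.

    Each child is either another (name, children) tuple or a text string.
    """
    root = ("#document", [])
    stack = [root]
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unsupported markup at offset {pos}")
        closing, name, chars = match.groups()
        if chars is not None:
            # Keep character data, but drop whitespace-only runs.
            if chars.strip():
                stack[-1][1].append(chars)
        elif closing:
            if len(stack) == 1 or stack[-1][0] != name:
                raise ValueError(f"unexpected </{name}>")
            stack.pop()
        else:
            node = (name, [])
            stack[-1][1].append(node)
            stack.append(node)
        pos = match.end()
    if len(stack) != 1:
        raise ValueError(f"unclosed <{stack[-1][0]}>")
    return root[1]


message = parse("<msg><to>Bob</to><body>Hi</body></msg>")
```

Feeding the sketch a tag with an attribute, such as `<a b="c">`, raises `ValueError`, which is the whole point of the subset: anything outside elements and text is simply not part of the language.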

Antony Alappatt, who found XML more complex than he first thought, was in favor of questioning unneeded complexity:

I am a developer of enterprise software, what attracted me to XML was simplicity. The data description was right there in the message... But as I dug deeper into the XML language it became more complex... So it is a good idea that people are asking questions why XML should have such a complex grammar.

Antony's experience chimes with that of numerous developers who have been attracted by the hype around XML, yet were confronted with confusing legacies from SGML which can be totally alien to their problem domain.

Several people were of the opinion that poor developer education was to blame, not XML itself. Robert Morris wrote:

The confusion you describe ... can be attributed to the multitude of poorly, hurriedly written books in the market. In any introductory programming text a good author will omit difficult, advanced, or obscure features. Advancing this as an argument for eliminating those features from the language is spurious. We might as well argue that there should be no Java Beans for Java because they are hard to understand, not used in a wide range of programs, and omitted from introductory books.

His point about developer education was reinforced by James Humphrey, who commented:

My opinion is that we need one XML. However, we need a simplified learning mechanism. A book or a self study CD tutorial would be great. I read all of the XML correspondence, but I still do not feel I know an easy way to learn XML, from simple to complex.

Does SML throw out too much?

Rick Jelliffe certainly thought so. He wrote a considered response, covering attributes, processing instructions and internationalization. He rebuts the claim that there is no essential difference between attributes and elements:

The claim that there is no difference between attributes and elements misses the key point that in many processing languages, elements "push" the program along, while attributes are "pulled" in (to use James Clark's terms).

This is an important and practical distinction. It allows programs to work without testing every information unit.
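The push/pull distinction can be seen in any event-driven API. The following Python sketch uses the standard library's SAX interface as one example of such a processing model. The `PriceHandler` class and the sample order document are hypothetical and are not taken from Rick Jelliffe's response.

```python
import xml.sax


class PriceHandler(xml.sax.ContentHandler):
    """Collect the currency of every <price> element."""

    def __init__(self):
        super().__init__()
        self.currencies = []

    def startElement(self, name, attrs):
        # Elements are pushed: the parser calls this method for every
        # start tag, whether or not the program is interested in it.
        if name == "price":
            # Attributes are pulled: the program asks for the one it
            # wants and never has to look at the others ("note", "id").
            self.currencies.append(attrs.get("currency", "USD"))


document = (
    b'<order id="7">'
    b'<price currency="EUR" note="list price">10</price>'
    b"<price>5</price>"
    b"</order>"
)

handler = PriceHandler()
xml.sax.parseString(document, handler)
```

If `currency` and `note` were child elements instead, each would arrive as its own pushed event, and the handler would need extra state to tell which ones belong to the current `price`.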

On the XML-dev mailing list, James Tauber disagreed with Kent Siever's assertion, quoted in last week's article, that attributes have been a mistake ever since SGML. Tauber agreed that for tree-like data structures attributes could be replaced by elements, but pointed out that this is only one application area:

But if XML is used for markup, attributes make sense and should not be replaced by child elements. Why? Because in markup there is a distinction between content and markup. The character data content of an element is content. The value of an attribute is markup. Attributes, like other markup, provide information in addition to the textual content.
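Tauber's distinction shows up as soon as a program extracts the text of a marked-up sentence. In the Python sketch below, the `link` and `href` names and the example.org address are invented for illustration. The same link is written once with an attribute and once with a child element, and the character data of each paragraph is then collected.

```python
import xml.etree.ElementTree as ET

# The link target carried as an attribute: it stays out of the content.
with_attribute = ET.fromstring(
    '<p>See <link href="http://example.org/">the spec</link>.</p>'
)

# The same target carried as a child element: it becomes part of the content.
with_child = ET.fromstring(
    "<p>See <link><href>http://example.org/</href>the spec</link>.</p>"
)


def text_of(element):
    """Concatenate all character data inside an element, in document order."""
    return "".join(element.itertext())


attribute_text = text_of(with_attribute)  # "See the spec."
child_text = text_of(with_child)          # "See http://example.org/the spec."
```

With the attribute form the sentence reads as the author wrote it. With the child-element form the address leaks into the text unless every consumer knows that `href` is to be skipped.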

The underlying argument here is that XML as it stands can be used in different ways, depending on the application. There is no need for a separate XML specification for each application, only a different use of the features XML already provides.

While attributes are certainly the most radical of omissions in SML, internationalization is a big issue as well, over which several debates have been running on XML-dev. Rick Jelliffe again:

Where I am, here in Taiwan, the main question people ask is "how do I represent a document in Big5 in XML?". So moving to ASCII or even to only UTF-8 will make SML into a US-only or Western-only language. The simplifications proposed so far seem a gigantic step backwards away from a "World Wide Web" and back 20 (or even 5?) years to a world where rich white countries developed technology which created a technological poverty in non-Western countries.

Clarity of SML's Aims

Walter Lounsbery commented that last week's article wasn't very clear on what SML actually is. Since the article's publication, Don Park has released a preliminary grammar for SML, which fixes the SML proposal more concretely.

Walter went on to raise a useful point on the issue of complexity:

The nature of XML is just a foundation of any level of actual complexity in the transmission implementation. So XML enables complexity, not simplicity. In this light, SML may be just a simpler way to increase the actual complexity of the transmitted data.

I think the topic deserves more discussion focusing on actual proposals, with examples of how the end product is simpler in a good way. Maybe the tags get easier, but the implementation is hard to read.

In a similar vein on XML-dev, Rick Jelliffe challenged the SML developers to provide failure cases for XML in order to better drive the SML specification:

Since you are not asking for use-cases, the only logic driving SML is reductionism. The result can only be a language with 1 encoding and 1 tag and the minimum repertoire for names.

If people are serious, they should first establish some use-cases of where XML fails. This will also bring out the ramifications of simplification.

Summary

There certainly appears to be sufficient support for SML at the moment among XML developers for it to be pushed forward to maturity. At that point, when a final specification is released, it will either reach wider acceptance, or die quietly. As Don Park himself put it on XML-dev in response to rumors of disquiet among "some people of influence":

Just remember that not all things are controllable and not all people can be influenced. Killing all the butterflies in Peking will not rid Kansas of tornadoes. Let the butterfly be and hope it drops dead on its own in the winter.

Doubtless the acid test for SML will be that of time.