This week the Deviant gives a quick update from the SML-DEV mailing list.

A Retrospective

The SML-DEV list was born in November of 1999, following a fairly heated XML-DEV debate about simplifying XML. Since then it's been involved in a number of debates about the complexity of XML specifications, including ideas about how the complexity might be mitigated with judicious subsetting and other forms of simplification.

The list's first deliverable was Common XML ("All the XML you need"), a selection of the useful aspects of the XML 1.0 specification. Common XML was generally well received, even in quarters where the idea of XML subsetting was previously anathema, no doubt largely due to its presentation as distilled best practice rather than overt simplification.

SML-DEV then moved swiftly on to MinML, a true XML subset that threw out everything except elements and simple text content. The MinML specification was debated long and hard, acquiring a checkered history even on SML-DEV. A particularly contested issue was whether attributes should or, indeed, could be usefully removed. This particular debate has still not gone away: Sjoerd Visscher summarized some issues in a recent paper, and James Clark recently commented that XML applications should minimize distinctions between the two styles of markup.

TREX tries to treat attributes and elements as uniformly as possible. If you're designing an XML or SGML markup language, it's often pretty much arbitrary whether you represent some bit of information as an attribute or as a child element. In my view, XML processing tools and languages should try to minimize the differences between elements and attributes and should try to treat them as uniformly as possible. You can see that in XSLT and XPath. I wanted to apply that idea to schema languages.

The MinML specification was successfully completed, however, and a number of MinML parsers have since appeared. One of the most interesting findings from the MinML experiment is that adding more features -- attributes, DTDs, and so on -- increases neither the complexity of the parser implementation nor, with care, that of the associated data model.
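
The element-versus-attribute question can be made concrete with a small sketch. The Python below is purely illustrative and is not taken from MinML, TREX, or any SML-DEV proposal: the helper name attributes_to_elements and the sample book documents are hypothetical. It rewrites each attribute as a leading child element, so that two documents differing only in markup style end up with the same tree.

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(elem):
    """Return a copy of elem in which each attribute becomes a leading child element.

    Hypothetical helper for illustration only; it ignores namespaces and
    assumes attribute names do not clash with child element names.
    """
    out = ET.Element(elem.tag)
    out.text = elem.text
    out.tail = elem.tail
    # Attribute order carries no meaning in XML, so sort for a stable result.
    for name in sorted(elem.attrib):
        child = ET.SubElement(out, name)
        child.text = elem.attrib[name]
    # Recurse so that attributes on descendants are converted as well.
    for child in elem:
        out.append(attributes_to_elements(child))
    return out


# The same information written in the two styles of markup.
attr_style = ET.fromstring('<book isbn="123"><title>Example</title></book>')
elem_style = ET.fromstring('<book><isbn>123</isbn><title>Example</title></book>')

normalized = attributes_to_elements(attr_style)
same = ET.tostring(normalized) == ET.tostring(elem_style)
print(same)
```

A tool that works on the normalized tree never needs to know which style the author chose, which is one reading of what treating the two "as uniformly as possible" could mean in practice.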

SML-DEV, relatively quiet for a time, eventually snowballed into the definition of YAML ("Yet Another Markup Language"), which "broke free" from XML by throwing out pointy-bracketed syntax entirely. YAML is aimed at a much smaller problem space, data serialization, particularly for Perl and Python applications.

SML-DEV has tried three different approaches to date: profiling XML through collation of best practice, subsetting XML by paring it to the bone, and providing an XML competitor for a specific application area. What next?

SML-DEV has begun to show signs of activity again, perhaps prompted by the recent debates calling for XML to be refactored so that the dependencies between its specifications are layered and exposed more cleanly, and for guidance that steers developers toward the core XML technologies they really need. "Daring to Do Less With XML", an article by longtime SML-DEV contributor Michael Champion, contains further sound advice on these issues.

Doing It Differently

Joe Lapp posted to SML-DEV this week wondering if anyone would be interested in creating "what XML should have been". Lapp listed various reasons for not wishing to build directly on either MinML or YAML, suggesting instead that work begin on another alternative.

If anybody out there is interested in creating an alternative to XML that is incompatible with XML, I think it would be wise for us to start by tackling the niches in which XML does poorly...

Of course, I'd still want to be able to use the new language where I'd otherwise be inclined to use XML, since a big motivation for me is to have an excuse to use something other than XML. Another big motivation for me is the prospect that the little guys could ultimately generate enough momentum to overthrow much of XML. (This last sentence only makes sense if you understand that the complexity of XML and its ties to the W3C necessarily make it primarily a game for deep pockets and long horizons.)

This is a slightly surprising tack, given that Common XML is likely to be SML-DEV's most successful venture. Is creating another markup syntax really necessary? Michael Champion, who has previously commented that the Common XML/Best Practice approach to simplification is the correct one, now seems to be convinced that a real alternative is the only option.

Interesting timing ... I'm having a "why oh why oh why did I ever get mixed up in this XML $#!+" kind of day. I can't talk about the details because of the W3C creed of Omerta, but suffice it to say that the little inconsistencies between the data models of the extended XML specifications (DOM, XPath, XSL, XQuery, the PSVI, the InfoSet, ad nauseam) are slowing W3C progress to a crawl. The solution of breaking a few things, radically simplifying, and starting over is not even politely listened to in W3C circles. Godfather Darwin is going to be taking XML (broadly defined) out to a landfill in 'Jersey before long. The only question in my mind is whether some other reasonably open markup language takes its place (SGML-lite? an ISO or OASIS-defined XML subset? An ad-hoc semi-standardized XML subset that everyone embraces and extends?) or whether we go back to the Bad Ol' Days of proprietary "post-XML" formats and tools.

So, I'm interested, but utterly stymied as to the politics/business model of some XML simplification. The W3C is beyond hope as a venue for simplification, and no other existing organization shows any interest either (well, maybe OASIS and RELAX-NG ...)

In a subsequent post Joe Lapp noted that success depends on defining the particular problem area at which an alternative might be targeted.

If we decide that we need a good serialization language, well, then we would probably end up with something close to YAML. If we decide we need a good way to mark up human verbiage, well, then we would probably end up with something like Paul's PXML. If we decided that we need a language that does a little something for everybody, well, then we would probably end up with XML.

I think our success hinges critically on us picking a problem or a set of problems to solve and then finding a darn good solution.

The message continues by outlining Lapp's current thinking, which is actually a slightly tweaked version of MinML. Taking an alternate tack, Tom Bradford believed that Common XML would actually be a better starting point.

I think a subset of XML and related technologies that refused to acknowledge external dependencies, used a non brain-dead namespacing mechanism, and reduced the amount of interdependencies, as well as the number of overlapping and duplicated features would be a good start. There's Common XML, which I'm inclined to think is probably the best foundation with which to define such a beast.

Bradford has recently published two papers which add further fuel to the simplification debate. The first, "Clean Namespaces", suggests naming patterns to qualify XML elements. The second, "The Future of XML", is a brief rant on the state of XML development, concluding that the "grand vision" of XML will crumble because the

rate of adoption of XML by entities who actually need it to solve problems is inversely proportional to the complexity of XML as applied to those problems. The more specification interdependencies, and the more complex the specifications being released by the W3C, the worse the rate of adoption for XML will become. We're basically heading for SGML all over again.

Don Park, the founder of SML-DEV, also admitted bad feelings about the direction in which XML is going.

I still like XML, but have bad feelings about where it is going. XML's relationship with W3C is both a blessing and a curse. They took our acceptance of XML as open invitation to shove whatever standard they approve down our throat. Whatever W3C does is tinted with politics and compromises between document vs. data use, verbosity vs. brevity, etc. W3C is like Washington D.C.

...I do see that there is a rising sea of frustrations. Whether there is enough fuel to reach escape velocity out of this global gravity-well called XML, your guess is as good as mine.

The problem with simplification is deciding which bits to leave out. Once you begin to move away from the core specifications, people's requirements begin to rapidly diverge. Cut out too much and you may well disenfranchise a large part of your potential user base. A well-targeted utility language may well be successful, but it's likely to remain in a niche area. Investments already made in XML and related technologies aren't going to be readily squandered either. The rapid success of XML may be inadvertently perpetuating an "if you build it, they will come" illusion. Reality is far harsher.

Sharing best practices is the only viable option. The real concern over simplification is that the temptation to start over from first principles could well limit progress.


Comments on this article: Do you think XML has grown too complicated? Is it time for a mass refactoring?

  • XML should be simpler
    2001-08-02 07:40:12 Charles Dowdell [Reply]

    Yes, it has gotten too complex. Simplicity is power. Once the Object Oriented programmers got the ball they seem to want to turn XML into C++.


    XML should be representative of information science and based on the natural thought of human kind. It should not be a slice of human thought as viewed by programmers. It should have direct connections to the long established philosophies of human consciousness.

  • XML solid at core, flaky around the edges
    2001-08-02 10:51:20 Michael Champion [Reply]

    I seem to come up with the most quotable turns of phrase when I'm being the least rational and attentive to detail. For the record, I believe that the XML *superstructure* -- schemas, different conceptions of namespaces, infosets, datatypes, committee-defined formalisms being defined by a number of groups at the W3C -- will NOT survive the evolutionary pressures of the real world. The *infrastructure* of XML -- the Common XML subset of XML 1.0, the parts of the DOM API that expose Common XML, the ideas that are common to the W3C Schema, RELAX, and TREX -- is as solid as almost anything in the software industry.


    The solid ideas at the core will survive even if the W3C loses credibility as a "standards" body, if much-hyped ideas such as the semantic web or web services fail to pay off, etc. Probably the worst that could happen after "XML" writ large is whacked is something like the Unix world in the late '80's/early '90's: everyone more or less agrees on some core concepts but "embraces and extends" them in a confusing, but more or less workable way.


    We can do a whole lot better than that ... maybe something like the XML equivalent of Linux will come along to re-unite everything in a single vision and re-ignite the energy ... but that's not the current trend.




  • Simplifying XML
    2001-08-02 21:38:25 Michael Cirovic [Reply]

    While it is at least partially true that the W3C has gone overboard in the complexity of XML and related specs, it seems to me that it should not deter anyone from adopting XML to their particular application. Use what you need and the remainder can be totally untouched...

  • XML was good when it was simple.
    2001-08-03 05:49:35 Andrew Yohn [Reply]

    Our development shop uses XML as our core technology platform. Because we developed our base prior to XSD, most of our technology uses XDR. In using XDR, we were able to define most of our modeling needs, but alas there were some instances in our data specification that couldn't be expressed well with XDR. So we just dealt with the issue as best we could, delivered product and waited with anticipation for XSD to be released. But when the XSD specification was released, we were overwhelmed with the complexity and all the stuff we did not need. The additional complexity of XSD is another instance of 'we did it because we could' as opposed to something easily understood and used by the masses. XSD is just an example of the complexity being imposed on the XML community that does not need to be there. XML became the industry mantra because it was (initially) so simple and obvious. We must be careful to not do things 'just because we can'; let's make sure we should.

  • Intrigued yet confused and overwhelmed.
    2001-08-09 09:45:20 Israel Evans [Reply]

    As highlighted in many of the articles here, I am the typical newbie to XML who knows that there is something really good here, but is overwhelmed by the sheer volume and cryptic quality of the many W3C specifications.


    I try to follow best practices and adhere to the standards, which is why I've switched over to XHTML, but attempting to use XML, I'm confused. I can't figure out which spec to use with which other specs, which are dependent on others, which are optional, and more importantly why I would need some of this.


    I use Python, but wonder if using the XSL/t/fo or whatever would be preferable, but since I don't really understand all of them yet it's hard to say. If there were something in the specs that marked out the typical path of adoption and problem realm that each spec covered and how they were related, it might make things easier to understand.


    Anyway, my brain hurts..
    ~elmlish~