
One of the things that has made XML famous is the angle bracket; the pleasing symmetry of the <tag> opened and the </tag> closed. Angle brackets are key to the particular strength of XML, a uniform and universally agreed syntax. It is hardly surprising, then, that the angle bracket has a cult thing going. They're handy for logos and geeky puns (this column not excepted). Everybody loves angle brackets.

Advocates of simplification heaved a sigh of relief when XML locked down the basic syntax of markup. Its predecessor, SGML, had the scary ability to switch angle brackets out and do very weird things with the syntax. How nice that we can all agree.

Except that when you've got an angle bracket shaped hammer, everything starts looking like a nail. Surely nobody in their right mind can find pleasure in this XML-ification of a simple if ... then construct:

<prog:if test="a &gt; b" 
  xmlns:prog="http://myneatlang.com/">
  <prog:then>
    ...
  </prog:then>
  <prog:else>
    ...
  </prog:else>
</prog:if>

Despite the emergence of a few such programming languages with XML syntax, it's pretty clear that the angle bracket doesn't yield any of its much-vaunted "human readable" benefits here. The exception proving my rule, of course, is XSLT -- but you won't find anyone claiming that the most attractive thing about XSLT is its syntax!

Little languages

Even XSLT didn't go all the way with XML syntax, of course, introducing the first non-XML syntax into the XML canon: XPath. (Yes, I omit DTDs here; there's been a contract out on them ever since XML 1.0 was published.) It's pretty obvious that expanding even modest XPath expressions into an XML syntax would lead to unmanageable stylesheets. XPath's filesystem-like metaphor for navigating an XML document works pretty well. Of course, an XML syntax for XPath has been argued for, the main reason being that XML processing machinery like DOM would be able to process XPath too -- the hammer and nail argument again.

This is perhaps a not unreasonable position: for every new syntax we introduce, the situation worsens in two ways. First, we need to write code to process the new notation. Secondly, the user of the technology needs to learn the new notation. Such arguments are not overwhelming, however. The purpose of the terser XPath notation is really to help the user in the first place. So writing a small amount of code to process it can be traded off happily against the benefits to the user. Further, the pain of learning a new notation is only partially linked to the syntax. Both the verbose XML syntax I linked above and the W3C recommended XPath syntax embody the same notions: the user must learn what XPath means (dare I say the s-word?) in order to use it effectively. It's not hard to see that a verbose XML syntax could actually obscure the easy acquisition of a language's semantics.
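To make the trade-off concrete, here is a minimal sketch in Python. It uses the standard library's ElementTree, which supports only a limited subset of XPath, and the sample document is invented for illustration. The same selection is written once as a path expression and once as hand-written tree navigation.

```python
import xml.etree.ElementTree as ET

# Invented sample data, used only for this illustration.
DOC = """<library>
  <book year="2001"><title>First</title></book>
  <book year="1999"><title>Second</title></book>
</library>"""

root = ET.fromstring(DOC)

# The terse notation: one path expression (ElementTree's XPath subset).
by_path = [t.text for t in root.findall("./book[@year='2001']/title")]

# The same selection spelled out as explicit tree navigation.
by_hand = []
for child in root:
    if child.tag == "book" and child.get("year") == "2001":
        for grandchild in child:
            if grandchild.tag == "title":
                by_hand.append(grandchild.text)
```

Both versions need the reader to understand the same ideas (children, attributes, predicates); the path form just states them in less space.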

Other W3C work has recognized this pragmatic use of little languages embedded in XML. Perhaps the best example of this is Scalable Vector Graphics, which has constructs like this:

<path d="M9.777,9.958c1.306,0,1.529,1.601,1.529,2.752c0,
0.997-0.224,2.625-1.529,2.639V9.958z 
M6.913,17.905h2.499c1.741,0,4.816-0.449,
4.816-5.334c0-3.201-1.757-5.251-5.014-5.251H6.913v10.586z"/>

The contents of the d attribute describe the path of a line in a diagram. At the time of SVG's development, some objected that the path information was not easily processable with the XSLT hammer, as it wasn't in XML syntax. With hindsight I think we can all be very grateful that the SVG Working Group did not yield to the pressure to place all paths in XML armor.
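As a rough illustration of how little code the second parse can take, the following Python sketch tokenizes a path string into command letters and their numeric arguments. It is a simplified reading of the path syntax, not a conforming SVG parser: it ignores details such as the compact flag notation allowed in arc commands, and the function name `parse_path` is made up for this example.

```python
import re

# A token is either a single command letter or a number.
# A number may carry its own sign, so "0.997-0.224" yields two values.
TOKEN = re.compile(
    r"([MmZzLlHhVvCcSsQqTtAa])"
    r"|([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)"
)

def parse_path(d):
    """Group the numbers in a path string under the command letter before them."""
    commands = []
    for letter, number in TOKEN.findall(d):
        if letter:
            commands.append((letter, []))
        elif commands:
            # Numbers that appear before any command letter are ignored.
            commands[-1][1].append(float(number))
    return commands

segments = parse_path("M9.777,9.958c1.306,0,1.529,1.601,1.529,2.752V9.958z")
```

Each entry in `segments` is a `(command, [numbers])` pair, which an application could then turn into whatever structure it needs, XML or otherwise.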

Complete alternatives

Eschewing the mere inclusion of small sections of non-XML, several people have chosen to go all the way with alternative syntaxes for XML. As I mentioned above, SGML had no hangups about leaving the house without wearing angle brackets, and one early alternative XML syntax owes much to its SGML heritage. PYX, developed by Sean McGrath, is a line-based notation for XML, which makes processing easy with regular expressions and line-oriented tools such as sed and grep. PYX borrows much from SGML's ESIS format.
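The flavor of PYX can be sketched with a small Python SAX handler that writes one line per parsing event: `(` for a start tag, `A` for an attribute, `-` for character data, `)` for an end tag and `?` for a processing instruction. This is an illustration rather than McGrath's own tool, and the escaping of backslashes, newlines and tabs in the `escape` helper is an assumption about the convention.

```python
import xml.sax

def escape(text):
    # Assumed convention: keep every event on one line by escaping
    # backslashes, newlines and tabs.
    return text.replace("\\", "\\\\").replace("\n", "\\n").replace("\t", "\\t")

class PyxWriter(xml.sax.ContentHandler):
    """Collect one PYX-style line per SAX event."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def startElement(self, name, attrs):
        self.lines.append("(" + name)
        for key in attrs.getNames():
            self.lines.append("A" + key + " " + escape(attrs.getValue(key)))

    def endElement(self, name):
        self.lines.append(")" + name)

    def characters(self, content):
        # A parser may deliver one text node in several chunks,
        # in which case this emits several "-" lines.
        self.lines.append("-" + escape(content))

    def processingInstruction(self, target, data):
        self.lines.append("?" + target + " " + data)

def to_pyx(xml_text):
    handler = PyxWriter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.lines

lines = to_pyx('<a href="x">hi<b/></a>')

# Line-oriented processing, in the spirit of grep '^(': list the element names.
opened = [line[1:] for line in lines if line.startswith("(")]
```

Once the document is in this form, finding every start tag is a matter of matching on the first character of each line, which is exactly the kind of job sed and grep do well.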

PYX's main utility is in the processing of XML. There have been no serious challenges to the notion of the XML 1.0 syntax for purposes of interchange. It's generally at the processing or the document creation stage that alternative notations have their value: either as in PYX's case, to take advantage of existing infrastructure, or to make life easier and less error-prone for humans.

The simplicity or otherwise of XML schema languages has been a hot topic this year, and work in the area has yielded another instance of alternative syntax. RELAX NG ("relaxing") is the union of RELAX and TREX -- XML schema languages created by Murata Makoto and James Clark respectively -- now being developed under the aegis of an OASIS technical committee. RELAX NG offers a simpler approach than the W3C's XML Schema, though both technologies use an XML syntax (in fairness, RELAX NG also has a narrower scope than W3C XML Schema, aiming as it does at document validation only).

Not satisfied with RELAX NG's existing simplicity, James Clark recently posted an experimental alternative syntax for the schema language. Clark states that his main motivation was to improve readability of schemas. The new syntax uses familiar constructs such as p1 | p2 to replace <choice> p1 p2 </choice>, and is at its most valuable in complex type declarations. In one example in his document, a 14 line definition is reduced to two lines, and without introducing the kind of terse obscurity the Perl programming language is famous for.

Tim Berners-Lee has also been investigating non-XML syntaxes. It is often said that one of the main obstacles to the greater success of RDF over the last two years has been its syntax. It's certainly true that if you want to say something like "This person's name is Fred", it gets a bit painful as you have to write something like

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:p="http://www.example.org/personal_details#">
  <rdf:Description about="http://www.example.org/people#fred">
    <p:GivenName>Fred</p:GivenName>
  </rdf:Description>
</rdf:RDF>

The W3C has a strong culture of discussion and development using IRC, which is line-based. This seems to have been a contributing factor in the development of "Notation 3" (N3), a line-based syntax for RDF. In N3 the above example might look more like:

@prefix p: <http://www.example.org/personal_details#> .
<http://www.example.org/people#fred> p:GivenName "Fred" .

In his design note on N3, Berners-Lee describes it as "an academic exercise in language designed for a human-readable and scribblable language". As one of the obstacles to the deployment of widespread metadata is getting people to write it in the first place, it may be that a readily "scribblable" syntax can help.

Projection

Clark's RELAX NG syntax and N3 both have explicit translations to their XML representations and are intended for use before document interchange ever takes place. A problem XML programmers often face, however, is embedded non-XML syntax at the interchange stage itself. It's this difficulty that gives rise to complaints about the non-XML nature of SVG's paths, for instance.

From a design point of view, it just seems messy to have to parse twice: once for XML, and then once for a little language embedded in attribute or text content. Life would be easier if everything could appear as SAX events or DOM nodes. Rather than forcing XML syntax to a silly degree, Simon St.Laurent has come up with an interesting solution to this problem.

In his Regular Fragmentations work, St.Laurent has created a SAX filter (code that can insert events into, or delete events from, the parsing stream) that performs mappings from regular expressions into XML, projecting an XML structure onto textual content. For example, a date written <date>2001-08-29</date> could be translated into <date><year>2001</year><month>08</month><day>29</day></date> for the processing application. Since this processing happens as part of the parsing process, the expanded form is never seen in the serialized XML. This technique has great potential for simplifying XML processing, as it changes everything into a nail just in time for us to bash it with our XML hammer.
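The idea can be sketched with Python's standard SAX filter class. This is not St.Laurent's implementation, only an analogue of the date example above: the class name `DateFragmenter`, the hard-coded `date` element and the fixed pattern are all assumptions made for illustration, where a real tool would read such rules from configuration.

```python
import io
import re
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl

class DateFragmenter(XMLFilterBase):
    """SAX filter that splits the text of <date> into year, month and day.

    Assumes <date> holds only text, with no child elements.
    """

    PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

    def __init__(self, parent):
        super().__init__(parent)
        self._buffer = None  # collects text while inside <date>

    def startElement(self, name, attrs):
        if name == "date":
            self._buffer = []
        super().startElement(name, attrs)

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)  # hold back until </date>
        else:
            super().characters(content)

    def endElement(self, name):
        if name == "date" and self._buffer is not None:
            text = "".join(self._buffer)
            self._buffer = None
            match = self.PATTERN.fullmatch(text.strip())
            if match:
                # Replace the text with synthetic child-element events.
                for child, value in zip(("year", "month", "day"), match.groups()):
                    super().startElement(child, AttributesImpl({}))
                    super().characters(value)
                    super().endElement(child)
            else:
                super().characters(text)  # no match: pass text through unchanged
        super().endElement(name)

def fragment(xml_text):
    """Parse xml_text through the filter and serialize what the application sees."""
    out = io.StringIO()
    reader = DateFragmenter(xml.sax.make_parser())
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()

expanded = fragment("<date>2001-08-29</date>")
```

The serialization step is here only to make the result visible. In normal use the downstream handler would be the application itself, which receives the year, month and day as ordinary element events.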

Conclusion

XML's syntax is its strongest asset. That doesn't mean, however, that we have to take the naïve approach to getting benefit from it and bludgeon everything into angle bracket armor. There are many times when data may not be ours to change or is simply better suited to a different syntax. Whether through little languages or translators, these alternative syntaxes can in fact strengthen XML by making it more usable and understandable.


  • Non-XML Syntaxes
    2001-08-30 16:35:39 Kurt Cagle

    The first impression that I had when seeing the SVG path notation was alarm, as it did seem to introduce some complexity in the development (such as, for instance, animating a particular point in an SVG object), yet after I had a chance to think about it I realized that this particular form actually did offer some of the best benefits, and it was something that could readily be parsed in SVG with just a shade more work than otherwise.


    This has got me to thinking about the other "small" languages that are being subtly introduced into the XML canon. Regular Expressions first appeared in the XSLT WD, but disappeared before the Recommendation became finalized. They have resurfaced in XML Schema, and personally I think that they make a great deal of sense to appear within XPath notation as well. Being able to write an XSL expression such as


    <xsl:apply-templates select="//phone[. ^= /^\(?$areaCode\)?[-\s]?\d{3}-\d{4}/]"/>


    would make for far more sophisticated matches than what currently exists (contains(.,$areaCode) would match as well, but there's no way that you could ascertain that the match was on the area code, the exchange or as a part of the sequence of the local code).


    The XML syntax proposal for XML Query is just plain ugly, and demonstrates quite well why alternative syntaxes have their place -- XPath does not in fact lend itself to being described in XML terms.


    Indeed, I think one of the more intriguing aspects of XML Query that should ultimately be incorporated into XSLT 2.0 is the notion of creating XPath functions via an XML notation -- having a procedural notation for XSLT named templates, for instance, would make it easier to create sophisticated interim XML that could be fed directly into an XPath expression without the ugly (and non-conformant) use of node-set() and multiple variables.


    -- Kurt Cagle
    -- Author, XML Developers Handbook

  • XML, little languages, and the network effect
    2001-08-30 20:57:01 Michael Champion

    Let's keep in mind just who brung us to the dance here: Metcalfe's Law (or the network effect) -- basically, XML is highly valued because it is so widely used ... which makes it more widely used, which makes it more valuable ...


    XML (the hard core that people use, not the tottering superstructure that all 2500 pages of W3C specs collectively define) is "good enough for gummint work" while being simple enough to easily use and implement. That's all ... until we factor in the network effect. Then everybody's gotta have it because everybody's gotta have it.


    So what's this got to do with Edd's argument? Basically, let's be very wary of throwing out the benefits of the network effect offered by XML syntax unless we leverage an even better one. For example, the filesystem path metaphor is so powerful and widely understood that XPath hit the ground running, i.e., it leveraged the "network effect" of hierarchical filesystems that everybody understood already to describe patterns in XML hierarchies. One might argue that regular expressions have a similar benefit because they are widely understood by programmers and supported by software tools. Fair enough.


    But if the value of the "little languages" is just that they are easier to use and understand than XML, I part company with Edd. XML syntax may be overkill for many purposes that little languages are better suited for, but the XML grammar is getting hard-wired into people's brains, XML parsers and APIs are ubiquitous, etc. What's the value proposition for even thinking about writing a new grammar, parser, API, etc. for some task that XML is "good enough" for? Maybe XML is too verbose in a given application (e.g., for SVG), or maybe its data model doesn't fit well (e.g., RDF), and in these cases it's pointless to "bludgeon everything into angle brackets." But for a lot of other things, the value of being able to manipulate stylesheets, schemas, metadata, etc. with the very same conceptual and software tools as one uses on the data can outweigh the overhead of using XML.


    I doubt if "XML is too hard to author in" leads to a widespread value proposition for little languages. XML is simple enough to be easily generated from about any linear or tree-like data structure, and most users want to author in a GUI environment rather than learning a new "little language" anyway.


    So, I guess I see the "trees" that Edd and Kurt do (XPath, SVG, RDF) but I'm not sure that there's a "forest" out there that warrants widespread application of a little language design pattern.