Look Ma, No Tags
by Kendall Grant Clark
I can also represent this structure in plain text according to some informal Unix conventions about configuration files:
PRIVMSG newUri ^http://.*
PRIVMSG deleteUri ^delete.*
PRIVMSG randomUri ^random.*
For plain text, that's about as little space overhead as one could want, but it's also inflexible: data elements cannot have spaces, cannot span more than one line, and so on. But in some applications, it's all you really need, and it's trivial to serialize to or deserialize from disk, which is nice.
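As a rough illustration of how little code the plain-text form needs, here is a minimal Python sketch that reads such a file back into a list of tuples. It assumes one binding per line with three whitespace-separated fields; the function name `parse_bindings` and the `#` comment convention are hypothetical, not taken from any existing tool.

```python
def parse_bindings(text):
    """Parse 'EVENT method regex' lines into a list of 3-tuples."""
    bindings = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '#' comments (an assumed convention).
        if not line or line.startswith("#"):
            continue
        # Split on whitespace at most twice, so the third field keeps
        # whatever remains of the line.
        fields = line.split(None, 2)
        if len(fields) != 3:
            raise ValueError("expected 3 fields, got: %r" % line)
        bindings.append(tuple(fields))
    return bindings


config = """\
PRIVMSG newUri ^http://.*
PRIVMSG deleteUri ^delete.*
PRIVMSG randomUri ^random.*
"""

bindings = parse_bindings(config)
```

Splitting at most twice lets the final field contain spaces, but the first two fields still cannot, and nothing can span a line. That is exactly the inflexibility described above.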
The XML space overhead is of course considerably higher (we'll conveniently ignore the XML setup boilerplate), and one rendering might look something like
<bindings>
  <binding>
    <ircEvent>PRIVMSG</ircEvent>
    <method>newUri</method>
    <regex>^http://.*</regex>
  </binding>
  <binding>
    <ircEvent>PRIVMSG</ircEvent>
    <method>deleteUri</method>
    <regex>^delete.*</regex>
  </binding>
  <binding>
    <ircEvent>PRIVMSG</ircEvent>
    <method>randomUri</method>
    <regex>^random.*</regex>
  </binding>
</bindings>
The other obvious rendering -- which might not work in general, given the restrictions on XML attribute values (a literal < or & must be escaped) -- would look something like
<bindings>
  <binding ircEvent="PRIVMSG" method="newUri" regex="^http://.*" />
  ...
</bindings>
I'm too lazy to count the space overhead for either XML rendering, but it's definitely more than the literal Python structure or the Unixy config file. Of course, you get more from XML, since now the data can be called "self-describing", though in simple cases that tends to be of limited utility. With judicious use of comments, the XML and plain text versions can be equally self-describing.
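To get from the element-based rendering back to a list of tuples, a program has to supply the mapping itself. The following is a sketch using Python's standard `xml.etree.ElementTree` module; that library choice and the variable names are assumptions made for illustration, and only two bindings are shown for brevity.

```python
import xml.etree.ElementTree as ET

xml_text = """\
<bindings>
  <binding>
    <ircEvent>PRIVMSG</ircEvent>
    <method>newUri</method>
    <regex>^http://.*</regex>
  </binding>
  <binding>
    <ircEvent>PRIVMSG</ircEvent>
    <method>deleteUri</method>
    <regex>^delete.*</regex>
  </binding>
</bindings>
"""

root = ET.fromstring(xml_text)

# The mapping convention lives in this code: each <binding> becomes one
# tuple, with its child elements read in a fixed order.
xml_bindings = [
    (b.findtext("ircEvent"), b.findtext("method"), b.findtext("regex"))
    for b in root.findall("binding")
]
```

The element names do describe the data, but the program still has to know which names to look for and in what order to assemble them.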
Finally, let's look at the same structure, a list of tuples, represented by YAML:
---
-
  - PRIVMSG
  - newUri
  - '^http://.*'
-
  - PRIVMSG
  - deleteUri
  - ^delete.*
-
  - PRIVMSG
  - randomUri
  - ^random.*
That's 12 visible characters of overhead, plus a good number of newlines and spaces. (Note that YAML uses three dashes, ---, to separate documents within a file or stream, which it considers a "series of disjoint directed graphs, each having a single root" (YAML spec).) From the point of view of absolute space efficiency, YAML is not a radical improvement over XML or literal Python. But if you're really interested in absolute space savings, you're probably ready to sacrifice human readability anyway. What I mean by calling YAML "lightweight" is that from the standpoint of visual perspicuity and input concision, the YAML version is almost as lightweight as the plain text, and more lightweight than either XML or literal Python. In short, YAML is at least as easy to read as XML, and considerably easier to type by hand. And that counts for a great deal in many kinds of application, especially configuration files and the like.
Native Processing Model
As for YAML's other influences, section 1.2 of the specification generously exhibits them. Python fans will be happy to note that it uses whitespace as a block delimiter. It also steals ideas from MIME, HTML, XML and SOAP, including aliasing, application-specific types, and a namespace mechanism which is part Java package naming and part XML URI-based namespace naming. But perhaps the biggest influence on YAML is Perl -- which I, as a Python devotee, had to learn not to hold against it! -- especially in the way YAML conceptualizes data structures and types, which it distinguishes into scalars, like integers and strings, and collections, like hashes and arrays.
In addition to being more lightweight for reading and writing than XML, YAML has a different processing model, too. As is well known, XML's nested elements and attributes most fluently describe tree-shaped structures. YAML, by contrast, hews very closely to the data processing models of programming languages like Perl, Python, and Java, freely mixing sequence, mapping, and scalar types. As a result, YAML serialization fits typical programming language constructs more closely than XML, requiring neither mapping conventions nor DOM or DOM-like adaptations.
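A small sketch of what this looks like from Python, assuming the third-party PyYAML package is installed (the article does not name a particular Python binding, so this is an illustrative choice). Loading the YAML document shown earlier yields ordinary lists and strings, with no intermediate tree to walk.

```python
import yaml  # PyYAML, a third-party package (assumed installed)

yaml_text = """\
---
-
  - PRIVMSG
  - newUri
  - '^http://.*'
-
  - PRIVMSG
  - deleteUri
  - ^delete.*
-
  - PRIVMSG
  - randomUri
  - ^random.*
"""

# safe_load returns plain Python objects: here, a list of lists of str.
data = yaml.safe_load(yaml_text)

# Converting the inner lists to tuples is the only adaptation needed to
# match a list-of-tuples structure.
yaml_bindings = [tuple(item) for item in data]
```

The loader hands back sequences as Python lists, so the one-line conversion to tuples is a matter of taste rather than a mapping convention the format forces on the program.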
In short, the two leading selling points of YAML over XML are that it's more lightweight, and that it uses native processing models and data structures. The most serious drawbacks of YAML are that it isn't XML and that it isn't nearly as ubiquitous as XML: YAML is very well supported in Perl, but the support in Python, Java, and Ruby is still maturing, and there are rumors of a forthcoming libyaml in C, too. It bears repeating that ubiquity of tool support is not an absolute value; it is context-dependent and goal-specific. You may be able to sacrifice it for the sake of using YAML and securing its virtues, depending on what you need to do and where you need to do it.
A Profitable Coexistence?
There is a lot to YAML. The specification fits in one HTML document, but it is neither short nor simplistic. For example, if you're interested in YAML but circumstances prevent you from moving away from XML all at once or altogether, you might want to look at YAXML, which is the YAML conceptual model with XML's familiar syntax bolted on.
If you have been late to adopt XML as a matter of policy, or if you have been having second thoughts about its costs for your projects, YAML is definitely worth a long, serious look. You may well find that it imposes less overhead, both on the people who produce it by hand, and the people who program computers to produce and consume it. And even if YAML never becomes more than a niche tool, if you happen to occupy that niche you'll be happy to have it around.
