<?xml version="1.0" ?>
<!-- <?xml:stylesheet href="first-x.xsl" type="text/xsl" ?> -->
<?xml-stylesheet href="first-x.css" type="text/css" ?>

<article xmlns="http://www.xml.com/namespaces/first-x"
         xmlns:html="any-old-bollocks" >
<header><mainTitle>XML in XML</mainTitle>
<author>Tim Bray</author>
</header>
<body>
<div1>
<para>This is, after all, XML.com, and it's about time that we started
publishing in XML.  The subject of the story, naturally enough, is how to go
about publishing and browsing XML on the Web.
</para>
<sidenote><divTitle>First of a Series</divTitle>
<para>The plan was that this story would cover XML in IE5, including
the base language, setting up the server, CSS, XSL, and the DOM.
Unfortunately, we had a hard deadline (IE5 goes public on March 18th), and
when it arrived, I had sunk so much time into learning XSL, without getting
anything based on the public drafts working in IE5, that it had used up all
the time we'd budgeted for both XSL and the DOM.
We'll keep struggling with XSL until we get something that actually works and
plays by the rules.
Following that, we'll go on and do some DOM coverage.
</para>
</sidenote>

<para>The event that motivates publishing this article at this time is, of
course, the arrival of Microsoft 
<html:a href="http://www.microsoft.com/windows/ie/default.htm">Internet Explorer Release 5</html:a>.
While we will try out one or two things in the bleeding-edge pre-alpha
Netscape "Gecko" code, the main focus of this article is publishing and
browsing XML in a standards-compliant way using IE5.
Of course, when Netscape ships an actual released product, we'll publish
an even more interesting article: how to publish and browse XML in an
interoperable way.
</para>
</div1>
<div1><divTitle>How to do It</divTitle>
<para>There are a bunch of different ways to deliver XML over the Web.
The first, and most obvious, is to give up and not try; after all, it will
probably be years before most people have XML-capable browsers on
their desktops.
So you could just take the XML, turn it into HTML, and
deliver that.  (In fact, if you're not adventurous, that version of this
article may be what you're reading now.)</para>
<para>A second option is to go ahead and deliver the XML as-is, without
sweating stylesheets or anything, and just see what
happens.  With IE5, this turns out to be not as useless as you might
think.</para>
<para>The <emph>right</emph> thing to do is to send the XML to a browser with
a stylesheet: today
in <html:a href="http://www.w3.org/TR/REC-css2">CSS</html:a> and, before too
long, in <html:a href="http://www.w3.org/TR/WD-xsl">XSL</html:a>.
This has a bunch of advantages:
<html:ul>
<html:li>it cuts down on the amount of data you have to transmit</html:li>
<html:li>it offloads a lot of the formatting work onto the browser, which is
coasting along on your under-worked Pentium desktop</html:li>
<html:li>it allows you to do cool things with the DOM.</html:li>
</html:ul>
</para>

<sidenote><divTitle>Setting Your Server Up for XML</divTitle>
<para>
If you are running Apache 1.3.4, you are in luck, since the MIME type for XML files is already configured. If you are running an earlier version of Apache and you have access to the server configuration:</para>

<para>Edit the mime.types file and add the MIME type with the line:</para>
<html:pre>     text/xml     xml</html:pre>
<para>then restart the Apache server.</para>

<para>If you don't have access to your server configuration, or don't want to mess with the configuration files, ask your system administrator to add the MIME type:</para>
<html:pre>     text/xml     xml</html:pre>

<para>Other servers have slightly different methods for adding MIME types.</para>
</sidenote>
</div1>
<div1><divTitle>But First, the Server</divTitle>
<para>Before you can worry about browsing XML, you have to find a server (or
set up your own) that serves it properly.
We've included a note from one of XML.com's webmasters that
explains how you might go about doing that.</para>
<icon><html:img src="good.gif"></html:img></icon>
<para>
If you just want to read the XML out of a file (as I'm doing right now while
writing this article) things are a lot easier; when IE5 opens a file whose
name ends with ".xml", it assumes that it's going to contain XML and does the
right things.
</para>
</div1>
<div1><divTitle>Writing Your Document</divTitle>
<para>At this point, now that you're ready to serve, you need something to
serve. 
For this document, just to keep things clear, I invented the tags as I went
along, choosing reasonable-sounding names, and didn't bother
with a DTD.
In many cases, you can use someone else's pre-cooked DTD; one good
candidate would be the
<html:a href="http://www.w3.org/TR/WD-html-in-xml/">HTML-in-XML</html:a>
work currently under development at the W3C.</para>
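<para>If you do decide to write a DTD, the declarations themselves are not
hard to read.  As a minimal sketch, here is what declarations for just the
three header elements used in this article might look like.  It is only a
fragment (a complete DTD for this document would also have to declare
<emph>article</emph>, <emph>body</emph>, every other element, and the
namespace attributes), so treat it as an illustration rather than a working
DTD:</para>
<eg><html:pre>&lt;<no-op/>!-- a header holds exactly one title followed by one author -->
&lt;<no-op/>!ELEMENT header    (mainTitle, author)>
&lt;<no-op/>!-- both are plain text, with no child elements -->
&lt;<no-op/>!ELEMENT mainTitle (#PCDATA)>
&lt;<no-op/>!ELEMENT author    (#PCDATA)></html:pre></eg>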
<para>Whether you've got a DTD or whether you're just making it up as you go
along, you're going to need something to type it in with.
The really basic approach is your usual text editor; that's what I use, except
that my usual text editor is GNU Emacs, which isn't basic at all.
Emacs is really a tool for hard-core geeks; you'd probably be better off
having a look through 
<html:a href="http://www.xml.com/xml/pub/pt/Authoring">XML.com's list of
authoring tools</html:a>.</para>
<div2><divTitle>Re-using HTML Tags</divTitle>
<para>Of course, you wouldn't want to invent all your own tags.
If you need a list, or a hypertext reference, HTML has those already built in,
and with the magic of <html:a href="http://www.w3.org/TR/REC-xml-names">XML
namespaces</html:a>, you should be able to use those HTML elements.
</para>
<para>Here's how it ought to work, in theory:</para>
<html:ol><html:li>Declare a namespace prefix and bind it to the official
namespace for HTML.</html:li>
<html:li>Use the names of HTML elements in your document, but attach that
prefix you declared.</html:li>
</html:ol>
<para>For example:</para>
<eg><html:pre>&lt;<no-op/>start xmlns:H="http://official-namespace-of-HTML">
 &lt;<no-op/>H:ol>
  &lt;<no-op/>H:li>Declare a namespace prefix and bind it to the official
    namespace for HTML.&lt;<no-op/>/H:li>
  &lt;<no-op/>H:li>Use the names of HTML elements in your document, but
    attach that prefix you declared.&lt;<no-op/>/H:li>
 &lt;<no-op/>/H:ol>
&lt;<no-op/>/start></html:pre></eg>
<icon><html:img src="good.gif"></html:img></icon>
<para>This kind of works, which is a little surprising.
Since the W3C hasn't gotten around to declaring an official namespace
name for HTML, the IE team has a tough problem in figuring out how to follow
the rules.
If it didn't work, you wouldn't see all the nice bullet-lists and
hyperlinks in the XML version of this document.</para>
<icon><html:img src="bug.gif"></html:img></icon>
<para>What you have to do is declare a namespace prefix, and that prefix
has to be <emph>html</emph>; no other string will work!
You have to declare it, but you don't have to map it to any namespace name in
particular (do a "view source" on this page to see what I mean).
This is a huge violation of the essence of the namespace spec,
which would suggest that Microsoft somehow Just
Doesn't Get It about namespaces, except that we know they do.
Puzzling.</para>
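<para>To see why this matters: under the Namespaces in XML spec, a prefix is
just a local shorthand, and what identifies an element is the namespace name
the prefix is bound to plus the element's local name.  So, reusing the
made-up namespace name from the earlier example, these two fragments ought to
mean exactly the same thing to a namespace-aware processor, even though IE5,
as described above, only treats the first one as HTML:</para>
<eg><html:pre>&lt;<no-op/>!-- prefix "html": IE5 treats li as an HTML list item -->
&lt;<no-op/>start xmlns:html="http://official-namespace-of-HTML">
 &lt;<no-op/>html:li>An item&lt;<no-op/>/html:li>
&lt;<no-op/>/start>

&lt;<no-op/>!-- same element per the spec, but IE5 won't treat it as HTML -->
&lt;<no-op/>start xmlns:H="http://official-namespace-of-HTML">
 &lt;<no-op/>H:li>An item&lt;<no-op/>/H:li>
&lt;<no-op/>/start></html:pre></eg>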
</div2>
<div2><divTitle>Linking to Stylesheets</divTitle>
<para>If you want to use stylesheets, you'll have to tell the browser.
The way to do that is to put a "stylesheet linking PI" at the top of your
document; here's an example:</para>

<eg>&lt;<no-op/>?xml-stylesheet href="first-x.css" type="text/css" ?></eg>

<icon><html:img src="bad.gif"></html:img></icon>
<para>Which leads us to the first nasty bug.  That little fragment is supposed
to begin with "&lt;<no-op/>?xml-stylesheet...", but in all the IE5 examples
I've seen, it begins with "&lt;<no-op/>?xml:stylesheet...", a now-obsolete
version that grossly violates the "Namespaces in XML" specification.
</para>
<para>
When I was preparing the example just above, I ran across another really
ugly problem.
I wanted to show the stylesheet linking PI, but
I couldn't just cut-n-paste it in as-is, because it
has a "&lt;" character, which you can't have in XML text.  So I "escaped"
it using the standard
built-in XML (and HTML) "&amp;<no-op/>lt;" technique:</para>

<eg>&amp;<no-op/>lt;?xml-stylesheet href="first-x.css" type="text/css" ?></eg>
<icon><html:img src="bug.gif"></html:img></icon>
<para>Unfortunately, this made the whole example vanish!
It seems that IE gets confused somehow and sees the "&amp;<no-op/>lt;" as a
"&lt;" unless it's followed by a space, 
and starts parsing away.
After pondering this one for quite a bit, I ended up with the
following:</para>
<eg><![CDATA[&lt;<no-op/>?xml-stylesheet href="first-x.css" type="text/css" ?>]]></eg>
<para>The trick is that the empty "&lt;<no-op/>no-op />" element keeps IE from getting
confused.</para>
<icon><html:img src="bug.gif"></html:img></icon>
<para>Another trick would be to use "CDATA Sections" for examples like this.
But they seem pretty well completely broken in IE5 as well; it complains
about undeclared namespace prefixes and so on.</para>
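<para>For the record, here is what the CDATA version of the stylesheet
example ought to look like.  Per the XML 1.0 spec, everything between
"&lt;<no-op/>![CDATA[" and "]]&gt;" is treated as plain text, so the
"&lt;" needs no escaping at all; the only thing that can't appear inside is
the closing "]]&gt;" itself.  Conforming parsers handle this; IE5, as noted
above, does not yet:</para>
<eg><html:pre>&lt;<no-op/>!-- everything inside the CDATA section is plain text -->
&lt;<no-op/>eg>&lt;<no-op/>![CDATA[&lt;<no-op/>?xml-stylesheet href="first-x.css" type="text/css" ?>]]&gt;&lt;<no-op/>/eg></html:pre></eg>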
<para>Sigh.  Release 1.0 of anything is always exciting, even when it's called
release 5.0.
</para>
</div2>

</div1>
<div1><divTitle>The Wacky World of Stylesheets</divTitle>
<div2><divTitle>The "Default Stylesheet"</divTitle>
<icon><html:img src="good.gif"></html:img></icon>
<para>If you load an XML document into IE5 with no stylesheet at all, you get
a nice tree-structured display (<html:a href="first-x-nocss.xml">like this</html:a>) with little +/- icons that you can click to
hide subtrees.  
I've actually started using this quite a bit to have a quick look at XML
documents that people send me; it's a good way to get a feeling for the
content and structure of some arbitrary XML.</para>
</div2>
<div2><divTitle>The Joy of CSS</divTitle>
<para>At this point in history, there is only one official, approved, stable,
production-quality standard for stylesheets, and it's named Cascading
Style Sheets, or CSS for short. 
<html:a href="http://www.w3.org/TR/REC-css1">CSS 1</html:a> has been
around since December 1996, and 
<html:a href="http://www.w3.org/TR/REC-css2">CSS 2</html:a> since May 1998.
</para>
<para>I don't really have the time and space to do a full investigation of CSS
compliance, but I don't need to, because my colleagues on the
<html:a href="http://www.webstandards.org">Web Standards Project</html:a> are
hard at work on this even as I write.</para>
<icon><html:img src="good.gif"></html:img></icon>
<para>In general, I found IE5's CSS handling pretty good: just about everything I
tried worked, more or less, the first time.  It's probably worth your while to grab
the CSS stylesheet that's being used here and have a look at it; it's not
rocket science, but it does illustrate a few tricks that I think will
be useful for a lot of people.</para>

<para>In particular, I'm fond of the <html:i>float</html:i> technique; in 
the XML+CSS form of this article, the sidebar and examples and little
good/bad/bug graphics are done with
CSS floats; previously, you would have had to use &lt;<no-op/>TABLE> kludges to
achieve this kind of effect.</para>

<icon><html:img src="bug.gif"></html:img></icon>
<para>But I have to end on kind of a sour note.
We may be moving toward the paperless office, but a lot of us still need to print
quite a few of our documents.
With XML+CSS, IE5 can't: when you print, you get an
unformatted dump.
So near, and yet so far.</para>
</div2>
<div2><divTitle>A Word on Interoperability</divTitle>
<para>I wondered whether the XML+CSS display would work in the
pre-pre-pre-release of Mozilla.
Since I hadn't downloaded it in a few weeks, I went
<html:a href="http://www.mozilla.org">over there</html:a> and
pulled down a recent build, ignoring all the blood-curdling warnings about
using this untested, half-baked software.
Good news!  It worked not too badly at all, first time out.
Mozilla is a <emph>lot</emph> pickier about margins and so on, and does
the floats a little differently from IE (I'm not enough of a CSS scholar
to say which is right), but we are looking at two pieces of software with
strongly converging behavior.
Maybe there's hope for standards yet.</para>
</div2>
<div2><divTitle>The XSL Conundrum</divTitle>
<para>It is crystal-clear that Microsoft is un-enthused about CSS.
The Microsofties who helped us out with this article kept reminding us that we
should show off formatting with XSL.
And in fact, if you have links both to a CSS and an XSL stylesheet, IE picks
the XSL version.
Just to review,
XSL (the Extensible Stylesheet Language) is a W3C work-in-progress that is
scheduled for completion sometime later in 1999.
It comes in two parts: a "transformation language" used for preparing
documents for display, and a "formatting object set" that is used for actual
visual styling.</para>
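<para>Linking to both kinds of stylesheet just means putting two stylesheet
linking PIs at the top of the document, much as the (partly commented-out)
head of this very file does.  Given a pair like the one below, IE5 will use
the XSL one.  Treat the "text/xsl" type value as what IE5 happens to look for,
not as settled standard practice:</para>
<eg><html:pre>&lt;<no-op/>!-- IE5 picks this one when both are present -->
&lt;<no-op/>?xml-stylesheet href="first-x.xsl" type="text/xsl" ?>
&lt;<no-op/>!-- the CSS fallback -->
&lt;<no-op/>?xml-stylesheet href="first-x.css" type="text/css" ?></html:pre></eg>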
<para>At this point in history, several groups (including Microsoft) have
implemented a snapshot of the transformation language, but no-one has got the formatting
part going.
What Microsoft would like us to do, apparently, is to use the XSL
transformation engine to turn XML into HTML before displaying it.
</para>
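<para>To make the idea concrete, here is a minimal sketch of such a
transformation, assuming a processor that follows the XSLT 1.0 syntax the
transformation language eventually settled on.  This is <emph>not</emph>
what IE5's draft-based engine expects (it uses a different namespace and
feature set), and the <emph>fx</emph> prefix is just a name made up here for
this article's own namespace.  The sketch wraps every para (sidenotes
included) in an HTML page; inline markup such as links has no template of its
own, so XSL's built-in rules reduce it to bare text:</para>
<eg><html:pre>&lt;<no-op/>xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fx="http://www.xml.com/namespaces/first-x">
  &lt;<no-op/>!-- build the HTML page around the article's paras -->
  &lt;<no-op/>xsl:template match="/">
    &lt;<no-op/>html>&lt;<no-op/>body>
      &lt;<no-op/>xsl:apply-templates select="//fx:para"/>
    &lt;<no-op/>/body>&lt;<no-op/>/html>
  &lt;<no-op/>/xsl:template>
  &lt;<no-op/>!-- each para becomes an HTML p -->
  &lt;<no-op/>xsl:template match="fx:para">
    &lt;<no-op/>p>&lt;<no-op/>xsl:apply-templates/>&lt;<no-op/>/p>
  &lt;<no-op/>/xsl:template>
&lt;<no-op/>/xsl:stylesheet></html:pre></eg>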
<icon><html:img src="bad.gif"></html:img></icon>
<para>This is going to cause problems.
For example, in order to write this article, I needed to teach myself
XSL, so I went and looked at the XSL Working Draft.  It's called a "Working
Draft" because it's in progress and might change; if I may quote from the
introduction:</para>
<big-quote>[this] is a draft document and may be updated, replaced, or obsoleted by other
documents at any time. The XSL Working Group will not allow early
implementation to constrain its ability to make changes to this specification
prior to final release. It is inappropriate to use W3C Working Drafts as
reference material or to cite them as other than "work in
progress". </big-quote>
<para>
It quickly became obvious that the Microsoft XSL examples contain many 
things that aren't in the XSL Working Draft.
Is this because they, as members of the XSL Working Group, know about things
that will be in a soon-to-arrive draft of XSL?
I don't know.
Is it safe to use them?
I don't know.
</para>
<para>The bottom line was that when the deadline for this article rolled
around, I didn't have XSL working. 
This is a pity, because XSL has a couple of tremendously attractive 
properties.  Perhaps the most important is that it will run both in the
browser and in the server; so you can send XML+XSL to XSL-capable browsers, 
and for the rest, run the same code on the server to generate HTML.</para>
<para>But at XML.com, we try hard to do things by the rules, so you won't see
any XSL in production here until we can figure out how to use it by the book;
we <html:i>assume</html:i> that IE5 will be able to do this.</para>
</div2>
<div2><divTitle>There's More than One Way to Do It</divTitle>
<para>After our little expeditionary force had washed up on the rocks of XSL, we
were left with a document, in XML only, that couldn't display in the
real-world (version 4 and earlier) browsers that real people have on their real
desktops.</para>
<para>But we were undismayed; we opened our grease-stained toolbox
and whipped out the all-purpose tool.
127 lines of Perl later (using, of course, the brand-new "XML::Parser" module),
we had a less-pretty form of the article (probably
what some of you are reading), in HTML generated automatically from the XML.
Once we get the job done with XSL, it'll be interesting to see whether it
turns out to be more or less code than the Perl version, and to judge which
is more maintainable and flexible.
</para>
</div2>
</div1>
<div1>
<divTitle>Getting to Grips with IE5</divTitle>
<para>First, you have to install it.
Microsoft kindly sent us a CD before the rest of the world got it (thanks,
Dave); and the installation on my NT box was pretty pain-free. 
Following the advice of my local expert, I uninstalled the IE5 beta first, and
rebooted just to be sure. 
Firing up the CD reveals that it contains not just IE but NT Service Pack 4,
which I hadn't installed.
So, if you install the service pack and take all the defaults, you're
looking at three or so reboots, and a lot of waiting while Windows shows you
polite messages about how it's optimizing your system.</para>
<icon><html:img src="bad.gif"></html:img></icon>
<para>I thought it was interesting that the CD-ROM contains 198 MB of data;
you probably don't need all of that to get yourself an IE5,
but it's safe to say that IE5 is going to be a great big honkin' download.
Microsoft has in recent times been making their browser updates available on
CD-ROM for almost nothing, basically the cost of duplication and shipping; my
experience suggests that this is probably the best way to go about getting
IE5.</para>

<para>This article isn't a Web browser review, but IE5 seems like a pretty
nice Web browser, apart from the XML-related problems.
It runs fast (faster than IE4, <emph>much</emph> faster than
any version of Communicator, about the same as the Mozilla pre-releases)
and looks good.  As before, the smooth scrolling and cleaner
screen are improvements over Netscape.  Also as before, when you create
a new browser window, it insists on starting at the page that was active,
rather than your home page.
It does one thing better than any previous version of IE - namely, when you
use the "back" button, it does a fine job of coming back to a point in the
page not too distant from what you left behind.</para>
<icon><html:img src="good.gif"></html:img></icon>
<para>IE's error-handling seems exactly per the spec, which is delightful.
I've been giving speeches for a couple of years now telling people that
XML-style error handling in the browser wouldn't really change work patterns;
you'd bash out your XML, and when it displayed in the browser, you could ship
it.  I'm glad to have discovered that I've been telling the truth.
At least,
that's how this article got created - whenever I stupidly put a tag in the
wrong place or forgot a quotation mark, IE politely but firmly told me
where the problem was.</para>
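<para>To give a flavor of what it catches: in a fragment like the one below,
the <emph>emph</emph> element is closed after its parent <emph>para</emph>,
so the tags overlap.  That makes the document not well-formed, and the XML
spec requires a conforming parser to report it as a fatal error rather than
quietly guess at a repair, the way HTML browsers traditionally have:</para>
<eg><html:pre>&lt;<no-op/>!-- not well-formed: emph closes after its parent para -->
&lt;<no-op/>para>This is &lt;<no-op/>emph>badly nested&lt;<no-op/>/para>&lt;<no-op/>/emph></html:pre></eg>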
<para>IE's error handling becomes very irritating, of course, when there's
not really an error; for example, when the browser mistakes an escaped
"&lt;" character for the start of a tag.
But we can assume that Microsoft will work on fixing that.</para>
</div1>
<div1><divTitle>In Conclusion</divTitle>
<para>Is the glass half-empty or half-full?
It's too early to call; rendering XML with CSS is nice (and will be even nicer
once IE 5.x fixes a few more bugs), but the real value-add of XML in the
browser isn't so much displaying it as processing it right there in the browser.
For that, you need the DOM; if IE5 turns out to have a nice, clean, usable DOM,
that will make up for a lot of little awkwardnesses in the parser.
If not, all this will look like a (huge) waste of effort.
Stay tuned.</para>
</div1>
</body>
</article>
