XML Prague, February 2017

March 10, 2017


XML Prague: A practical conference, a meeting of minds, a growing community, a conference about XML in a beautiful and ancient city.

The XML Prague markup conference takes place each year (as you might guess) in the beautiful and ancient city of Prague. This year there were two hundred people milling around, eager to share experiences and to listen to others. Proceedings, including slides, are online at the conference site.

For some of us the conference began before it started: at a workshop for XProc, the XML pipeline language. There were only some ten people at the workshop: most agree that XProc is too hard to learn and use and desperately want to change that. Practical improvements and bug fixes to the existing XProc language were discussed. Norm Walsh gave a summary of the workshop on the Friday as a main conference session.

For others the conference began with the pre-conference day. There were well-attended user group sessions for Schematron, for users of the oXygen XML editor and of the eXist-db forest store / database / application engine, a session on XSL-FO, CSS and Paged Output hosted by (and mainly about) Antenna House Formatter, hands-on sessions with speedata Publisher and with TXSTEP (scripting for the powerful and fast TUSTEP formatter), and examples of XProc in use today. The XML world includes both standards-based and more individualistic approaches to a wide variety of problems and it's good to see both approaches represented. Although the Schematron room was especially full, the others had healthy audiences who were very actively engaged. If you go to XML Prague, I highly recommend the pre-conference day.

And then the main conference started, and we had two days of XML (Friday and Saturday) and, for many, a night of beer.

The Friday sessions started with Web-based XML authoring from George Bina of oXygen fame. If that wasn't enough to get people's attention, the Saxonica people demonstrated XSLT 3 and XPath 3 running in the Web browser using their JavaScript XSLT engine, Saxon-JS. John Lumley showed work he'd done on implementing an XPath parser in XSLT, so that they can support XPath outside XSLT and can do dynamic evaluation of XPath expressions, perhaps replacing the built-in browser XPath function with one written in this century.

The Schematron user group sessions had been really full before the conference so Martin Middel's use of Schematron to give interactive feedback to users of an editor was pretty promising. I was afraid I'd rather let the side down with my own talk on improving text quality of OCR output by making automatic majority editions but it turned out there's a lot of interest in getting texts into XML systems.

Speaking of getting text into XML systems, the free online service data2check doesn't actually turn your data into money directly as the name might suggest, but anyone running a publishing pipeline with InDesign or Word documents will love a service to make sure all paragraphs have styles, that the styles are right, and more.

XML Prague takes place, of course, in Europe. It's one of the more practical conferences, and also one of the more multilingual - even though the sessions are in English, the audience includes a lot of people involved in internationalization and multilingual work, and there's always a strong interest in this topic. So a talk on the use of ITS 2 (the Internationalization Tag Set) in OASIS XLIFF (the XML Localisation Interchange File Format) was a good match for the audience.

You can't have all this obviously practical stuff without some contrast, so Michael Kay gave us a talk about XML Projection. This is an optimization technique in which you make a subset of the input as early as possible in processing, reducing the amount of data to process. The idea came from Amélie Marian & Jérôme Siméon at the Very Large Data Bases (VLDB) conference in 2003, and has been implemented in Saxon, where it speeds up XQuery processing considerably in some cases. It is especially useful in conjunction with the processing of large documents that streaming allows.
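To make the idea of projection a little more concrete, here is a minimal sketch in Python. It is not how Saxon does it: real projection derives the set of needed paths from static analysis of the query, whereas this toy assumes we already know which top-level element names the query touches. The function name project and the sample library document are invented for the illustration.

```python
import io
import xml.etree.ElementTree as ET


def project(source, keep):
    """Stream-parse `source` and drop every child of the document element
    whose tag is not in `keep` (a simplified stand-in for the paths a
    query would actually need)."""
    root = None
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the first start event is the document element
            depth += 1
        else:
            depth -= 1
            # depth == 1 here means elem is a direct child of the root
            if depth == 1 and elem.tag not in keep:
                root.remove(elem)  # discard the subtree as soon as it is complete
    return root


# Hypothetical input: the query only cares about <book>, not <log>.
doc = b"<library><book><title>A</title></book><log>noise</log><book><title>B</title></book></library>"
projected = project(io.BytesIO(doc), {"book"})
print([t.text for t in projected.iter("title")])  # ['A', 'B']
```

The point is only that unwanted subtrees are thrown away while the document is still being read, so the later query runs over a smaller tree.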

If Michael Kay's paper on up-and-coming optimization techniques in Saxon looks to the future, Marcus Reichardt's talk on a subset of HTML 5.1 as an SGML DTD seems to build on an older heritage.

Speaking of heritage, the XML Prague Demo Jam was held as in previous years in a medieval monastery suitably equipped with beer. Jet-lag struck, and your reporter got no further than his hotel room on Hybernská street, a quiet and unpretentious Best Western!

On Saturday, those with hangovers might have faced a challenge, alleviated by the fact the conference sessions started just after 10am but exacerbated by a more theoretical talk on the X-Definition programming language. This looks more interesting than a single talk could show. See http://xdef.syntea.cz/tutorial/en/index.html for more information.

John Snelson gave us a talk about Relational and Semantic Views over Documents. MarkLogic people have given a number of talks about ways of offering SPARQL and now SQL views of XML and, although the software is proprietary, there are some very interesting ideas.

Michael Kay, editor of the XSLT specification, spoke about the status of XSLT and XQuery standardization. The languages, including XPath 3, are now very stable and introduce some exciting new features.

Steven Pemberton gave a second talk about his Invisible XML, which is a technique for creating XML from an arbitrary (context-free) parse tree, showing how his ideas have evolved. Since the primary purpose of a markup language is to help the computer to understand and process text reliably, not using explicit markup where the computer can clearly derive a correct internal representation seems very wise, as long as it is balanced with the necessary error checking.
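As a rough illustration of the principle (and not of Invisible XML itself, which is driven by a declarative grammar rather than hand-written code), here is a small Python sketch. It assumes a made-up two-rule grammar, "expr: term, ('+', term)*" and "term: digit+", parses plain text against it, and serializes the parse tree as XML. The function names and the grammar are invented for the example.

```python
import xml.etree.ElementTree as ET

DIGITS = "0123456789"


def parse_term(text, pos, parent):
    """term: digit+  -- appends a <term> element to `parent`, returns new offset."""
    start = pos
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    if pos == start:
        raise ValueError(f"expected a digit at offset {start}")
    ET.SubElement(parent, "term").text = text[start:pos]
    return pos


def parse_expr(text):
    """expr: term, ('+', term)*  -- returns the parse tree as an <expr> element."""
    expr = ET.Element("expr")
    pos = parse_term(text, 0, expr)
    while pos < len(text) and text[pos] == "+":
        # the '+' is implied by the structure, so it is not copied into the XML
        pos = parse_term(text, pos + 1, expr)
    if pos != len(text):
        raise ValueError(f"unexpected character at offset {pos}")
    return expr


print(ET.tostring(parse_expr("1+22+3"), encoding="unicode"))
# <expr><term>1</term><term>22</term><term>3</term></expr>
```

Input that does not match the grammar, such as "1++2", raises an error instead of producing a tree, which is the kind of error checking the approach needs.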

Hans-Juergen Rennau has spoken about his FOXpath system for using an XPath-like language over file systems before (I think at Balisage). Now, he extends his system to support queries over remote resources such as GitHub repositories. Then, as here, he was asked why he deviated from XPath syntax, for example using “\” instead of “/” for file-system steps. He said that the syntax was configurable.

Alejandro Bia described a PHP-based packaging system for installing bundles of software, combined with an example bundle for converting Markdown text to TEI XML markup.

Keeping with the TEI theme, Gerrit Imsieke gave us a name, epischema, for a second or subsequent schema used to verify only one aspect of a document. The approach of using multiple special-purpose schemas consecutively is of course well-suited for use with XProc, of which Gerrit is a strong proponent.

We then heard from George Bina and Dan Caprioara of Syncro Soft (makers of oXygen XML Editor) about a tool that converts CSS for print to XSL-FO so that you can use FO-based tools such as FOP. It's not surprising to those who have studied the two approaches that a CSS to FO conversion runs into problems, and it might be that their paper is more interesting from a learning point of view than anything else.

Finally, Liam Quin spoke again, this time about the merger that recently occurred between IDPF and W3C, and why he thought this was a good idea.

XML Prague has talks that are very focused on practical ideas and techniques; it's very different from the more philosophical bent of Balisage, and for sure there's room for both. XML Prague had over 200 people this year and continues to grow. Let's hope that in difficult international times both conferences continue to be strong.