Forming Opinions, Part 2

April 27, 2005

Micah Dubinko

"But digital technologies enable a different kind of tinkering — with abstract ideas, though in concrete form." -- Lawrence Lessig, Free Culture

Previously, this column examined Web Forms 2.0, or WF2, a technical report recently presented to the W3C by Opera and the Mozilla Foundation. WF2 seeks to extend forms-related aspects of SGML-based HTML 4.01 and XHTML 1.x.

Effectively extending an existing vocabulary is always a challenge: you have to keep track of which pieces are free to change and which pieces need to be locked down. The HTML 4.01 Recommendation, in particular the forms chapter, is not a paragon of specification rigor. It leaves several opportunities, some by design and some by accident, for little tweaks and adjustments.

To resume the discussion, we'll continue our look inside WF2 where we left off in section 2. One of my favorite parts of this section consists of all the little tweaks suggested to classic forms as we know them. Anyone who has worked with form scripting has probably run into one of the limitations they address. The tweaks include:

1. Allowing empty content for form, fieldset, legend, select, and optgroup. In cases where script will insert values, the initial state is best left empty, but doing so currently triggers validation errors.

2. Controls no longer need to be nested underneath a form element, and can instead use an IDREF to point to one. With this change, a form can be declared in the head, further separating concerns.

3. A final word on some debates about how specific corner cases should behave, specifically a radio group or single-select with no initial selection.

4. Mentioning platform behavior for 'label' and allowing nesting of optgroup.

5. Allowing nested form elements (though with no semantic significance implied by the nesting).

At first, nesting forms seems unusual, but the need can easily arise in portal situations, where small, self-contained blocks of markup are needed and aren't allowed to make assumptions about surrounding content.

Another useful resource is the description of common features that are widely implemented in IE and other browsers, but not formally documented as a standard anywhere. In this camp is the autocomplete attribute. There are likely others as well, but the arrangement of the document makes it difficult to see which features fall into this category.

Newly-added attributes, including required, were covered previously. In the same camp are min, max, maxlength, step, and pattern for regular expressions, all of which work pretty much like you'd expect them to. Also welcome is the wider applicability of the disabled and readonly attributes (though I'd like to see readonly go even farther, to cover radio buttons, checkboxes, and lists).
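Since these constraints are declared in markup, a server still has to re-check them, because older browsers will ignore the attributes entirely. The following is only a rough sketch of such a check in Python, not anything defined by WF2: the function names are hypothetical, the step check assumes values are counted from min (or from zero when min is absent), and the pattern check assumes the expression must match the whole value. Python's regular expression dialect also differs in places from the one browsers use.

```python
import re
from decimal import Decimal, InvalidOperation


def check_number(raw, minimum=None, maximum=None, step=None):
    """Return the names of the numeric constraints that the text violates."""
    try:
        value = Decimal(raw)
    except InvalidOperation:
        return ["type"]
    if not value.is_finite():
        # Reject NaN and Infinity, which Decimal would otherwise accept.
        return ["type"]
    problems = []
    if minimum is not None and value < Decimal(minimum):
        problems.append("min")
    if maximum is not None and value > Decimal(maximum):
        problems.append("max")
    if step is not None:
        # Assumption: steps are counted from min, or from zero without one.
        base = Decimal(minimum) if minimum is not None else Decimal(0)
        if (value - base) % Decimal(step) != 0:
            problems.append("step")
    return problems


def check_text(raw, maxlength=None, pattern=None):
    """Return the names of the text constraints that the value violates."""
    problems = []
    if maxlength is not None and len(raw) > maxlength:
        problems.append("maxlength")
    # Assumption: the pattern has to match the entire value, not a substring.
    if pattern is not None and re.fullmatch(pattern, raw) is None:
        problems.append("pattern")
    return problems
```

Decimal is used instead of float so that a step such as 0.25 is compared exactly rather than through binary rounding.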

WF2 defines several new values for existing attributes. One of the areas where HTML is loosely defined is the type attribute, as in <input type="text">. The specification fails to mention what happens if some new value is present in that attribute. (WF2 deals extensively with this topic.) Most browsers, upon encountering an unknown attribute value for an input type, will fall back to a standard text control, so that as long as fallback users don't mind manually typing in things like "2008-12-31T18:36:00", the form will continue to work. This is actually a fairly clever hack that provides some of the same benefits as XForms Schema-driven design, without an xsd namespace in sight.
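The flip side of the text-control fallback is that the receiving end can't assume the value is well formed, since a fallback user may have typed it by hand. As a minimal sketch, assuming the server is written in Python and that the expected format is exactly the one shown above (the function name is made up for illustration), a check could look like this:

```python
from datetime import datetime


def parse_typed_datetime(raw):
    """Return a datetime for text like '2008-12-31T18:36:00', else None."""
    try:
        # strptime rejects anything that doesn't fit the format exactly.
        return datetime.strptime(raw.strip(), "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return None
```

A result of None would be the cue to send the form back with an error message rather than store the bad value.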

One more nice touch in this section is an output control that displays form values. In a script-based solution it's not much different from just having a span or other element with a known id, though with the addition of external form data, it becomes more useful.

Since I promised to highlight the parts of WF2 I like, this week's installment will skip section 3 on repeating sections, and move on to DOM features.

DOM Gravy

The recurring theme of WF2 is dependency on script. In some cases, like calculations, script is necessary to perform basic functions. In other cases, script is needed transitionally to replicate new things defined by WF2. But either way, authors have a genuine need for DOM interfaces. DOM access is important. Script isn't inherently evil or even inherently inaccessible; only specific uses of it are. When different browser platforms each use different DOM interfaces, a developer's job of using script in a non-evil way becomes that much more difficult.

So there is value in hammering out common ground. Section 4, and later section 7, define a scripting interface that appears to be a straightforward extrapolation of classic forms, including new script-bearing attributes hard-wired into the language for each new event. But I think the W3C would have a hard time swallowing it.

The situation with events is much like a smaller-scale replay of the situation with forms. The W3C has collectively identified many limitations in the inline attribute syntax style, and is pursuing a different course, based on XML Events. In the long term, the W3C is right. In the short term, though, I think there are some interesting possibilities for transitional strategies.

The original forms DOM took several years of work, and not entirely within the W3C, before a significant amount of cross-browser scripting became feasible. WF2 is clearly attempting to accelerate this process, though some of the more advanced DOM functions introduced in WF2, useful though they are, stray pretty far from what conventional HTML scripters will be comfortable with.


The original specs for classic forms date back to the early 1990s, well before XML became a Recommendation in 1998. Increasingly, XML is used as an interoperability layer or an on-the-wire format between different systems. As discussed before, XML and forms share a deep connection, so a natural impulse is to extend older forms systems with XML. WF2 does this in sections 5 and 6, which among other things define a new content type "application/x-www-form+xml" and a specific syntax for it. An example of the submitted XML basically looks like this:

XML data submitted from WF2

<submission xmlns=...>
  <field name="fname" index="0">Moe</field>
  <file name="file" index="0" filename="todo.txt" type="text/plain;charset=UTF-8">
    ...
  </file>
  <field name="dt" index="0">2008-12-31T18:36:00</field>
</submission>


Note that WF2 intentionally doesn't define a specific namespace string that would be used here, in order to discourage premature implementation. Moving in an XML direction is beneficial, even with the obvious observation that what we have is Yet Another XML vocabulary. Any useful back-end processing of data submitted this way would require a custom transformation step, one that would be similar in structure to code for converting urlencoded or multipart submission data into an XML vocabulary of choice. In contrast, other systems like Microsoft InfoPath work with arbitrary XML and can avoid an extra transformation step during the data integration phase of a forms project.
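To give a feel for the size of that transformation step, here is a minimal sketch in Python using the standard library's ElementTree. It is an illustration under stated assumptions, not a reference implementation: the namespace URI is a made-up placeholder (WF2 doesn't define one), the target vocabulary is simply one element per field name, and file elements and the index attribute are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder: WF2 deliberately leaves the real namespace undefined.
NS = "urn:example:wf2-submission"

SAMPLE = f"""<submission xmlns="{NS}">
  <field name="fname" index="0">Moe</field>
  <field name="dt" index="0">2008-12-31T18:36:00</field>
</submission>"""


def submission_to_record(xml_text, root_name="record"):
    """Rebuild the flat name/value fields as elements of a target vocabulary."""
    source = ET.fromstring(xml_text)
    target = ET.Element(root_name)
    for field in source.findall(f"{{{NS}}}field"):
        # Assumes every field name is also a legal XML element name.
        child = ET.SubElement(target, field.get("name"))
        child.text = field.text
    return target


record = submission_to_record(SAMPLE)
print(ET.tostring(record, encoding="unicode"))
# prints <record><fname>Moe</fname><dt>2008-12-31T18:36:00</dt></record>
```

Swapping the loop over field elements for a loop over decoded urlencoded name/value pairs would give essentially the same code, which is the point: the generic submission vocabulary doesn't remove the mapping work, it only changes the input syntax.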

WF2 also defines similar ways to bring XML data into the form as data or list choices, which is even more useful as it can help server developers steer around the area of templating madness. Existing formats like UBL, though, will still require pre- (and post-) processing.

So XML features make this week's list of things I like in WF2, though just barely. More advanced features naturally drift away from the stated ideal of leveraging existing developer knowledge. This interesting relationship between the past and future of forms will be the focus of the next issue of XML-Deviant.

Births, Deaths, and Marriages

A slow week for announcements in the XML world.

Call for implementations: UBL input specifications - Developer's Preview

A preview of my own work on modeling UBL for form authors.

Documents and Data

Just use sed

Sound advice on spokespersonship

REST + XML namespaces + XSLT (oh my!)

More thoughts on Documents v. Databases