XML.com: XML From the Inside Out

Migrating to XForms

November 01, 2006

In 2001, the W3C set out to create an XML standard for implementing user forms in XHTML by publishing the XForms 1.0 Working Draft. XForms is intended to eventually replace existing HTML forms, which are limited in capability and notoriously difficult to develop with. If you are not familiar with XForms, or aren't convinced of their benefits, start by reading What are XForms.

In March of this year, the W3C announced the XForms 1.0 Second Edition Recommendation. In July, Mozilla announced Preview Release 0.6 of their XForms extension. It won't be long until browsers begin supporting XForms, and once this happens, they will be the prevalent and preferred method of user data collection on the internet. Until then, it's in our best interest to begin migrating our current XHTML forms to XForms so that we're ready once the new standard is mainstream.

Our goal here is to take an XHTML document containing one or more standard forms, convert the forms into XForms format while preserving all of the information, and generate a new XHTML document as a result. To achieve this, we will use PHP's XML parser functions, which have been around since PHP 4 and are used in many PHP libraries, such as Magpie (an RSS parser) and nuSOAP (a library for web services support).

Figure 1. XForms Parser

Figure 1 is an overview of how the system will work. There are three main phases (grey). In Phase 1, we prepare the input file for parsing and split it into several segments. In Phase 2, we pass the data through the parser. Note that the only segment of the input file that is actually parsed is the <body> element (green). Because XForms requires elements in both the <head> and the <body> of the XHTML document, the parser also appends data to the contents of the <head> element. This appended data is labeled "A" (orange). "B" represents the portion of the input XHTML that closes the <head> element. Each phase will be explained separately.

Phase 1: Preparing the Input

As is evident in Figure 1, it is crucial that we split the input file into five segments so that we parse only the portion of the XHTML file that we need to, and so that we can append the necessary XForms elements to the contents of the <head> element. To accomplish this, we use two PHP functions: stripos() and substr(). The first, which requires PHP 5, returns the position of a string (the needle) inside a larger string (the haystack), ignoring case. We pass the result to the second function, substr(). As you might guess, substr() returns a part (substring) of a larger string; all we have to tell it is the start position and the substring's desired length.

Now that you understand what we're doing, you're probably wondering why we're doing it. Take a look at the code below, and you should get a clearer idea:

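/*A: read the input and locate the <head> and <body> boundaries*/
$instr = file_get_contents("inputform.html");
$pos["headstart"] = stripos($instr,"<head>");
$pos["headend"] = stripos($instr,"</head>");
$pos["bodystart"] = stripos($instr,"<body>");
$pos["bodyend"] = stripos($instr,"</body>")+7; // 7 = strlen("</body>")

/*B: slice the input into the five segments shown in Figure 1*/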
$input["top"] = substr($instr,0,$pos["headstart"]);
$input["head"] = substr($instr,$pos["headstart"],$pos["headend"]-$pos["headstart"]);
$input["middle"] = substr($instr,$pos["headend"],$pos["bodystart"]-$pos["headend"]);
$input["body"] = substr($instr,$pos["bodystart"],$pos["bodyend"]-$pos["bodystart"]);
$input["bottom"] = substr($instr,$pos["bodyend"]);

A: file_get_contents() fetches the contents of the input HTML file (inputform.html) and stores it in the variable $instr (line 1). The next four lines call stripos() to find the positions of the opening <head> tag, the closing </head> tag, the opening <body> tag, and the closing </body> tag, respectively. We add 7, the length of the string "</body>", to the last position so that it points to the first character after the closing </body> tag. To understand why we've made this exception, let's look at the second part of the code.

B: Here we call substr() to split the input into the five sections outlined in Figure 1. The first parameter passed to substr() is the input string (in this case, $instr), the second is the position of the first character of the substring to return, and the third is the length of the desired substring. When the third parameter is omitted, as in the last call, substr() returns everything from the start position to the end of the string. Each length is simply the difference between a segment's end position and its start position, so we pass in the positions found in the previous four lines. Because we added 7 to the position of the closing </body> tag, that tag is included in the $input["body"] substring. This matters because $input["body"] is the substring passed to the parser; with its closing tag in place, it is a complete element and runs through the parser without throwing an error.
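The split described in A and B can also be wrapped in a function that checks for missing tags. The following is only a sketch under stated assumptions: split_document() is a hypothetical helper name, not part of the original code, and unlike the original it searches for "<body" rather than "<body>" so that a <body> tag carrying attributes is still found.

```php
<?php
// Hypothetical helper (not from the original article): split an HTML string
// into the five segments of Figure 1. Returns null if any tag is missing.
function split_document($html)
{
    $headStart = stripos($html, "<head>");
    $headEnd   = stripos($html, "</head>");
    // Assumption: match "<body" so that <body class="..."> is also found.
    $bodyStart = stripos($html, "<body");
    $bodyClose = stripos($html, "</body>");

    // stripos() returns false (not 0) when the needle is absent,
    // so each result has to be compared with ===.
    foreach (array($headStart, $headEnd, $bodyStart, $bodyClose) as $p) {
        if ($p === false) {
            return null;
        }
    }

    // Same as the "+7" in the original code.
    $bodyEnd = $bodyClose + strlen("</body>");

    return array(
        "top"    => substr($html, 0, $headStart),
        "head"   => substr($html, $headStart, $headEnd - $headStart),
        "middle" => substr($html, $headEnd, $bodyStart - $headEnd),
        "body"   => substr($html, $bodyStart, $bodyEnd - $bodyStart),
        "bottom" => substr($html, $bodyEnd),
    );
}

$sample = "<html><head><title>t</title></head>\n"
        . "<body class=\"x\"><p>hi</p></body></html>";
$parts = split_document($sample);

// Joined in order, the five segments reproduce the input exactly.
var_dump(implode("", $parts) === $sample);
```

Because every segment starts where the previous one ends, concatenating the five segments gives back the original input. That is a convenient check that the position arithmetic is right.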

Because the PHP parser is designed for XML input, it expects well-formed markup, so we need to make some minor changes to the contents of the <body> element (stored in $input["body"]). For example, each of the following three form tags would cause a parser error:

<input type="text" name="t" disabled />
<input type="checkbox" name="c" value="c1" checked />
<select multiple name="s">
<option value="1">One</option>
</select>

This happens because XML does not allow attributes without values, and disabled, checked, and multiple are written here without one. To avoid the error, we will "trick" the parser by assigning empty values to these attributes so that the modified HTML looks like this:

<input type="text" name="t" disabled="" />
<input type="checkbox" name="c" value="c1" checked="" />
<select multiple="" name="s">
<option value="1">One</option>
</select>

The following code accomplishes this task:

$fixatt = array("multiple","checked","disabled");
foreach ($fixatt as $a)
    $input["body"] = str_replace(" $a "," $a=\"\" ",$input["body"]);
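One way to check the effect of this substitution is to run a fragment through PHP's XML parser before and after the loop. The sketch below assumes the xml extension is available. is_well_formed() is a hypothetical helper introduced only for this illustration, and the sample markup is made up.

```php
<?php
// Hypothetical helper (not from the original article): returns true if the
// string is well-formed XML according to PHP's XML parser.
function is_well_formed($xml)
{
    $parser = xml_parser_create();
    // xml_parse() returns 1 on success and 0 on failure.
    return xml_parse($parser, $xml, true) === 1;
}

// Made-up sample body with two value-less attributes.
$before = '<body><input type="text" name="t" disabled />'
        . '<select multiple name="s"><option value="1">One</option></select>'
        . '</body>';

// The same substitution loop as in the article, applied to the sample.
$after  = $before;
$fixatt = array("multiple", "checked", "disabled");
foreach ($fixatt as $a) {
    $after = str_replace(" $a ", " $a=\"\" ", $after);
}

var_dump(is_well_formed($before)); // value-less attributes are rejected
var_dump(is_well_formed($after));  // empty-valued attributes are accepted
```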

str_replace() is another useful PHP function. It searches for a string (first parameter) inside a larger string (third parameter) and replaces every occurrence with a replacement string (second parameter), returning the modified string. Note that if you plan to extend this code to larger HTML files with mixed content, you should use preg_replace() instead, because str_replace() will not be selective enough in some cases: if any of the words in $fixatt appears surrounded by spaces anywhere in your HTML body, including ordinary text, it will have ="" appended to it. It will also miss an attribute that is immediately followed by ">" rather than a space. Because preg_replace() uses regular expressions, you can be more specific and limit the modifications to those inside form tags.
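As a rough illustration of the regular-expression approach, the sketch below rewrites value-less attributes only when they appear inside form-control tags. The function name fix_minimized_attributes(), the list of tag names, and both patterns are assumptions made for this example, not code from the original article. It is not a full HTML parser: an attribute value that itself contains one of these words surrounded by spaces would still be altered.

```php
<?php
// Sketch only: give value-less multiple/checked/disabled attributes an
// empty value, but only inside <input>, <select>, <option>, or <textarea>.
function fix_minimized_attributes($html)
{
    // Matches one complete opening tag of a form control.
    $tagPattern  = '/<(?:input|select|option|textarea)\b[^>]*>/i';
    // Matches a bare attribute: preceded by whitespace, followed by
    // whitespace, "/" or ">", and not followed by "=".
    $attrPattern = '/(\s)(multiple|checked|disabled)(?=[\s\/>])(?!\s*=)/i';

    return preg_replace_callback(
        $tagPattern,
        function ($m) use ($attrPattern) {
            // $m[0] is the text of one tag; only that text is rewritten.
            return preg_replace($attrPattern, '$1$2=""', $m[0]);
        },
        $html
    );
}

$html = '<p>Items checked here stay as text.</p>'
      . '<input type="checkbox" name="c" checked />'
      . '<select multiple name="s"></select>'
      . '<input type="text" name="t" disabled="disabled" />';

echo fix_minimized_attributes($html), "\n";
```

In this example the word "checked" in the paragraph text is left alone, the bare checked and multiple attributes receive empty values, and disabled="disabled" is unchanged because it already has a value.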

As we have successfully prepared the HTML for parsing, we can move on to the main phase: the parser.
