Padded Downloads

June 29, 2005

John E. Simpson

The question would be almost Seinfeldian if it weren't so hopelessly nerdy: what's up with software descriptions, anyway? Did you ever notice how popular download sites A, B, and C always seem to provide exactly the same kind of information for a given software package? Are these sites stealing one another's content?

No. The reason all those descriptions look identical is that they are identical—they're packaged with the software itself.

Early Times: file_id.diz

Back in the days of 300- to 2400-baud bulletin board systems (BBSes) and hard drives measured in the tens of megabytes, bandwidth and space for downloadable software were at a premium. The largest files might sit on the virtual shelves, languishing for lack of downloaders, no matter how useful the software might be. Along came archive and compression formats such as .zip, .tar, and .gz, which between them allowed files to be not just shrunk but bundled as well. Besides setup and other files needed to actually run the program, a .zip package frequently included a "readme" describing the program, its version, how to install it, and so on.

In the early 1990s one BBS vendor, Clark Development Company, came up with a standard format for much of the information recorded in these "readmes." Standardizing the format meant that BBS software (particularly Clark's own PCBoard) could expect the information to take a certain form, and thus handle all the standard readmes the same way. The format was called Description In Zip (or diz for short), and the standard required that each program package include one and only one diz file, named file_id.diz.

This format was barely a format, as readers of XML.com would understand the term. While the contents—program name, version number, operating system, and so on—were vaguely prescribed, a file_id.diz was essentially just a free-form text file. (And a little one, at that, with line length no more than 45 characters, about 10 lines long at most. Such concision in data may also be unfamiliar to readers of XML.com.)
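
Those size limits are about the only part of a file_id.diz that can be checked mechanically. A minimal Python sketch, assuming the limits as described above (at most 10 lines, each at most 45 characters); the function name and the sample text are invented for illustration:

```python
MAX_LINES = 10   # line-count limit, as described in the text above
MAX_WIDTH = 45   # line-length limit, as described in the text above

def diz_problems(text):
    """Return a list of human-readable problems found in a file_id.diz body."""
    problems = []
    lines = text.splitlines()
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines (limit {MAX_LINES})")
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_WIDTH:
            problems.append(
                f"line {number} is {len(line)} characters (limit {MAX_WIDTH})"
            )
    return problems

# Invented sample text; an empty result means it fits within the limits.
sample = "CryptaPix 2.24\nJPG graphics viewer and encryption utility\n"
print(diz_problems(sample))
```

Anything beyond length and line count (program name, version, and so on) is free-form text, so a check like this cannot say whether the description is actually useful.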

Moving on: PAD Files

With greater bandwidth, and with the introduction of XML, the Association of Shareware Professionals (ASP) came up with a new standard for describing downloadable files in general, and software in particular. It's called PAD, for Portable Application Description.

The base PAD spec is itself an XML document, something like an XML Schema or DTD in intent. For instance, it establishes, for each "field" (element) in a PAD file, a name, a description, and a regular expression (which places limits on the field's data type and length). PAD file authors aren't restricted to using just the standard fields, either; ASP has also established standards for authors to develop their own extensions to PAD. (These extensions aren't anything-goes mishmashes, by the way. ASP continues to validate them and control their identification, for example.)

So what does a PAD file look like? Here's a portion of the one that comes bundled with CryptaPix, a "graphics viewer/encryption program for Windows 95 to XP":

<?xml version="1.0" encoding="UTF-8" ?>
<XML_DIZ_INFO>
  <MASTER_PAD_VERSION_INFO>
    [elements describing the PAD standard followed by this file]
  </MASTER_PAD_VERSION_INFO>
  <Company_Info>
    <Company_Name>Briggs Softworks</Company_Name>
    <Address_1>14314 Cashel Forest Dr.</Address_1>
    <Address_2 />
    [other address-related fields]
    <Company_WebSite_URL>http://www.briggsoft.com</Company_WebSite_URL>
    <Contact_Info>
      [author and other contact information]
    </Contact_Info>
    <Support_Info>
      <Sales_Email>kbriggs@briggsoft.com</Sales_Email>
      <Support_Email>kbriggs@briggsoft.com</Support_Email>
      <General_Email>kbriggs@briggsoft.com</General_Email>
      [phone/fax numbers]
    </Support_Info>
  </Company_Info>
  <Program_Info>
    <Program_Name>CryptaPix</Program_Name>
    <Program_Version>2.24</Program_Version>
    [release dates, other data about the program - OS, etc.]
    <File_Info>
      <Filename_Versioned>cpx32224.zip</Filename_Versioned>
      <Filename_Previous>cpx32223.zip</Filename_Previous>
      <Filename_Generic>cpx32.zip</Filename_Generic>
      <Filename_Long />
      [file sizes]
    </File_Info>
    <Expire_Info>
      <Has_Expire_Info>Y</Has_Expire_Info>
      <Expire_Count>30</Expire_Count>
      <Expire_Based_On>Days</Expire_Based_On>
      [date-based expiration info]
    </Expire_Info>
    <Program_Change_Info>updated order form</Program_Change_Info>
    <Program_Specific_Category>Graphics</Program_Specific_Category>
    <Program_Categories>graphics viewers encryption</Program_Categories>
    <Program_System_Requirements />
    <Includes_JAVA_VM>N</Includes_JAVA_VM>
    <Includes_VB_Runtime>N</Includes_VB_Runtime>
    <Includes_DirectX>N</Includes_DirectX>
  </Program_Info>
  <Program_Descriptions>
    <English>
      <Keywords>cryptapix, graphics, image, viewer, jpg, encryption, thumbnail, slideshow</Keywords>
      <Char_Desc_45>JPG graphics viewer and encryption utility</Char_Desc_45>
      [longer descriptions]
    </English>
  </Program_Descriptions>
  <Web_Info>
    <Application_URLs>
      [where to get more info, find screenshots and icon, etc.]
    </Application_URLs>
    <Download_URLs>
      <Primary_Download_URL>http://www.briggsoft.com/download/cpx32.exe</Primary_Download_URL>
      <Secondary_Download_URL />
      [other download sites]
    </Download_URLs>
  </Web_Info>
  <Permissions>
    <Distribution_Permissions>...</Distribution_Permissions>
    <EULA />
  </Permissions>
  <ASP>
    [information about vendor's ASP membership]
  </ASP>
</XML_DIZ_INFO>

It's pretty easy to see what use a software download site could make of all this information. About the XML itself, just a few comments:

  • It's a pretty straightforward XML document, consisting of elements and text content only. One could quibble with this simplicity, but the document is undeniably simple to process.
  • The historic connection to the diz format is maintained in a couple of ways. First, the root element is named XML_DIZ_INFO. Second, in the Program_Descriptions element, descriptions of various lengths may be supplied, starting with one no more than 45 characters long.
  • While many elements contain more or less free-form text, some (such as Has_Expire_Info and Includes_JAVA_VM) are constrained by the PAD specification to only certain values (Y or N in these two cases). The spec further constrains the text content of some elements to certain lengths: Company_Name, for instance, must be between 2 and 40 characters long. These constraints are imposed by RegEx elements in the PAD specification.
  • All fields required or allowable under the appropriate version of the PAD specification are included, even "blank" ones (in the form of empty elements—Address_2 and so on).
  • There's no provision for incorporating non-textual information in the PAD file—no BLOBs, or base64-encoded images, for instance. But the PAD spec makes it easy to refer to such non-textual information by the use of simple URLs. A download site can thus display screenshots directly, for example, or it can simply provide the links themselves.
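
To make the "simple to process" point concrete, here is a hedged sketch of how a download site might pull a listing out of a PAD file, using only Python's standard xml.etree.ElementTree module. The embedded document is a cut-down fragment of the CryptaPix file above; the function name summarize_pad and the choice of fields are assumptions made for illustration, not anything defined by PAD or PADKit.

```python
import xml.etree.ElementTree as ET

# A cut-down fragment of the CryptaPix PAD file shown above.
PAD_SAMPLE = """<XML_DIZ_INFO>
  <Company_Info>
    <Company_Name>Briggs Softworks</Company_Name>
    <Address_2 />
  </Company_Info>
  <Program_Info>
    <Program_Name>CryptaPix</Program_Name>
    <Program_Version>2.24</Program_Version>
  </Program_Info>
  <Program_Descriptions>
    <English>
      <Char_Desc_45>JPG graphics viewer and encryption utility</Char_Desc_45>
    </English>
  </Program_Descriptions>
  <Web_Info>
    <Download_URLs>
      <Primary_Download_URL>http://www.briggsoft.com/download/cpx32.exe</Primary_Download_URL>
    </Download_URLs>
  </Web_Info>
</XML_DIZ_INFO>"""

def summarize_pad(xml_text):
    """Collect a few fields a download listing might display (hypothetical helper)."""
    root = ET.fromstring(xml_text)

    def text_of(path):
        # findtext() gives None for a missing element and "" for an empty one;
        # treat both as an empty string.
        return (root.findtext(path) or "").strip()

    return {
        "name": text_of("Program_Info/Program_Name"),
        "version": text_of("Program_Info/Program_Version"),
        "vendor": text_of("Company_Info/Company_Name"),
        "address_2": text_of("Company_Info/Address_2"),
        "summary": text_of("Program_Descriptions/English/Char_Desc_45"),
        "download": text_of("Web_Info/Download_URLs/Primary_Download_URL"),
    }

listing = summarize_pad(PAD_SAMPLE)
for field, value in listing.items():
    print(f"{field}: {value}")
```

Because the document is nothing but elements and text, every lookup is a plain path from the root; empty elements such as Address_2 simply come back as empty strings.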

PAD-processing Software

ASP itself provides software resources for PAD file authors (typically the vendors, of course) and for webmasters hosting downloads accompanied by PAD files. For webmasters, ASP offers the free PHP/MySQL-based PADKit—a codebase for running a download site, which may be built upon and extended or enhanced as desired. For authors of PAD files, ASP simplifies data entry with a Windows-based tool, PADGen.

But shareware authors don't all use (let alone develop for) Windows, of course. ASP offers links to some third-party suppliers of PAD editors and online editing services, which can be used by Mac developers or those who feel comfortable using Web-based forms (such as the one at Padfiles.net) for creating their PAD files.

A third option is the Java-based Gsoftpad editor, from Network Rebusnet. (Rebusnet is itself a cross-platform, PAD-based download site, so it's not surprising that they have come up with such a custom tool. Gsoftpad seems to have started out as a SourceForge project, and migrated to commercial, albeit shareware, status.) The figure below shows you the general interface (with the CryptaPix PAD file opened in the window).

Screen capture: Gsoftpad user interface

Essentially, the interface (apart from toolbar, menu and so on) is split into two panes: on the left is an expandable/collapsible "tree" corresponding to the document structure and, at the right, the actual data-entry fields where the various elements' contents can be edited. (Note that the fields are labeled not with raw element names, but with "user-friendly" words and phrases.)

One interesting Gsoftpad feature is field "validation." (This isn't validation in a true XML sense, against a DTD or XML Schema. The validation is performed against the PAD specification's criteria, such as those laid out in RegEx elements.) When validating, Gsoftpad goes through the PAD file, evaluating one field at a time and giving you the opportunity to correct its contents. In the partial screen capture below, you can see that the "Filename Long" field (corresponding to the Filename_Long element) has been left blank. (Also notice that the data-entry form behind the Validate Results window has automatically navigated to the element/field in error.)

Screen capture (partial): Gsoftpad validation message

Gsoftpad apparently doesn't "validate" PAD files in the sense of preventing you from creating invalid ones; it merely warns you about possible problems.
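
This warn-but-don't-block style of checking can be sketched in a few lines of Python. The sketch is not Gsoftpad's implementation. The constraints (Y or N for the two flags, 2 to 40 characters for Company_Name) come from the description of the PAD spec earlier in this article, but the patterns, the RULES table, and the function name are assumptions written for this example rather than copied from the spec's RegEx elements.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative rules only: element paths mapped to assumed patterns.
RULES = {
    "Company_Info/Company_Name": r".{2,40}",
    "Program_Info/Expire_Info/Has_Expire_Info": r"[YN]",
    "Program_Info/Includes_JAVA_VM": r"[YN]",
}

def pad_warnings(xml_text, rules=RULES):
    """Check one field at a time; report problems instead of raising."""
    root = ET.fromstring(xml_text)
    warnings = []
    for path, pattern in rules.items():
        value = root.findtext(path)
        if value is None:
            warnings.append(f"{path}: element is missing")
        elif not re.fullmatch(pattern, value):
            warnings.append(f"{path}: {value!r} does not match {pattern}")
    return warnings

# A deliberately faulty, invented document: the company name is too short
# and the Java flag is not Y or N.
doc = """<XML_DIZ_INFO>
  <Company_Info><Company_Name>B</Company_Name></Company_Info>
  <Program_Info>
    <Expire_Info><Has_Expire_Info>Y</Has_Expire_Info></Expire_Info>
    <Includes_JAVA_VM>maybe</Includes_JAVA_VM>
  </Program_Info>
</XML_DIZ_INFO>"""

for warning in pad_warnings(doc):
    print(warning)
```

Returning a list of warnings, rather than raising on the first failure, mirrors the behavior described above: the author sees every questionable field and remains free to save the file anyway.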

One other Gsoftpad feature worth mentioning: when you save a PAD file from within Gsoftpad, the program also saves an XSLT stylesheet for transforming the PAD document to HTML, and inserts the corresponding xml-stylesheet processing instruction into the document. You can, of course, tweak this stylesheet however you'd like. If you just want to use the default stylesheet, you can still select various colors (links, background, etc.) for the resulting HTML document from within Gsoftpad's Settings dialog.
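
For readers who want to attach a stylesheet to a PAD file by hand, the same linkage can be sketched with Python's standard xml.dom.minidom module. This is an assumption-laden illustration rather than a description of what Gsoftpad does internally: the file name pad.xsl and the function name are invented, and the stylesheet itself is not shown.

```python
from xml.dom import minidom

def add_stylesheet_pi(xml_text, href):
    """Return xml_text with an xml-stylesheet instruction placed before the root element."""
    doc = minidom.parseString(xml_text)
    pi = doc.createProcessingInstruction(
        "xml-stylesheet", f'type="text/xsl" href="{href}"'
    )
    # The instruction belongs in the prolog, ahead of the root element.
    doc.insertBefore(pi, doc.documentElement)
    return doc.toxml()

# "pad.xsl" is a hypothetical stylesheet name used only for this example.
result = add_stylesheet_pi("<XML_DIZ_INFO><Program_Info/></XML_DIZ_INFO>", "pad.xsl")
print(result)
```

A browser that supports XSLT will apply the referenced stylesheet when it opens the PAD file directly; tools that ignore the instruction still see an ordinary PAD document.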

A lesson from researching and writing the "XML Tourist" column which I'm always pleased to relearn: despite early and continuing grumbling to the contrary, XML just makes a lot of applications simpler. Its intelligent, straightforward adoption by ASP and others involved in software downloading sites is enough to gladden any developer's aesthetic heart.