Analyzing the Web
by John E. Simpson
Reporting Web Usage with XML
If you think about it, XML offers few advantages over plain text as a format in which to keep usage logs. A Web developer or her customers simply don't care, for the most part, about the details of individual sessions. What they care about are aggregate statistics. And for such an application, data represented as XML can be very useful. It can be transformed to (X)HTML, XSL-FO (and thence to PDF), or any of various other presentation languages. It can be easily loaded into databases for further massaging. Its representation can be customized endlessly, repackaged and repurposed however needed. And that's the angle on website logging I'll take in this column.
Just as with raw XML files, CLF files are human readable, as long as you know what's supposed to be in each field. And as with many XML applications, log files aren't really "read" very often, legible or not. (For one thing, they can be huge, running into hundreds of thousands of records, depending on the number of sessions recorded.) Rather, they're fed into any of a host of software packages, which then convert the raw data into a form which is not just human readable, but also human meaningful.
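For reference, a single CLF record occupies one line per request. A made-up but representative example follows; the fields are the remote host, identity, authenticated user, timestamp, request line, status code, and bytes sent:
192.0.2.10 - - [21/Jul/2005:23:01:10 -0400] "GET /index.html HTTP/1.0" 200 5432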
While some of these reporting packages simply display the aggregate results of a given log file in some proprietary manner, others can save or export the data for later use, perhaps by an entirely different application. I'll take a brief look at two of these log analyzers.
eWebLog Analyzer (eWLA)
esoftsys's eWebLog Analyzer runs on the full range of Windows platforms; it comes in a 30-day free trial version, after which a single-user license is $79.
When you've loaded a log file into eWLA, the program aggregates the data along various dimensions and then displays the results as a straight text list or in a combination of graphical and tabular formats. For instance, here's an eWLA graph showing how page hits and visits were distributed by day of the week, over a one-month period, for a Web site I work on:

Figure 1. Page hits and visits by day of the week, as reported by eWLA.
When you save eWLA reports as XML, the format is simple. The root element, DATA, carries a handful of general attributes, such as the date and time of the export. Within the DATA element is a General element (which summarizes overall site statistics, such as total hits and average access duration), followed by one element for each type of report. For instance, there's a ByDay element which records the number of hits, visits, bandwidth, and so on for each day in the reporting period. Each of these report-type elements contains some number of occurrences of a single, empty child element, ROW. You can think of each ROW element as a row in a table; its "columns" are the attributes of ROW.
For instance, an eWLA export including the day-of-the-week data shown in the graph above looks like this:
<DATA Description="eWebLogAnylyzer Export"
Title="xml_com_demo" DateExport="7/21/2005 11:01:10 PM"
Ver="1.10">
<General>
...
</General>
...
<ByDow>
<ROW Day="Monday" Hits="4416" Visits="984"
Bandwidth="20.56 MB" Pages="2175" Errors="2"
AvgVisitLen="3:59"/>
<ROW Day="Tuesday" Hits="4036" Visits="1096"
Bandwidth="19.40 MB" Pages="2260" Errors="0"
AvgVisitLen="2:59"/>
<ROW Day="Wednesday" Hits="5045" Visits="1234"
Bandwidth="24.27 MB" Pages="2808" Errors="18"
AvgVisitLen="3:36"/>
<ROW Day="Thursday" Hits="4813" Visits="1204"
Bandwidth="21.48 MB" Pages="2445" Errors="0"
AvgVisitLen="3:25"/>
<ROW Day="Friday" Hits="3411" Visits="921"
Bandwidth="17.50 MB" Pages="1827" Errors="0"
AvgVisitLen="3:13"/>
<ROW Day="Saturday" Hits="3209" Visits="879"
Bandwidth="16.14 MB" Pages="1888" Errors="0"
AvgVisitLen="4:30"/>
<ROW Day="Sunday" Hits="3654" Visits="799"
Bandwidth="15.13 MB" Pages="2020" Errors="12"
AvgVisitLen="4:16"/>
</ByDow>
...
</DATA>
While this report-type/ROW format is almost mindlessly simple, it's also effective as a potential platform for further manipulation.
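For example, a bare-bones XSLT stylesheet, sketched here against the export structure shown above (the table layout and the choice of columns are mine, not anything built into eWLA), could turn the ByDow rows into an XHTML table:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- Render the day-of-week report as a simple table -->
  <xsl:template match="/DATA/ByDow">
    <table>
      <tr><th>Day</th><th>Hits</th><th>Visits</th><th>Pages</th></tr>
      <xsl:for-each select="ROW">
        <tr>
          <td><xsl:value-of select="@Day"/></td>
          <td><xsl:value-of select="@Hits"/></td>
          <td><xsl:value-of select="@Visits"/></td>
          <td><xsl:value-of select="@Pages"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>

  <!-- Suppress text output from the other report elements -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
Run against the export above, this produces a seven-row table; reporting on a different element (ByDay, say) is mostly a matter of changing the match pattern and the attribute names.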