
Article:
 Non-Extractive Parsing for XML
Subject: BEWARE: DANGEROUS ERRORS IN ARTICLE!
Date: 2004-05-31 07:33:34
From: OskariOlematon

Hi,


I would like to point out that the coding examples in the article contain at least two elementary but dangerous C library usage errors. Please correct them so that the mistakes are not copied by inexperienced C programmers. The errors involve the C library functions strncmp() and atoi().


strncmp() is a very dangerous function which is better avoided if possible. If that is not possible, it should be wrapped to make it easier to use correctly. The error this article propagates is the failure to understand that strncmp() compares at most the given number of characters, so a match on a mere prefix is considered a match. As an example, the following comparison returns 0, reporting the compared strings as equal:


strncmp("tokenNOT!", "...token..." + 3, 5)


This is exactly how the article proposes to compare a known string to a token. In this case they are erroneously reported as equal. Please don't do this! One must consider the whole known string.
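
A minimal sketch of such a wrapper, assuming the token is handed over as a pointer plus a length and the known string is NUL-terminated. The name token_equals is hypothetical and does not come from the article:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the article): returns 1 only when the
 * token_len bytes starting at token are exactly the NUL-terminated
 * string known, and 0 otherwise. */
int token_equals(const char *token, size_t token_len, const char *known)
{
    assert(token != NULL && known != NULL);

    /* The lengths must agree first, so a mere prefix can never match. */
    return strlen(known) == token_len
        && memcmp(token, known, token_len) == 0;
}
```

With this, token_equals("...token..." + 3, 5, "tokenNOT!") is 0, while token_equals("...token..." + 3, 5, "token") is 1.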


The other error is to advocate the use of atoi(). atoi() should not be used because it offers no error detection whatsoever: invalid input simply yields 0, which cannot be told apart from a genuine "0". Use strtol() or something else instead.
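
A rough sketch of what the strtol() route can look like, assuming the input is a NUL-terminated string holding nothing but the number. The helper name parse_int is hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parses a base-10 int from the NUL-terminated
 * string s. Returns 1 and stores the value in *out on success;
 * returns 0 and leaves *out untouched on any error. */
int parse_int(const char *s, int *out)
{
    char *end;
    long value;

    assert(s != NULL && out != NULL);

    errno = 0;
    value = strtol(s, &end, 10);

    if (end == s || *end != '\0')   /* no digits at all, or trailing junk */
        return 0;
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                   /* does not fit in an int */

    *out = (int)value;
    return 1;
}
```

Unlike atoi(), this rejects inputs such as "abc", "12abc" and out-of-range values instead of silently returning some number. Note that strtol() itself skips leading whitespace, so " 42" is still accepted by this sketch.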


These kinds of errors are the worst because they are very difficult to find in testing. They can stay hidden for years and then emerge to bring your software down once it has been installed at hundreds of client sites. I sincerely wish the article were edited to correct these errors.


Best Regards,
Semi




  • BEWARE: DANGEROUS ERRORS IN ARTICLE!
    2004-06-01 14:07:00 Michael Maron [Reply]

    Normally, a token is defined as a sequence of symbols from a certain symbol range, for example a-zA-Z0-9. All other symbols are treated as token separators. This is exactly how regexps work. So I really don't see the point of using partial rather than complete comparisons.
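
A small sketch of that token definition, assuming tokens consist of a-zA-Z0-9 in the default "C" locale and the input is NUL-terminated. The name token_length is hypothetical:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of the token that starts at s,
 * i.e. the number of leading alphanumeric characters. Any other
 * character, including the terminating NUL, acts as a separator. */
size_t token_length(const char *s)
{
    size_t n = 0;

    assert(s != NULL);

    /* The cast avoids undefined behaviour for negative char values. */
    while (isalnum((unsigned char)s[n]))
        n++;
    return n;
}
```

Once the full extent of the token is known this way, it can be compared against a known string as a whole rather than by prefix.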


    What is also very strange is that the article uses an HTTP header, not an XML document, as its example:


    Consider the following snippet of an HTTP header as an example.
    Accept: */*
    Accept-Language: en-us
    Connection: Keep-Alive
    Host: localhost
    Referer: http://localhost/links.asp


    <question remark="shrugging shoulders">
    What does all this have to do with XML?
    </question>

  • BEWARE: DANGEROUS ERRORS IN ARTICLE!
    2004-05-31 10:44:19 jimmy_z [Reply]

    Thanks for the posting.


    Yes, you have some very good points. The purpose of those examples is to demonstrate the concept, not necessarily to offer implementation details.


    In fact, there are lots of things not covered by this article:


    1. encoding: how to compare a UCS-2 string to a UTF-8 or UTF-16 encoded token
    2. entity reference: to compare a UCS-2 string to a token containing entities such as (#&s;)
    3. Normalized compare: to compare a UCS-2 string against the normalized UCS-2 view of the token


    Overall, I am only trying to offer a starting point, definitely not the whole picture.


    Thanks again,
    Jimmy


