PyRXP does not conform to the XML 1.0 standard, although it claims to. This is inimical to the very idea of standards. Standards are not designed to satisfy your specific requirements on your specific machine. They are designed with tradeoffs for everyone in mind. One of the most important tradeoffs in XML is that Unicode is its fundamental basis, even though processing Unicode is clearly more expensive than processing more limited character sets.
If you need performance greater than what XML can accommodate, you should not be using XML. Plain text parsing options are a couple of orders of magnitude more efficient than PyRXP, so why do you put up with even PyRXP's relative slowness and bloat?
Your sentence about "archaeologists" seems to indicate that you didn't even read the article. I ran into PyRXP's non-conformance while parsing a file that used an ellipsis, a character which, you'd probably have to admit, is used by far more people than archaeologists. The specific example in which I selected a character that happens to be of interest to archaeologists was one where I was actually proving a positive of PyRXPU, the true XML parser in the Python/RXP family, against a specific corner case.
I never indicated that expressing Linear B is likely to be a common need. But are you trying to minimize the real-world need to express Arabic, Chinese, Japanese, Korean and even the many high Unicode characters used in European-language documents, such as smart quotes, em and en dashes and ellipses? PyRXP cannot handle any of these.
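To make the point about "high" characters concrete, here is a minimal sketch, assuming Python 3 and the standard library's xml.etree.ElementTree. It does not use PyRXP or PyRXPU, and the sample characters and the roundtrip helper are illustrative choices of mine. It shows that these everyday characters all sit above U+00FF, and that a Unicode-aware XML parser hands them back unchanged from a UTF-8 document.

```python
import xml.etree.ElementTree as ET

# Illustrative sample of everyday characters; all lie beyond the 8-bit range.
samples = {
    "ellipsis": "\u2026",
    "left smart quote": "\u201c",
    "em dash": "\u2014",
    "en dash": "\u2013",
    "Arabic letter alef": "\u0627",
    "CJK ideograph": "\u4e2d",
}


def roundtrip(ch):
    """Parse a tiny UTF-8 encoded document and return its text content."""
    doc = "<doc>{}</doc>".format(ch).encode("utf-8")
    return ET.fromstring(doc).text


for name, ch in samples.items():
    # A Unicode-aware parser must return exactly the character it was given.
    assert roundtrip(ch) == ch
    print("U+{:04X} {}: above U+00FF = {}".format(ord(ch), name, ord(ch) > 0xFF))
```

Running the same kind of check against any parser that claims XML 1.0 conformance is a quick way to see whether it really treats Unicode as the basis of the document.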
If you don't like the specifications that others make, then go invent your own (I myself have done this before and expect to do so again), but don't then try to confuse people about what you've done. There is already a standard called XML and PyRXP does not conform to it, so no one should cause confusion by calling PyRXP an XML parser. I've done my bit to reduce the confusion by explaining the facts in detail. What you do with that information is your choice.
--Uche