I see some problems with your proposal. The one that concerns me most is related to character encoding.
An XML document can be represented in many possible encodings (UTF-8, UTF-16, ISO-8859-1, ...). If you map the document directly into memory without some sort of encoding normalization (which most of today's parsers perform), you will be forced to manually encode/decode every string that is read from or written to the document.
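To make the problem concrete, here is a small JDK-only illustration (not code from any parser; the class name `EncodingDemo` is just for illustration). The same name becomes three different byte sequences depending on the document's encoding, so comparing raw bytes from a memory-mapped document only works once you know, and apply, the document's charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class EncodingDemo {
    public static void main(String[] args) {
        // "café", written with an escape so the source file's encoding does not matter
        String name = "caf\u00e9";
        Charset[] charsets = {StandardCharsets.UTF_8, StandardCharsets.UTF_16BE, StandardCharsets.ISO_8859_1};
        for (Charset cs : charsets) {
            byte[] raw = name.getBytes(cs);
            System.out.println(cs.name() + ": " + raw.length + " bytes " + Arrays.toString(raw));
        }
        // Raw bytes taken from a UTF-8 document do not match the same name encoded as ISO-8859-1
        byte[] fromUtf8Doc = name.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(fromUtf8Doc, name.getBytes(StandardCharsets.ISO_8859_1))); // false
        // Decoding with the wrong charset silently yields mojibake instead of an error
        System.out.println(new String(fromUtf8Doc, StandardCharsets.ISO_8859_1).equals(name)); // false
        // Decoding with the document's actual encoding recovers the original string
        System.out.println(new String(fromUtf8Doc, StandardCharsets.UTF_8).equals(name)); // true
    }
}
```

Run as-is, it reports 5, 8 and 4 bytes for UTF-8, UTF-16BE and ISO-8859-1 respectively.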
Do you have any idea how to deal with this problem?
We (Ximpleware) recently released our software (in Java) under the GPL. I would like to personally invite you to visit the project web site (http://vtd-xml.sf.net). Your suggestions and feedback are also very welcome.
Cheers,
Jimmy
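One way to deal with character encoding is to build "intelligence" directly into various non-extractive string comparison functions, that is, functions that operate on a token in place in the source document instead of first copying it out into a separate string.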
Most programmers are used to a UCS-2 string representation in their code, so a non-extractive comparison function needs to compare UTF-8 (or UTF-16) tokens directly against UCS-2 strings.
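A minimal sketch of such a comparison, assuming the token is identified by a byte offset and length into a UTF-8 buffer. The class `Utf8Compare` and method `tokenEquals` are hypothetical names, not the VTD-XML API. It decodes one code point at a time and compares it with a Java `String`, whose `char`s are UTF-16 code units (identical to UCS-2 outside the surrogate range), so no intermediate `String` is allocated.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Compare {
    /**
     * Compares the UTF-8 bytes doc[offset, offset + length) with s without
     * creating an intermediate String. Returns false on malformed UTF-8.
     * Simplification: overlong encodings are not rejected.
     */
    static boolean tokenEquals(byte[] doc, int offset, int length, String s) {
        int i = offset;
        int end = offset + length;
        int j = 0; // index into s, in UTF-16 code units
        while (i < end) {
            int b = doc[i] & 0xFF;
            int cp;
            int extra; // number of continuation bytes after the lead byte
            if (b < 0x80) { cp = b; extra = 0; }
            else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
            else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
            else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
            else return false; // invalid lead byte
            if (i + extra >= end) return false; // sequence runs past the token
            for (int k = 1; k <= extra; k++) {
                int c = doc[i + k] & 0xFF;
                if ((c & 0xC0) != 0x80) return false; // not a continuation byte
                cp = (cp << 6) | (c & 0x3F);
            }
            i += extra + 1;
            // codePointAt joins a surrogate pair into one supplementary code point
            if (j >= s.length()) return false;
            int expected = s.codePointAt(j);
            if (cp != expected) return false;
            j += Character.charCount(expected);
        }
        return j == s.length(); // s must not have leftover characters
    }

    public static void main(String[] args) {
        byte[] doc = "<name>caf\u00e9</name>".getBytes(StandardCharsets.UTF_8);
        // "<name>" occupies bytes 0..5; "café" is the next 5 bytes in UTF-8
        System.out.println(tokenEquals(doc, 6, 5, "caf\u00e9")); // true
        System.out.println(tokenEquals(doc, 6, 5, "cafe"));      // false
    }
}
```

A UTF-16 variant would read two-byte units instead of decoding multi-byte UTF-8 sequences; the loop structure stays the same.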
In addition, such a function can also resolve entity references on the fly during the comparison.
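Extending the same idea, the following sketch (again with hypothetical names, `EntityCompare.tokenEquals`) expands references while comparing. To stay short it assumes a UTF-8 token and recognizes only the five predefined XML entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`) plus decimal and hexadecimal character references; entities declared in a DTD are simply treated as a mismatch.

```java
import java.nio.charset.StandardCharsets;

final class EntityCompare {
    /** Decodes one UTF-8 code point at pos[0] and advances pos[0]; returns -1 if malformed. */
    private static int nextCodePoint(byte[] doc, int[] pos, int end) {
        int i = pos[0];
        int b = doc[i] & 0xFF;
        int cp;
        int extra;
        if (b < 0x80) { cp = b; extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return -1;
        if (i + extra >= end) return -1;
        for (int k = 1; k <= extra; k++) {
            int c = doc[i + k] & 0xFF;
            if ((c & 0xC0) != 0x80) return -1;
            cp = (cp << 6) | (c & 0x3F);
        }
        pos[0] = i + extra + 1;
        return cp;
    }

    /** True if doc[from, to) holds exactly the ASCII characters of name (no String is built). */
    private static boolean nameIs(byte[] doc, int from, int to, String name) {
        if (to - from != name.length()) return false;
        for (int k = 0; k < name.length(); k++) {
            if (doc[from + k] != name.charAt(k)) return false;
        }
        return true;
    }

    /** Resolves a reference whose name starts at pos[0] (just after '&'); returns its code point or -1. */
    private static int resolveReference(byte[] doc, int[] pos, int end) {
        int start = pos[0];
        int semi = start;
        while (semi < end && doc[semi] != ';') semi++;
        if (semi == end) return -1; // unterminated reference
        int cp;
        if (semi - start >= 2 && doc[start] == '#') {
            // &#NNN; (decimal) or &#xHHH; (hexadecimal)
            boolean hex = doc[start + 1] == 'x';
            int radix = hex ? 16 : 10;
            int k = hex ? start + 2 : start + 1;
            if (k == semi) return -1; // "&#x;" has no digits
            cp = 0;
            for (; k < semi; k++) {
                int d = Character.digit(doc[k] & 0xFF, radix);
                if (d < 0) return -1;
                cp = cp * radix + d;
                if (cp > 0x10FFFF) return -1; // beyond the Unicode range
            }
        } else if (nameIs(doc, start, semi, "amp")) cp = '&';
        else if (nameIs(doc, start, semi, "lt")) cp = '<';
        else if (nameIs(doc, start, semi, "gt")) cp = '>';
        else if (nameIs(doc, start, semi, "quot")) cp = '"';
        else if (nameIs(doc, start, semi, "apos")) cp = '\'';
        else return -1; // other entities would require the DTD
        pos[0] = semi + 1; // continue after the ';'
        return cp;
    }

    /** Compares the UTF-8 token doc[offset, offset + length) with s, expanding references as it goes. */
    static boolean tokenEquals(byte[] doc, int offset, int length, String s) {
        int end = offset + length;
        int[] pos = {offset};
        int j = 0;
        while (pos[0] < end) {
            int cp = nextCodePoint(doc, pos, end);
            if (cp == '&') cp = resolveReference(doc, pos, end);
            if (cp < 0 || j >= s.length()) return false;
            int expected = s.codePointAt(j);
            if (cp != expected) return false;
            j += Character.charCount(expected);
        }
        return j == s.length();
    }

    public static void main(String[] args) {
        byte[] doc = "<t>Tom &amp; Jerry</t>".getBytes(StandardCharsets.UTF_8);
        System.out.println(tokenEquals(doc, 3, 15, "Tom & Jerry")); // true
    }
}
```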
The same applies to converting text to numerical data. A non-extractive version of "parseInt" needs to convert a UTF-8 (or UTF-16) token into an integer without "extracting" it from the source document.
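A sketch of a non-extractive `parseInt`, assuming a UTF-8 (or plain ASCII) buffer; the names `Utf8ParseInt` and `parseInt(byte[], int, int)` are hypothetical. Because ASCII digits are single bytes in UTF-8, the token can be read byte by byte. It accumulates the value as a negative number so `Integer.MIN_VALUE` parses without overflow (OpenJDK's `Integer.parseInt` uses the same trick), and, unlike `Integer.parseInt`, it skips surrounding XML whitespace.

```java
import java.nio.charset.StandardCharsets;

final class Utf8ParseInt {
    private static boolean isXmlSpace(byte b) {
        return b == ' ' || b == '\t' || b == '\r' || b == '\n';
    }

    /**
     * Parses a decimal int from doc[offset, offset + length) without creating a String.
     * Leading/trailing XML whitespace is skipped and an optional '+' or '-' is accepted.
     * Throws NumberFormatException for empty, malformed, or out-of-range tokens.
     */
    static int parseInt(byte[] doc, int offset, int length) {
        int i = offset;
        int end = offset + length;
        while (i < end && isXmlSpace(doc[i])) i++;
        while (end > i && isXmlSpace(doc[end - 1])) end--;
        if (i == end) throw new NumberFormatException("empty token");
        boolean negative = doc[i] == '-';
        if (negative || doc[i] == '+') {
            i++;
            if (i == end) throw new NumberFormatException("sign without digits");
        }
        // Accumulate as a negative value so that Integer.MIN_VALUE does not overflow
        int limit = negative ? Integer.MIN_VALUE : -Integer.MAX_VALUE;
        int multMin = limit / 10;
        int result = 0;
        for (; i < end; i++) {
            int d = doc[i] - '0'; // ASCII digits are single bytes in UTF-8
            if (d < 0 || d > 9) throw new NumberFormatException("invalid digit at byte " + i);
            if (result < multMin) throw new NumberFormatException("out of int range");
            result *= 10;
            if (result < limit + d) throw new NumberFormatException("out of int range");
            result -= d;
        }
        return negative ? result : -result;
    }

    public static void main(String[] args) {
        byte[] doc = "<qty> -42 </qty>".getBytes(StandardCharsets.UTF_8);
        // "<qty>" occupies bytes 0..4; the text node " -42 " is the next 5 bytes
        System.out.println(parseInt(doc, 5, 5)); // -42
    }
}
```

For a UTF-16 document the same loop would step two bytes per character and check that the high byte is zero before treating the low byte as an ASCII digit.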