XML Validator Frequently Asked Questions List
This page lists frequently asked questions regarding STG's XML Validator. If you have a question, check to see if it is answered here before firing off e-mail to STG.
Because it's a beta test version. If you run into a problem, we'd actually be very grateful if you'd send us a bug report.
First, check to be sure the "bug" isn't discussed below. If it isn't, create a short XML file (preferably standalone) that illustrates the problem; then mail it to us at STG. We'll get back to you. If you can't illustrate the problem with a single XML file, feel free to send us a .zip or .tar archive.
Although it's difficult to answer this question without seeing the actual XML document that is being validated, experience has shown that this question most often arises when someone attempts to validate a document that lacks a document type definition (DTD). Any XML document that lacks a DTD is, by definition, invalid, and may trigger a cascade of error messages.
(Of course, the other typical reason that people get a lot of error messages is that the document being validated has a lot of errors.)
See also the next two FAQs.
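Before submitting a document, it can help to confirm locally that it carries a DOCTYPE declaration at all. The following Python sketch is one way to do that. It is not part of STG's validator: the function name has_doctype is made up for illustration, and Python's xml.dom.minidom checks only well-formedness, not validity.

```python
from xml.dom import minidom


def has_doctype(xml_text):
    """Return True if the document contains a <!DOCTYPE ...> declaration.

    Hypothetical helper for illustration only. minidom is a non-validating
    parser, so this says nothing about whether the document is valid.
    """
    doc = minidom.parseString(xml_text)
    return doc.doctype is not None


# A document with an internal DTD subset, and one with no DTD at all.
with_dtd = '<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]><note>hi</note>'
without_dtd = '<note>hi</note>'

print(has_doctype(with_dtd))     # True
print(has_doctype(without_dtd))  # False
```

If the check returns False, the document has no DTD and cannot be valid, whatever else it contains.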
STG's validator follows the XML 1.0 specification pretty closely, providing a wide assortment of warnings about problems, both potential and actual, that most other validators ignore. Most of these messages have to do with XML-SGML compatibility and interoperability issues.
Here are some sample warning messages, with explanations of what they mean, and why you may (or may not) want to pay attention to them:
The short answer here is that STG's validator goes a bit above and beyond what the specification actually calls for in the way of validation.
The longer answer follows.
Most validating XML parsers validate XML documents as part of a more general process (e.g., readying them for manipulation and/or display). That is, they aren't there simply to flag errors. STG's parser/validator, on the other hand, does little else. Its primary purpose is to flag errors and to help you locate potential problems in your XML.
As a result, our validator can be far more aggressive than it strictly needs to be. In particular, it can resolve and/or process all entities declared in your DTD. If it finds errors that may pose problems down the road, it will flag them - even if you don't happen to use the entity in question in the document you are validating.
The idea here is to help designers avoid half-baked DTDs that seem to work fine with some documents, but suddenly start producing unexpected errors when used on documents that happen to make use of invalid entities that were lurking unused in the DTD.
Yes. But to do so you may need to compile the back-end parser from source yourself and install it. The source code for the parser is available at STG's website.
Note that this software is still in beta testing, and will doubtless contain many bugs. Please let us know if you find one, preferably giving us enough information to reproduce it (e.g., your OS version, parser version, and sample XML input).
You are getting ambiguous content model errors because at least one of your content models is nondeterministic (in SGML terms, "ambiguous"). In essence what this means is that the content model(s) in question can match identical XML element sequences in more than one way.
STG's XML validation system aggressively reports such ambiguities not only because the specification says it should (appendix D), but also because XML software strives for simplicity and consistency. If you give XML software an element stream that can be processed in several different ways, it will normally select just one of those ways (probably without even telling you what it has done), and then continue processing. This situation can lead to confusion, especially when you aren't aware that there were any ambiguities in the first place.
The most frequent cause of ambiguous content models is the use of patterns like ((a, b?) | a). Take, for example, the following DTD fragment:
<!ELEMENT postalcode (#PCDATA)>
(In the United States, a postal code consists of five digits plus an optional extension.) Although old SGML hands rarely make such mistakes, one often sees XML DTDs containing expressions that, in this instance, would reduce to:
((postalcode, postalcode_extension?) | postalcode)
When an XML processor, having internalized the above content-model fragment, sees an actual document instance containing a basic five-digit postal code, it has no idea whether to process this postal code as an instance of a postal code plus a null extension, or as a complete postal code in and of itself. That is, it has no idea whether to treat it as an instance of (postalcode, postalcode_extension?) or of (postalcode).
If you find you are getting ambiguous content model errors, check for situations like the above, where the same XML text could match your content model in multiple ways.
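To make the postal code example concrete, here is a small Python sketch that counts the number of distinct ways a content model can match a sequence of element names. It is an illustration only, not how STG's validator (or any XML parser) detects the problem: the tuple-based model representation and the names ends and count_matches are assumptions made up for this example. Counting matches is also a narrower test than the specification's determinism rule, since a model such as ((a, b) | (a, c)) is nondeterministic even though each input matches it in only one way.

```python
from collections import Counter


def ends(model, tokens, start):
    """Map each end index to the number of ways `model` matches tokens[start:end].

    Models are nested tuples (a made-up representation for this sketch):
      ("elem", name), ("opt", model), ("seq", [models]), ("alt", [models])
    """
    kind = model[0]
    if kind == "elem":
        if start < len(tokens) and tokens[start] == model[1]:
            return Counter({start + 1: 1})
        return Counter()
    if kind == "opt":
        result = Counter({start: 1})  # the "match nothing" case
        result.update(ends(model[1], tokens, start))
        return result
    if kind == "alt":
        result = Counter()
        for branch in model[1]:
            result.update(ends(branch, tokens, start))
        return result
    if kind == "seq":
        current = Counter({start: 1})
        for part in model[1]:
            following = Counter()
            for pos, ways in current.items():
                for end, more in ends(part, tokens, pos).items():
                    following[end] += ways * more
            current = following
        return current
    raise ValueError("unknown model kind: %r" % (kind,))


def count_matches(model, tokens):
    """Number of distinct ways `model` matches the whole token sequence."""
    return ends(model, tokens, 0)[len(tokens)]


code = ("elem", "postalcode")
ext = ("elem", "postalcode_extension")

# ((postalcode, postalcode_extension?) | postalcode)
AMBIGUOUS = ("alt", [("seq", [code, ("opt", ext)]), code])
# (postalcode, postalcode_extension?)
REWRITTEN = ("seq", [code, ("opt", ext)])

print(count_matches(AMBIGUOUS, ["postalcode"]))  # 2
print(count_matches(REWRITTEN, ["postalcode"]))  # 1
```

The rewritten model, (postalcode, postalcode_extension?), accepts exactly the same documents as the ambiguous one, but a bare postal code now matches it in only one way.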
If you aren't concerned about such problems, feel free to turn off warning messages altogether using the checkbox provided on the main validation form.
The reason STG's validator cannot validate XML document instances against local DTDs (e.g., DTDs on your local hard drive) is that it must be able to resolve and fetch over the network any external entities it needs in order to process your document. For it to resolve and fetch arbitrary local files on people's hard drives, everyone would need to offer our validation system access to their local filesystems.
Needless to say, this sort of access (if a reasonable way could be found to offer it) would present an unacceptable security risk.
If you want our validator to be able to find your DTDs, therefore, you must place them in a public directory on a webserver you have access to, and change your system identifiers to point to the relevant URIs.
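As a rough pre-flight check, the following Python sketch reports whether a document's system identifier is an absolute http or https URI, as opposed to a relative path or a file on your own machine. It is only a sketch under stated assumptions: the function name system_id_is_remote and the example.org URI are hypothetical, and the check does not confirm that the DTD actually exists at that address. minidom does not fetch the external DTD when parsing.

```python
from urllib.parse import urlparse
from xml.dom import minidom


def system_id_is_remote(xml_text):
    """Return True if the DOCTYPE's system identifier is an http(s) URI.

    Hypothetical helper for illustration only; it inspects the identifier
    and never tries to download the DTD.
    """
    doc = minidom.parseString(xml_text)
    if doc.doctype is None or not doc.doctype.systemId:
        return False
    parsed = urlparse(doc.doctype.systemId)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


local = '<!DOCTYPE address SYSTEM "address.dtd"><address/>'
remote = ('<!DOCTYPE address SYSTEM '
          '"http://www.example.org/dtds/address.dtd"><address/>')

print(system_id_is_remote(local))   # False
print(system_id_is_remote(remote))  # True
```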
If you have no access to a webserver, or if you are working on private DTDs and files, see above on compiling the parser locally.