Copyright 1998 Richard Goerwitz and Brown University


Introduction
============

Xmlparse is a utility for validating XML files, i.e., for checking
whether their basic internal structure conforms to the XML spec.

Xmlparse is free software. You may use it. But don't pretend you
wrote it. And don't blame us if it doesn't do what you want or expect
it to do. Also, be sure to take a look at the COPYRIGHT file, which
you should have received, along with this INSTALL file, as part of the
Xmlparse distribution.

Note that, in order to compile and install Xmlparse, you'll need

  (1) GNU make,
  (2) GNU Bison,
  (3) an ANSI C compiler, preferably GCC, and
  (4) some version of GNU flex that has a -U switch (i.e., that
      accepts Unicode or UTF-16 characters).

As of version 2.5.4, Unicode support has not been integrated into the
base flex distribution. To use flex 2.5.4, you must obtain Unicode
patches from James Lauth (last known e-mail: jlauth@lauth.com) or from
the author, Vern Paxson (vern@ee.lbl.gov). You must, additionally, up
the MAXLINE define in flexdef.h to 16384 (flex defaults to 2048, which
is just too small for the huge Unicode charsets I define).

Note1: James Lauth has recently made his Unicode patches to Flex
available at

  ftp://ftp.lauton.com/pub/flex-2.5.4-unicode-patch.tar.gz

(28/April/99).

Note2: Earlier versions of Lauth's patches used wchar_t as the basic
Unicode character type. More recent patches use unsigned short int in
its place. If you compiled Flex from an older set of Lauth's patches,
you may need to alter the line in Xmlparse's 'general.h' file that
says "typedef u_int16_t my_wchar_t;" so that it says "typedef wchar_t
my_wchar_t;". Doing so will result in a substantially less efficient
parser. But it will allow Xmlparse to work with the old versions of
Lauth's Flex patches.


Common Problems
===============

The most common problem people encounter when building Xmlparse is
that they first try to use an unpatched version of Flex.
What this does is drop a zero-length lexutil.c file into the build
directory. To clean up from this, install a patched version of Flex,
execute 'make distclean' in the Xmlparse build directory, and then run
the './configure' script again as per the directions below.

The second most common problem people encounter when building Xmlparse
is that they don't install GNU Bison correctly. If you don't have
Bison installed in a directory where 'make' can find it, the
'configure' script will try to use YACC. While it's possible to get
Xmlparse to compile with YACC, you have to hand-edit the parsutil.c
file. And I frankly haven't tested it much with YACC. For most
situations, therefore, it's better just to use Bison.


Installation
============

Before typing 'make' or doing anything else, run the 'configure'
script. The 'configure' shell script attempts to guess correct values
for various system-dependent variables used during compilation. It
uses those values to create a 'Makefile'.

Full installation directions are as follows:

  1. 'cd' to the directory containing the package's source code and
     type './configure' to configure the package for your system. If
     that doesn't work, your system probably doesn't meet the
     requirements for compiling and installing 'xmlparse'.

     Running 'configure' takes a few minutes.

  2. Edit the newly generated Makefile to reflect the peculiarities of
     your site. Make sure you get the installation directory names
     ('bindir', 'mandir') and the compiler options right. Uncomment
     the line that says '-DXML_NODEBUG' for faster code (the cost here
     is that you lose debugging facilities).

  3. Type 'make' to compile the package.

  4. Read the manual page (xmlparse.man) and edit the sample config
     file (xmlparse.cfg) to taste. Be sure to point the line labeled
     url_resolution_cmd_string at a utility you can use for fetching
     external URIs. By default, it points at 'get_uri' (which is just
     a shell script included in this distribution that calls Lynx).
     If you use 'get_uri', edit it until it works on your system.

  5. Become root (for example, with 'su') and type 'make install' to
     install the programs and any data files and documentation.

     By default, 'make install' will install the various files in
     '/usr/local/bin', '/usr/local/man', etc. You may specify an
     installation directory prefix other than '/usr/local' by giving
     'configure' the option '--prefix=PATH'. See "Operation Controls"
     below for more on 'configure' startup options. You can also
     hand-edit these paths in the Makefile after running './configure',
     as noted in step 2.

  6. You may remove the program binaries and object files from the
     source directory by typing 'make clean'. To also remove the
     files that 'configure' created (so you can compile the package
     for a different kind of computer), type 'make distclean'.

  7. If you have a webserver on the machine where 'xmlparse' has just
     been installed, consider installing xmlvalid.pl as well; see the
     comments at the top of the xmlvalid.pl file for more information
     on how to do this. If you install xmlvalid.pl, you may also want
     to use our sample HTML front-end, xmlvalid.shtml, as a template.
     Please, though, don't use xmlvalid.shtml as-is. And please, comb
     the COPYRIGHT file included with this distribution for any
     restrictions that might apply.


Operation Controls
==================

'configure' recognizes the following options to control how it
operates.

'--cache-file=FILE'
     Save the results of the tests in FILE instead of 'config.cache'.
     Set FILE to '/dev/null' to disable caching, for debugging
     'configure'.

'--help'
     Print a summary of the options to 'configure', and exit.

'--quiet'
'--silent'
'-q'
     Do not print messages saying which checks are being made.

'--srcdir=DIR'
     Look for the package's source code in directory DIR. Usually
     'configure' can determine that directory automatically.

'--version'
     Print the version of Autoconf used to generate the 'configure'
     script, and exit.

'configure' also accepts some other, not widely useful, options.
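
The two checks from "Common Problems" and the build sequence from
"Installation" can be sketched as a couple of small shell helpers.
This is only an illustration and is not part of the Xmlparse
distribution: the function names 'lexutil_is_empty' and 'have_tool'
and the prefix '/opt/xmlparse' are made up for the example.

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical and are not shipped with
# Xmlparse.

# Succeeds if DIR (default: current directory) holds a zero-length
# lexutil.c, the symptom of having run an unpatched Flex.
lexutil_is_empty() {
    dir=${1:-.}
    [ -f "$dir/lexutil.c" ] && [ ! -s "$dir/lexutil.c" ]
}

# Succeeds if the named program can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Typical use from the Xmlparse build directory (commented out so that
# sourcing this file has no side effects):
#
#   have_tool bison || echo "bison not found; configure will try yacc"
#   if lexutil_is_empty .; then make distclean; fi
#   ./configure --prefix=/opt/xmlparse --quiet
#   make
```

Neither helper can tell whether the Flex on your PATH carries the
Unicode patches; they only catch the empty lexutil.c left behind by
an unpatched Flex and a missing Bison.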


Restrictions
============

Xmlparse was written by Richard Goerwitz of the Brown University
Scholarly Technology Group.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY, of any kind, including any implied warranty of
merchantability or fitness for a particular purpose. Use this program
at your own risk. See the COPYRIGHT file for full terms and usage
restrictions.

Richard Goerwitz
email: Richard_Goerwitz@Brown.EDU