NetTech Education
Alliance STG

STG Technical Report 97.3
(http://www.stg.brown.edu/pub/NHData.tr97.3.html)

School Data Collection on the Web:
An example with discussion

Technical Documentation

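Contents

  Overview
  Java Experiments
  Porting from Unix to Macintosh
  Porting from Unix to Windows 95
  Server Recommendations
  The Scripts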

Overview

The system documented here consists, internally, of several specialized CGI scripts and a simple, generic DBM database. Externally, however, it consists of HTML pages and forms, so users can interact with the system from any computer with a Web browser such as Netscape, Internet Explorer, or Lynx.
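To make this architecture concrete, here is a minimal sketch of one such CGI script. It is written in Python rather than the report's PERL and is not the original code. It follows the standard CGI convention (for a GET request the form data arrives in the QUERY_STRING environment variable; for a POST request it arrives on standard input, with its size in CONTENT_LENGTH) and files each submission in a DBM database. The function name handle_request and the form fields school and enrollment are hypothetical.

```python
import dbm
import html
from urllib.parse import parse_qs


def handle_request(environ, stdin, db_path):
    """Handle one CGI request: parse the form, file it in DBM, reply in HTML.

    The field names "school" and "enrollment" are hypothetical examples.
    """
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # POST: form data is on standard input; CONTENT_LENGTH says how much.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("utf-8", errors="replace")
    else:
        # GET: form data is in the QUERY_STRING environment variable.
        raw = environ.get("QUERY_STRING", "")

    form = parse_qs(raw)
    school = form.get("school", [""])[0]
    enrollment = form.get("enrollment", [""])[0]

    header = "Content-Type: text/html\n\n"  # CGI header, then a blank line
    if not school:
        return header + "<p>Please enter a school name.</p>\n"

    # The DBM database is just a persistent key/value table on disk.
    with dbm.open(db_path, "c") as db:
        db[school] = enrollment

    return header + "<p>Recorded {}: {}</p>\n".format(
        html.escape(school), html.escape(enrollment))
```

Under a real Web server, the script would finish with a line such as sys.stdout.write(handle_request(os.environ, sys.stdin.buffer, "schooldata")). Passing the environment and input stream in as arguments keeps the logic testable without a server.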


Java Experiments

In an effort to make the system completely platform-independent, we began implementing its internal CGI modules in Java. Java was to give us a simple, consistent, platform-independent way of coding our CGI scripts. It turned out, however, to be inadequate to the task, mainly because of its isolation from the process environment (the environment variables through which a Web server passes request data to a CGI program, and which are fundamental to all CGI programming), and because of difficulties we encountered running interpreted Java CGI scripts via our Web server.

We found programs at San Diego State University that made CGI programming in Java possible, but we decided against them because they required a platform-specific workaround written in C. Java is steadily becoming more useful for CGI scripts, but it was not then, and still is not (as of January 1997), a complete and well-developed resource.


Porting from Unix to Macintosh

Initially, we got the data collection scripts up and running under Unix. To demonstrate the portability of the system, we then decided to port it to the Macintosh. Though not difficult in theory, this port was not without a few practical glitches. These included:

File names:
Unix path names, whose components are separated by slashes (e.g., /usr/bin), had to be converted to Macintosh path names, whose components are separated by colons. Also, we found it necessary to turn relative paths (e.g., etc/) into absolute ones.
CGI file types:
CGI scripts had to be saved as MacHTTP CGI Scripts. An extension for MacPerl 4.1.4+ is currently available at ftp://err.ethz.ch/pub/neeri/MacPerl/.
Failure of MacPerl CGI library:
We had to use the Unix CGI library for PERL, since we could not get the then-current (July 1996) version of the MacPerl CGI library to function properly. See also our comments below on server platform recommendations.
File extension:
We had to ensure that all scripts had the .cgi extension, so that the server would execute them rather than send them to users' browsers as files to download.

The scripts ran fine under two different Mac-based Web servers, MacHTTP and WebSTAR.
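The path rewriting described under "File names" can be sketched as follows. This Python sketch is illustrative only and is not the original scripts' code; the function name and the example folder "Macintosh HD:WebSTAR:nhdata" are hypothetical. It resolves a relative Unix path against an absolute classic Mac OS folder path, whose first component is the volume name.

```python
def unix_relative_to_mac_absolute(unix_path: str, mac_base: str) -> str:
    """Resolve a relative Unix path against an absolute classic Mac OS folder.

    mac_base looks like "Volume:Folder:Subfolder" (colon-separated, starting
    with the volume name). A trailing "/" on unix_path marks a folder and is
    carried over as a trailing colon.
    """
    if unix_path.startswith("/"):
        raise ValueError("expected a relative Unix path")
    components = mac_base.rstrip(":").split(":")
    for part in unix_path.split("/"):
        if part in ("", "."):
            continue  # skip empty components and "current folder" markers
        if part == "..":
            if len(components) == 1:
                raise ValueError("path climbs above the volume root")
            components.pop()  # go up one folder
        else:
            components.append(part)
    mac_path = ":".join(components)
    # A bare volume name must end in a colon; folder paths conventionally do too.
    if len(components) == 1 or unix_path.endswith("/"):
        mac_path += ":"
    return mac_path
```

For example, unix_relative_to_mac_absolute("etc/", "Macintosh HD:WebSTAR:nhdata") yields "Macintosh HD:WebSTAR:nhdata:etc:".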


Porting from Unix to Windows 95

Porting from Unix to Windows was much more straightforward than porting to the Mac. The long-filename capability of Windows 95 allowed us to use the same filenames we had used under Unix, and Windows had no trouble with relative paths. Essentially, all we had to do with the paths and filenames was change the Unix slashes into DOS/Windows backslashes.
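The slash-to-backslash conversion fits in a few lines of Python. This is an illustrative sketch, not the original code, and unix_to_windows is a hypothetical name. Going through pathlib rather than a plain string replacement also collapses doubled slashes along the way.

```python
from pathlib import PurePosixPath, PureWindowsPath


def unix_to_windows(unix_path: str) -> str:
    """Rewrite a Unix-style path with DOS/Windows backslash separators."""
    # Split on "/" using POSIX rules, then rejoin using Windows rules.
    return str(PureWindowsPath(*PurePosixPath(unix_path).parts))
```

For example, unix_to_windows("etc/schools.db") gives etc\schools.db.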

The only real difficulty we encountered was in getting the data-filing subroutine to work properly under PERL 5. In PERL 5, DBM support is no longer built into the interpreter itself; it is provided by separately loaded modules. To make DBM files work, we simply added the directives "require SDBM_File;" and "require TieHash;" to our code, which linked in the necessary PERL 5 DBM routines.
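Tying a PERL 5 hash to SDBM_File makes a DBM file behave like an ordinary hash. As a rough analogue, not a translation of the original subroutine, Python's standard dbm package offers the same dictionary-style access. The sketch below files a multi-field record under a key; the function names and the tab-separated record layout are assumptions made for illustration.

```python
import dbm

FIELD_SEPARATOR = "\t"  # assumption: field values never contain tabs


def file_record(db_path, key, fields):
    """Store a list of field values under key, much like assigning to a tied hash."""
    with dbm.open(db_path, "c") as db:  # "c": open read/write, create if missing
        db[key] = FIELD_SEPARATOR.join(fields)


def fetch_record(db_path, key):
    """Return the stored field values for key, or None if the key is absent."""
    with dbm.open(db_path, "r") as db:
        try:
            raw = db[key]  # values come back as bytes
        except KeyError:
            return None
    return raw.decode("utf-8").split(FIELD_SEPARATOR)
```

Which on-disk DBM format dbm.open picks depends on what is available on the machine, which is one concrete reason DBM files do not travel well between systems.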

Note, in this connection, that DBM files are platform-dependent. That is, one must generate them on the server where they will be used, or else on an architecturally similar server. To rule out problems here, we recommend never copying the database from one machine to another; instead, generate it on the machine where it will be used.
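One way to follow this recommendation while still moving data between machines is to carry a plain-text dump instead of the DBM file itself and rebuild the database on the target machine. The Python sketch below assumes a one-JSON-array-per-line dump format; the format and the function names are hypothetical, not part of the original system.

```python
import dbm
import json


def dump_to_text(db_path, text_path):
    """Write every key/value pair to a portable, platform-neutral text file."""
    with dbm.open(db_path, "r") as db, open(text_path, "w", encoding="utf-8") as out:
        for key in db.keys():
            # One JSON array per line, so tabs, quotes, or newlines in values are safe.
            record = [key.decode("utf-8"), db[key].decode("utf-8")]
            out.write(json.dumps(record) + "\n")


def rebuild_from_text(text_path, db_path):
    """Create a fresh DBM file on this machine from a text dump."""
    with open(text_path, encoding="utf-8") as src, dbm.open(db_path, "n") as db:
        for line in src:
            if line.strip():
                key, value = json.loads(line)
                db[key] = value
```

Because the text file has no platform-specific binary layout, it can be copied freely; only the rebuild step, run on the target server, touches the local DBM format.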

To run our PERL scripts from a browser, using a PC-based Web server, one must:

  1. Download and install PC PERL on the server.
  2. Make sure all scripts have a "pl" extension, i.e., that all script names end in ".pl".
  3. Associate the "pl" extension with the PERL interpreter. In other words, one has to make sure Windows knows that files with "pl" endings are PERL scripts. This can be done in many ways; consult the Windows manual if in doubt.

Please note that without a .pl-to-PERL association, the scripts will not run as CGI applications.
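Conceptually, the association is a lookup from file extension to interpreter: when a request names a script, the script's extension determines which program runs it. The toy Python model below illustrates that idea only. It is not how Windows or any particular Web server implements it, and both the function name cgi_command and the interpreter path C:\PERL\BIN\PERL.EXE are assumptions.

```python
from pathlib import PureWindowsPath

# Hypothetical association table; the PERL install path is an assumption.
ASSOCIATIONS = {".pl": r"C:\PERL\BIN\PERL.EXE"}


def cgi_command(script_path: str) -> list[str]:
    """Return the command line that would run script_path, or raise LookupError."""
    suffix = PureWindowsPath(script_path).suffix.lower()  # extensions ignore case
    interpreter = ASSOCIATIONS.get(suffix)
    if interpreter is None:
        # No association: the script cannot be run as a CGI application.
        raise LookupError("no interpreter is associated with " + repr(suffix or script_path))
    return [interpreter, script_path]
```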

Another (minor) difficulty we encountered in porting the Unix scripts to Windows involved server-side includes. Server-side includes (SSIs) are commands in HTML files that tell the server to include other HTML files. They work somewhat like mail merge in an office word processor: you can tell the server to merge in data from other files before "printing" out the text of the requested document for the user.

None of the HTTP servers we are aware of (in particular, none of the PC-based ones we tried) enable SSIs by default. This is partly for security reasons, and partly because, for the inclusion mechanism to work, the server has to read through every file it serves looking for SSI directives. If it finds any, it must then find and merge in the specified files. This takes time. To avoid the hassle and overhead of setting up SSIs, as well as the security problems inherent in their use, we removed some functionality from the PC version, replacing the SSIs with plain HTML code. To continue the metaphor used above: we shelved the merging and just typed in the text verbatim.
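The "typing in the text verbatim" step can also be done mechanically ahead of time: expand the include directives once, offline, and serve the resulting static HTML. The Python sketch below handles only the common <!--#include file="..." --> and <!--#include virtual="..." --> forms, looks fragments up in a dictionary rather than on disk, and uses hypothetical names; it illustrates the idea rather than reproducing any server's SSI implementation.

```python
import re

# Matches <!--#include file="x" --> and <!--#include virtual="x" -->.
INCLUDE_RE = re.compile(r'<!--#include\s+(?:file|virtual)="([^"]+)"\s*-->')
MAX_DEPTH = 10  # guard against fragments that include each other forever


def expand_includes(html, fragments, depth=0):
    """Replace each include directive with the named fragment, recursively.

    fragments maps include names to HTML text; a missing name raises KeyError.
    """
    if depth > MAX_DEPTH:
        raise RecursionError("include nesting too deep")

    def replace(match):
        return expand_includes(fragments[match.group(1)], fragments, depth + 1)

    return INCLUDE_RE.sub(replace, html)
```

Running this once over each page trades the server's per-request scanning for a one-time build step, at the cost of re-running it whenever a shared fragment changes.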

Encouragingly, the Windows 95 scripts that came out of this conversion process were found to run, nearly unaltered, under Windows NT. We have completed that port, but the system has not yet been tested thoroughly enough under NT for us to certify it formally.


Server Recommendations

The question of authentication and server setup raises another issue: What platforms and/or operating systems should be used to run the CGI scripts?

In general, we recommend that our CGI scripts be run on a multi-user Unix or NT machine, rather than on a Windows 95 machine or a Macintosh. This choice has certain disadvantages: Unix and NT are more complex than the Mac OS or Windows 95, and therefore potentially less secure and more difficult to administer. The advantage, though, is that Unix and NT were designed from the ground up for efficient, reliable, networked use.

Currently, the strongest reasons for using NT are its relative ease of use and its compatibility with other versions of Windows. The main reason for using Unix, on the other hand, is that it is an open-specification system, not owned by any particular individual or corporation. PERL is also best supported under Unix.

A final recommendation we would make is that districts give serious consideration to buying space on an existing server rather than setting one up themselves. Most Internet service providers offer such space as part of a package deal and, at the customer's request, will install CGI scripts after examining them and verifying that they are secure. In many cases, they will even customize CGI scripts for a reasonable fee. All that would be required of the district in this case would be to provide the little training needed for data entry and to appoint a technical liaison to the Internet service provider.


The Scripts

Source code for the scripts is available for all three platforms discussed here (Windows, Macintosh, and Unix), subject to the terms outlined in the Technical Report.


Please send questions and comments to nhdata@brown.edu