STG Technical Report 97.3
(http://www.stg.brown.edu/pub/NHData.tr97.3.html)
As the number of schools connected by the Internet continues to grow, there is a growing interest, on the part of State Education Departments in the NetTech region, in using the World Wide Web (WWW) as a data collection tool. This report was inspired by, and was made possible with the kind cooperation of, Dr. Judith Fillion and Sallie Fellows of the New Hampshire Department of Education, who asked for a demonstration of how HTML forms and Common Gateway Interface (CGI) scripts could offer state education departments an opportunity to collect school data over the Web, without relying on traditional paper or floppy disk exchange methods.
Using a paper copy of a personnel form that New Hampshire routinely requires from each of its districts twice each year, we present here an equivalent electronic HTML form through which each school or school district could submit its data over the Web and, without human intervention:
We begin with a discussion of networked models for data collection that do not use the Web (i.e., that do not make use of the Internet Hypertext Transfer Protocol, HTTP) in order to motivate the use of the Web in our example. We then explain how the New Hampshire example was implemented, and discuss some of the critical issues involved in planning a Web-based data collection system at the State level. Although our choice of personnel data is admittedly banal, our discussion is meant to stimulate ideas about collecting and sharing a variety of school data (e.g., statistics about student work, school health statistics, lesson plans and syllabi, or budget information) using the Web.
Perhaps the most elementary use of a state-wide computer network for the collection of school data would involve a central file transfer protocol (FTP) server, located in the state's department of education, to receive uploaded files from individual school districts and to provide files for those districts to download. Such a system would provide a low-bandwidth, low-maintenance solution and is especially useful in cases where: schools and school districts use different computing platforms; connections to the Internet are slow; Internet services are restricted and variable from district to district; and the cost of Internet service or software is an issue [1].
An FTP server used for data collection or retrieval allows files in various formats to be uploaded and downloaded quickly and easily, but this may be regarded as a disadvantage when states cannot control the brands and versions of software used by schools and school districts [2]. Further disadvantages of the FTP server solution for school data collection are:
At the other extreme is the construction of a state-wide computer network for school data collection that could be run as an Intranet, using sophisticated software that allows for controlled data transfer, collaborative work, and group communication [4]. Although such a system would provide great functionality and security, as well as creative possibilities for data collection, dissemination and analysis on the network, the bandwidth requirements, as well as the hardware, software and staff training costs, are likely to remain beyond the means of most schools and school districts in the near future.
A Web-based approach, then, may provide states and schools with a practical and desirable middle ground. One of the most promising features of Web-based data collection systems is their ability to address the shortcomings of the FTP server system, using resources far less sophisticated and expensive than those required by groupware applications built upon proprietary protocols. Users of the Web-based system described and demonstrated in this report can submit data from virtually any machine connected to the Internet, and that data can be stored on any machine capable of housing a Web server and a PERL interpreter; computers running Windows, MacOS, Unix, and Linux, for example, can all currently do this.
Another promising feature of a Web-based system is that, because HTML forms can be made visually equivalent to the original paper forms, the only training required for people who would like to enter data is learning how to use a Web browser. We believe most school employees, in a few years at most, will already be familiar with such technology [5].
Twice each year, the New Hampshire Department of Education requires school districts to complete the personnel form featured in this report. Thus the task of our Web-based example is to illustrate how the data could be gathered from different locations, collected and stored centrally, made available to users with appropriate permissions at different locations, and made unavailable to users not possessing these permissions. In presenting the example in the form of an electronic report, we include not only an explanation of the data collection system, but an opportunity for readers to actually use the system: by clicking on appropriate links and buttons the reader will be able to interact with the software and the sample database located on the STG Web server.
The system explained and demonstrated here consists, on the server side, of several specialized CGI scripts and a simple, generic DBM database. The system encountered by the user, however, consists only of HTML pages and forms. This makes it possible for users to interact with the system from any computer capable of running a Web browser (e.g., Netscape, Internet Explorer or Lynx).
The programs facilitating the interactive forms are all implemented as CGI scripts, written in the popular programming language called PERL [6]. These scripts are small and easily produced, and the system is generic in the sense that it can be easily customized to suit the individual data collection, authentication, and reporting needs of any school department in the NetTech region.
The data collection system based on the New Hampshire personnel form can be demonstrated in two parts:
We will say more about authentication and security below, but here we note that the electronic form has Username and Password fields at the bottom of the page that could be used for authentication in addition to the District/Superintendent identification method. (For the purposes of this example, we do not require either a username or password, and we allow users to create new associations between districts and superintendents.)
Finally, we have configured the system to confirm and display the data you enter, and to return 1996 data from the district you claim to represent, coloring red those fields that show increases or decreases of more than 25% in the two-year period. This is meant to demonstrate the automatic confirmation system, and to suggest ways that a state department could use the system to call attention to statistics it thinks particularly important.
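The 25% rule described above can be sketched in a few lines. The following is a minimal illustration in Python, not the PERL used by the scripts in this report; the function names, the treatment of a zero baseline, and the exact red markup are assumptions made for the sake of the example.

```python
def significant_change(previous, current, threshold=0.25):
    """Return True if current differs from previous by more than threshold (a fraction)."""
    if previous == 0:
        # No baseline to compare against. Treating any non-zero value as
        # significant is an assumption of this sketch, not the report's rule.
        return current != 0
    return abs(current - previous) / abs(previous) > threshold


def render_cell(previous, current):
    """Return the value as text, wrapped in red markup when the change is flagged."""
    text = str(current)
    if significant_change(previous, current):
        return '<font color="red">' + text + "</font>"
    return text
```

A state department wishing to call attention to different statistics could change the threshold per field, or replace the percentage test with any other comparison of the two years.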
You may wish to enter or update data in the Personnel Form, to see how the interactive system works. (If you have questions about, or find errors in, this or any part of this report, please let us know by sending e-mail to the authors or simply to nhdata@brown.edu.)
The way the CGI scripts work can be illustrated as follows:

The top portion of the diagram shows what happens when a user requests the data collection form from the Web server (e.g., when the user clicks on enter or update data in the Personnel Form [http://www.stg.brown.edu/cgi-bin/edu/input_data-demo], here or above). The Web server calls a CGI script (input_data-demo) that, after identifying the browser, sends out an appropriately formatted data entry form.
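As a rough illustration of the browser-identification step, a CGI program can read the browser's self-description from the HTTP_USER_AGENT environment variable, which the Web server sets for each request. The sketch below is in Python rather than PERL, and the two layout names and the rule that only Lynx receives the plain layout are hypothetical; the report does not say how input_data-demo actually chooses a format.

```python
import os


def choose_form_layout(user_agent):
    """Pick a form layout name from the browser's User-Agent string."""
    agent = (user_agent or "").lower()
    if agent.startswith("lynx"):
        return "plain"  # text-only browser: avoid table-based layout
    return "table"      # assumed default for graphical browsers


def layout_for_current_request():
    """Look up the User-Agent the Web server passed to this CGI process."""
    return choose_form_layout(os.environ.get("HTTP_USER_AGENT", ""))
```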
The lower portion of the diagram shows what happens when the user fills out the form, and submits the data to the Web server. The Web server calls CGI scripts that validate the data, convert it, then enter it into the database. (Each entry in this particular database associates a key consisting of 1) the year, 2) the name of the district submitting the form, and 3) a label indicating the source field in the input form, with the data the user entered into that field.) The CGI scripts then re-fetch the data that was entered, along with the data from the preceding year, highlighting any significant changes. Finally, the Web server sends this data back to the user, and requests verification.
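The validation and conversion step might look something like the following Python sketch. It assumes, for illustration only, that every field on the form must hold a non-negative whole number; the function name and the error messages are hypothetical and are not taken from the report's scripts.

```python
import re

WHOLE_NUMBER = re.compile(r"[0-9]+")


def validate_fields(raw_fields):
    """Convert submitted form strings to integers, collecting any errors.

    raw_fields maps a field label (e.g. "f1a") to the text the user typed.
    Returns (clean, errors): clean maps labels to integers, and errors maps
    labels to a short message for each value that could not be accepted.
    """
    clean, errors = {}, {}
    for label, text in raw_fields.items():
        stripped = text.strip()
        if WHOLE_NUMBER.fullmatch(stripped):
            clean[label] = int(stripped)
        else:
            errors[label] = "expected a whole number, got %r" % text
    return clean, errors
```

Only when the errors dictionary is empty would the values be written to the database; otherwise the form can be returned to the user with the offending fields marked.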
A Web server holding files containing fundamentally private information should raise serious questions about security. In essence, the operation of a Web server opens the computer on which it is running to millions of potential users. Although Web servers allow administrators to limit outside access in various ways, combinations of oversights, on the part of system administrators and software designers, continually expose security "holes" that can be exploited by expert users.
This said, it might easily be argued that the level of security offered by a competently administered Web server is considerably greater than in an average office, where forms are collected, copied, collated, and sent out to various people who may or may not have any concern for the security of the data represented. Unless the data requires considerably greater protection than one would normally find in such an office, we believe a well-administered Web server will provide more than adequate security.
Perhaps the most critical security measures for school districts using the computer for data collection and storage are the performance of regular backups for computer data, and the enforcement of some sort of user authentication. In the system demonstrated in this report, the only authentication used involved the identification of superintendents. A password facility is built into the scripts, but has not been used in this demonstration. States and districts that consider making use of some variant of this system should feel free to add server-enforced password protection (e.g., htaccess files) as well or, if the security of the data is an overwhelming concern, end-to-end encryption. These protections can be combined with restrictions based on IP address and/or machine name (allowing districts to require that data be input from specific machines or domains).
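Restrictions by IP address are normally configured in the Web server itself, but the idea can also be sketched at the script level. In the Python example below, the address ranges are reserved documentation ranges standing in for a district's real networks, and the function name is hypothetical; a CGI program would obtain the caller's address from the REMOTE_ADDR environment variable.

```python
import ipaddress

# Placeholder networks (reserved documentation ranges), not real district addresses.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.16/28"),
]


def address_allowed(remote_addr, networks=ALLOWED_NETWORKS):
    """Return True if remote_addr falls inside one of the permitted networks."""
    try:
        address = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # malformed or missing address: refuse rather than guess
    return any(address in network for network in networks)
```

A check of this kind supplements passwords rather than replacing them, since it identifies a machine and not the person using it.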
Obviously, state departments should make sure to instruct the person ultimately responsible for the configuration of the Web server to provide whatever forms of authentication are necessary. As not all Web servers can perform all of the forms of authentication mentioned here, before purchasing a piece of hardware or software we advise that you look into whether it supports the desired features [7].
Although it would have been possible to use a large, commercial database package to house data from our forms, we felt it was important to show this was unnecessary. All that is required here is the ability to associate keys with values; in our example the keys were a combination of the year, the district, and a location in the input form. The values were just that: values typed into the input form at the given locations by a user. For example, key = "1997:District-10:f1a", value = "145", means "for the year 1997 in district 10, the value given for field 1a in the input form was 145".
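The key format in this example can be built and taken apart with two small helpers. This is a Python sketch of the idea only; the function names are hypothetical, and the check for stray colons is an added precaution that the report does not describe.

```python
def make_key(year, district, field):
    """Join year, district and form field label into a key such as '1997:District-10:f1a'."""
    parts = [str(year), district, field]
    for part in parts:
        if ":" in part:
            # A colon inside a component would make the key ambiguous.
            raise ValueError("key components may not contain ':' (got %r)" % part)
    return ":".join(parts)


def parse_key(key):
    """Split a key back into (year, district, field)."""
    year, district, field = key.split(":")
    return int(year), district, field
```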
Normally, a set of key-value associations is implemented by programmers using objects called hash tables. Hash tables, however, only stay in memory as long as a program is running. Our system clearly required something that would hold our data on disk for subsequent program runs. What we needed, in other words, was a persistent hash table.
Fortunately, most PERL installations have a set of simple, fast, hash-like routines known as DBM routines. These are elementary, generic database routines capable of storing key/value combinations in the manner of a hash table, and of doing it on disk, rather than in volatile memory, so that the data can be accessed and modified on subsequent runs.
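The same persistent-hash idea exists outside PERL. As a hedged illustration, the sketch below uses Python's standard dbm module rather than the PERL DBM routines used in this report; the helper names and the temporary file location are assumptions of the example.

```python
import dbm
import os
import tempfile


def store_value(db_path, key, value):
    """Write one key/value pair to the on-disk table, creating it if needed."""
    with dbm.open(db_path, "c") as db:  # "c": read/write, create if missing
        db[key] = value


def fetch_value(db_path, key, default=None):
    """Read one value back, as a later program run would."""
    with dbm.open(db_path, "r") as db:
        raw = db.get(key)  # dbm returns bytes, so decode on the way out
        return raw.decode("utf-8") if raw is not None else default


# Usage: a throwaway directory stands in for the server's data directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "personnel")
    store_value(path, "1997:District-10:f1a", "145")
    result = fetch_value(path, "1997:District-10:f1a")
    missing = fetch_value(path, "1996:District-10:f1a", default="")
```

Each call opens and closes the file, which mirrors the way a CGI script starts fresh for every request and must find the previous request's data already on disk.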
Similarly, the functions of DBM routines can be duplicated by any commercial database. As a result, if a district wants to import its data at the end of a given recording period into a larger commercial database, or into desktop spreadsheet programs, all it requires is a few lines of additional PERL code. PERL, fortunately, has modules for outputting files in most major database formats. It can also easily produce tab-delimited files suitable for import into, say, Excel [8].
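To give a sense of how little code such an export needs, here is a Python sketch (again standing in for PERL) that turns key/value records into a tab-delimited table. The column names and the function name are hypothetical, and the 1996 figure in the sample data is invented purely for illustration.

```python
import csv
import io


def export_tsv(records, out):
    """Write records as tab-delimited rows: year, district, field, value."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["year", "district", "field", "value"])
    for key in sorted(records):
        year, district, field = key.split(":")
        writer.writerow([year, district, field, records[key]])


# Sample data: the 1997 entry follows the key format described in this
# report; the 1996 value is made up for this example.
records = {
    "1997:District-10:f1a": "145",
    "1996:District-10:f1a": "120",
}
buffer = io.StringIO()
export_tsv(records, buffer)
tsv_text = buffer.getvalue()
```

Writing to an open file instead of the in-memory buffer produces a file that a spreadsheet program can import directly.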
Further technical documentation about our choice of scripting language, server recommendations, and porting scripts between operating platforms is available on a separate page.
The example of a Web-based school data collection system demonstrated here is fast, simple, flexible, and inexpensive. It shows how basic Web technology can be used now to improve communication and information sharing between schools, school districts and the state administration. We hope state departments of education in the NetTech region, and elsewhere, will benefit from this example and discussion, and will make use of our findings in their planning processes. Finally, anyone may use and modify any of the forms, scripts or documents available here, provided that all copyright information is left intact.
[2] Although interoperability and platform independence have long been mentioned as goals in software design, an inability to share documents created with different versions of a particular software package (e.g., Microsoft Word), much less those created by different software packages, remains a common and thoroughly frustrating experience.
[3] By "synchronous" here we do not mean instantaneous, but basically within the same temporal frame. This is opposed to asynchronous methods, in which the sender and receiver are not necessarily in the same temporal frame (e.g., communication by surface or electronic mail).
[4] Probably the best known of these systems, or environments, is Lotus Notes; but it is significant that most "groupware" producers (including Lotus) are turning to the Web as a primary medium for development.
[5] Not everyone would agree with this assessment, of course, and it is an open question whether the Web browser will be a long-lived interface. Indeed, in his recent novel Infinite Jest (Little, Brown and Company, 1996), a story set in the first decade of the 21st century, David Foster Wallace alludes to the Web itself as having been a short-lived phenomenon (see especially p. 620).
[6] For more on the Practical Extraction and Report Language (PERL), see the searchable manual page at Carnegie Mellon University (http://www-cgi.cs.cmu.edu/cgi-bin/perl-man).
[7] For an example of this kind of information, see the WebCompare site at http://webcompare.internet.com/.
[8] For a list of publicly available PERL modules, see http://www.metronet.com/perlinfo/modules/ .