Text Encoding Initiative
David R. Chesnutt
University of South Carolina
[This paper will discuss the relationship of the Text Encoding Initiative to the Model Editions Partnership in general, and the relationship of the TEI markup system to each of the prototypes of historical editions in particular. Concluding remarks will focus on the Partnership's experience in working with the TEI Guidelines.]
The Text Encoding Initiative and the SGML markup system developed under its aegis have had a profound effect on the development of digital resources in the humanities. Nowhere is this more evident than in the Model Editions Partnership--a project which is developing prototypes of electronic historical editions on the World Wide Web. Although the Poughkeepsie protocols modestly spoke of developing a simple metadata system for data interchange, the TEI quickly expanded its focus to include the development of generalized markup which could serve a number of disciplines: linguistics, literature, scholarly editing, theater, and others. At the same time, the TEI made no claim to having created markup which could meet every need of every scholar. Indeed, the TEI Guidelines made explicit the expectation that scholars would want to modify or extend the base tag sets which had been developed by the committees and work groups of the TEI. Even so, the Guidelines provided us with a carefully crafted markup system which met most of our needs. If we had had to build a markup system for the Model Editions Partnership from scratch, the Partnership would probably never have come into existence. I think it is fair to say that the Partnership exists because the TEI exists.
The editors who head the seven editions in the Partnership's consortium are seasoned scholars with broad experience in publishing scholarly texts. From their point of view, the most important aspect of publishing on the Web is to make their scholarly texts accessible to a larger audience. Ann Gordon's 1992 study for the American Council of Learned Societies demonstrated that historical editions are used by many groups, ranging from scholars to genealogists, lawyers, and even high school students. Yet today, there are probably fewer than 400 libraries in the world which have full holdings of these editions. (Average sales for most editions are down to little more than 300 copies, and many sell fewer than that.) In view of this, the editors' interest in expanding their audiences is easily understood.
But these are seasoned editors who are dedicated to their texts and to providing supplementary information to make those texts understandable. Many of them are accustomed to presenting very conservative texts which record every cancellation and emendation, preserve original spelling and punctuation, and report variants when multiple copies exist; the list of editorial principles usually goes on for a number of pages. Needless to say, a simple markup system like HTML is not designed to support this kind of editorial rigor. Bear in mind also that these same scholars are accustomed to dealing with typesetting markup systems which can define such esoteric characteristics as the thickness of the line used to strike through canceled type.
Although we may not be able to render computer-screen displays with such precision today, we can record that level of detail in the TEI markup. And once we record it, we can devise visible editorial conventions which enable us to convey the information to our readers. For example, we can use the regularization tag <reg> with an attribute which records that the scholarly editor has determined that a full stop should be inserted at point X. Then we can use a stylesheet to display that information as a period within square brackets. Or we can display the text without the regularized punctuation. The important point is that the TEI markup gives us a way of recording that editorial decision. Perhaps even more important are the intellectual frameworks that the TEI markup and its extensions are allowing us to construct for the prototypes.
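A minimal sketch of what such markup might look like, assuming a TEI-style <reg> element; the attribute names and values here are illustrative, not the Partnership's actual DTD:

```xml
<!-- The editor determines that a full stop belongs after "Charleston".
     The (empty) original reading is preserved in the orig attribute. -->
... arrived in Charleston<reg orig="" resp="MEP">.</reg> The next day ...

<!-- The same element can regularize spelling while preserving the source. -->
<reg orig="recieved" resp="MEP">received</reg>
```

A display stylesheet can then render the content of <reg> within square brackets--for example, [.]--or suppress the regularization entirely to present the unemended text.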
Our approach in adapting the TEI Guidelines to our work is probably not uncommon. We have a data capture DTD we use in marking up the texts, and we have an archival form of the DTD for long-term migration. In the data capture DTD, we define the element <docGroup>, which has four main elements within it. The document <doc> element is used for fulltext transcriptions; the surrogate <surrogate> element is used for abstracts of documents; the target <target> element is used to provide information about images of original documents; and finally, the <docGroup> element itself can be nested to create sub-groups within a document group.
Schematically, then, a <docGroup> may contain <doc>, <surrogate>, and <target> elements, as well as nested <docGroup> elements. Other extensions in the data capture DTD were added to make it easier for the student encoders to mark up the text. For example, we created specific elements for font shifts, such as <ital> for italics, and specific elements for the most common kinds of names: <person>, <place>, <org>, and so on. Later we will develop a mechanical process to restore the more common TEI forms which are used in the archival DTD. I will discuss this further when I explain the markup decisions we made for each of the prototypes.
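The structure described above might be sketched as follows; the element content shown here is invented for illustration, since the actual content models are defined in the MEP data capture DTD:

```xml
<docGroup>
  <doc>
    <p>A fulltext transcription, using shorthand tags such as
       <ital>italics</ital> and name elements like
       <person>Henry Laurens</person> and <place>Charleston</place>.</p>
  </doc>
  <surrogate>
    <p>An editorial abstract of the same document.</p>
  </surrogate>
  <target>
    <p>Descriptive information about an image of the original.</p>
  </target>
  <docGroup>
    <!-- a nested sub-group of related documents -->
  </docGroup>
</docGroup>
```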
The Partnership editors' concern with textual features is a fundamental aspect of creating reliable and useful texts. Mary-Jo Kline is fond of saying "every markup decision is an editorial decision." Her comment is based on her extensive editorial background and on her experience as an SGML consultant on the American Memory project at the Library of Congress. The Partnership editors would certainly agree with Kline's assessment. Every decision we make about recording this or that aspect of a text is an editorial decision. And each of those decisions affects the nature of the text itself and the ways in which we can present it to readers. For that reason, the first six months of the Partnership were devoted to document analysis: a discussion of the kinds of texts we work with and the kinds of markup those texts would require. We subsequently published the results in a document we call "A Prospectus for Electronic Historical Editions." Our concern at the outset was to make sure that technology would not restrict our intellectual freedom. By the time the Prospectus appeared, however, we were already convinced that we could go far beyond the restrictions imposed by the print media (or by our cost-conscious publishers), and the editorial design for each of the prototypes had begun to take shape.
Two of the projects--the Lincoln Legal Papers and the Margaret Sanger Papers--are what we refer to as "silicon microfilms." Like microfilm editions, these projects present images of original documents instead of transcriptions. And like microfilm editions of the past, both projects see these image editions as forerunners of later publications which will present full transcriptions along with the usual editorial apparatus to place the documents in historical context. Because the Lincoln Legal Papers is a large database project, and because our mission is to explore a variety of approaches to electronic publication, we decided to deliver the Lincoln prototype as a Web database with CGI user interfaces.
The Sanger prototype--like all of the others in the Partnership--is part of an SGML database which is delivered on the Web using DynaText. The structure of the database is also designed so that it can be delivered for Panorama, but that will have to wait until we finish the DynaText models. Of the six mini-editions, the Sanger project has been the most challenging in some ways. Early on, we had decided that images should be wrapped in an "SGML envelope." We came up with the envelope concept because we needed to be able to record information about each image: the source of the original, copyright, permissions required, and so on. In working with the Sanger images, however, we found that we needed a much more extensive and formalized structure. We needed to be able to replicate all of the information typically found on a "target" in the Sanger microfilm edition. (A target is the introductory frame in a microfilm which provides information about the document it describes.) But the kind of "targets" the Sanger editors had in mind turned out to be quite different from their microfilm counterparts.
As I mentioned earlier, <target> was one of the new elements we created as part of the <docGroup> structure. I will come back to this later, but for now, I want to comment on the functionality which is built into these targets. As part of the mini-edition, the Sanger editors have provided a number of useful items: biographical sketches of the principal characters in the documents; a chronology of Sanger's activities during the period under study; and addresses of the copyright owners and the repositories where the originals reside. Access to this information is provided by hypertext links from the targets using the TEI x-pointer linking mechanism. Thus, the thrust of the Sanger prototype is to demonstrate how "silicon microfilms" of the future can be enhanced with editorial material in ways simply not practical in traditional publishing. The other five models are all fulltext editions, but each is uniquely designed to meet editorial concerns which are difficult to address in print publications.
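The links from a target to this supplementary material might be sketched as follows, using the TEI P3 extended-pointer mechanism; the document identifiers and locations here are invented for illustration:

```xml
<!-- Inside a <target>: a date linked to the chronology, and a name
     linked to the biographical index, each held in another document
     of the edition. -->
<date value="1916-10-16">
  <xref doc="chron" from="ID (c1916oct16)">October 16, 1916</xref>
</date>
<name type="person">
  <xref doc="bios" from="ID (sangerm)">Margaret Sanger</xref>
</name>
```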
The Stanton and Anthony editors prepared more than 30 maps to illustrate the travels of Susan B. Anthony in western New York state. The maps provide a graphic illustration of the lengths to which she went to organize the women's rights movement in the 1860s. The edition also uses markup to flag the editors' additions of punctuation and portions of the text supplied from other sources. This enables us to use stylesheets to deliver an "emended" text, which shows the editorial changes and other textual features like the writer's own cancellations and emendations, alongside a "clear" text, which does not.
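Assuming standard TEI elements for these features, the markup behind the two views might look something like this (the text and attribute values are invented for illustration):

```xml
<!-- The writer's own revision, plus text supplied by the editors
     from another source. -->
<p>She <del>wrote</del><add>spoke</add> at the convention
<supplied resp="editors" source="draft copy">in Rochester</supplied>.</p>
```

One stylesheet renders <del>, <add>, and <supplied> with visible editorial conventions to produce the emended view; a second silently accepts or suppresses them to produce the clear text.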
The Greene editors prepared fulltext transcriptions for all of the previously published abstracts which were included in Volume 7 of the edition. Double-pointed hypertext links allow the reader to move back and forth easily between an abstracted version with its notes and the fulltext version. The same hypertext features also apply to the editorial notes which are supplied with each document.
The editors of the Documentary History of the First Federal Congress wanted to show how an electronic edition could be used to link their various series of publications. (They organize the edition into segments: official journals, legislative histories of laws and bills, biographies of the members, diaries of the members, and letters of the members.) For the mini-edition, they use the legislative history of the laws establishing the executive branch of government to link to related materials drawn from the other series. Clicking a mouse is obviously easier than opening another volume.
The editors of the Documentary History of the Ratification of the Constitution and the Bill of Rights used a portion of a spin-off volume--Slavery: A Necessary Evil?--as the focus of their mini-edition. One of the important concepts demonstrated in the edition is the use of markup to enable users to locate information. What the editors wanted was to be able to do a proximity search linking the name of a person with a particular subject, and to restrict that search to reports of what that person said about the subject. In other words, they wanted to be able to find every instance when Rawlins Lowndes spoke in defense of slavery. To accomplish this, we added a speaker attribute [spkr = "Lowndes"] to the division, paragraph, and segment elements. This also proved to be a useful device in marking up the letters and diaries in the First Congress project.
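A sketch of how such markup might look, assuming the MEP extension adds a spkr attribute to the division, paragraph, and segment elements (the debate text shown is a placeholder, not a transcription):

```xml
<!-- A proximity search for "Lowndes" near "slavery" can now be
     confined to passages where Lowndes is the recorded speaker. -->
<div type="debate">
  <p spkr="Lowndes">Mr. LOWNDES rose to defend the institution of
     <seg spkr="Lowndes">slavery</seg> against the proposed clause ...</p>
</div>
```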
The editors of the Papers of Henry Laurens chose a segment from an earlier volume documenting Laurens's role in seizing power from the British during the early phases of the American Revolution in South Carolina. Like the Stanton and Anthony project, the Laurens mini-edition reflects the use of markup to provide multiple views of texts. In one view, all of the emendations, cancellations, and other textual features are shown; in the other, a clear text version is presented. The first view is primarily for scholars who study documents closely; the second is for more general audiences who are not interested in textual features.
Let me return now to the target <target> element we created for the Sanger mini-edition. The Sanger project recently published a very sophisticated microfilm guide based on a relational database which they had created. The editors used the database to generate "targets" describing each of the documents in a microfilm edition of the Sanger papers at Smith College and another microfilm of documents located during an international search by the editors. And that same database was used to generate the targets for the documents in their mini-edition. The editors also created two other databases to help them in their research. One database contains biographical information about Sanger's correspondents and other people who were prominent in her life; the other contains a chronology of Sanger's activities. In designing their mini-edition, the editors' goal was to integrate the biographical and chronological information with images of selected documents. The targets describing each document thus became the springboard for linking the documents to the biographies and the chronology. To accommodate the Sanger design, we extended the TEI Guidelines and created a new element <target> patterned on the actual microfilm targets. The target element is basically a reflection of the existing fields in the Sanger document database and includes sub-elements which describe:
Some of the sub-elements already existed in the Guidelines; others were added as necessary. The date is linked to the chronology with a TEI P3 cross-reference, and the names in the reference section are linked to the biographical index using the same construct. Our experience with Sanger demonstrated the flexibility of the TEI Guidelines. Extending them to meet the needs of the Sanger mini-edition was not difficult. But then, not every project has a Michael Sperberg-McQueen as a co-coordinator. Ultimately, we will publish the MEP DTDs as well as the documentation we are developing. The idiosyncratic nature of the target element means that other projects will probably not use it as it stands. Even if there is a sudden rush to create new silicon microfilm editions, some of the sub-elements are simply too project-specific. On the other hand, perhaps it will be useful as a way of showing others how the Guidelines can be extended to meet their particular needs. As for the other extensions we made to handle the organization of historical documents in the MEP fulltext mini-editions, those may in fact become commonplace because they reflect current practice in the historical editing community. Of course, for that to happen, we need a TEI Work Group on Historical Documents.

The Text Encoding Initiative is one of the great success stories for scholars in the humanities. The TEI Guidelines have given us a base on which to build the frameworks of modern scholarship. Equally important is their role in building resources which can be sustained over time, regardless of how technology changes. As we look toward the future, we need to find ways to perpetuate the work of the Text Encoding Initiative and to support the scholars who rely upon the Guidelines to deliver their resources. This may be our most important challenge.