Text Encoding Initiative
An increasing number of electronic text centres, libraries, and archives from around the world are deciding to follow the principles and practices outlined in TEI P3, and are hence seeking to make effective use of the TEI Header as a means of describing and documenting electronic textual resources. Metadata of the kind recorded in the header has a vital part to play in information management and retrieval, yet practices with respect to the format and use of the header vary widely.
For existing and potential users, the flexibility offered by TEI P3 is one of its most attractive features. The Guidelines allow for widely divergent approaches to the basic issues of encoding electronic texts and of providing metadata in the form of TEI Headers. This is entirely appropriate for a general-purpose scheme, and for individual scholars seeking only a scheme capable of expressing their (often complex) analytic needs. For implementers working within a common framework and with similar objectives, however, this generality and expressivity impose an additional burden. Such implementers must identify a mutually acceptable code of practice in applying the scheme to their needs, or else compromise one of the very purposes for which the TEI scheme was designed: the mutual interchangeability of texts and their associated headers.
This paper will present the results of an attempt to address this problem at source, by bringing together an initial core of expert TEI header creators with the explicit goal of sharing their expertise and co-ordinating (if possible) their practice.
Many of us who work in the expanding community of electronic text providers are well aware of the potential usefulness of TEI Headers. However, in order to integrate electronic text collections into other resources (for example, catalogues of conventional paper-based library holdings), it is often necessary to map selected information from TEI Headers onto some other well-established resource cataloguing standard (such as MARC), or onto an emerging de facto standard such as the Dublin Core element set.
Although standards such as MARC are more familiar to existing library cataloguers, they lack the extensibility and flexibility that creators of TEI Headers regularly use to describe and document an electronic document, its source, and the process of its creation and revision. Similarly, although the Dublin Core may prove to offer a reasonable mechanism for describing many of the resources available on the internet, it lacks the formalism and descriptive power available to creators of TEI Headers. However, it was never the intention of this meeting to concern itself with the relative merits (or otherwise) of standards such as MARC and Dublin Core, except perhaps with regard to the ease with which the data they require could be recorded in, and extracted from, a TEI Header. Where such metadata standards clearly do converge with the process of creating TEI Headers is in the area of the data themselves: for example, checking an author's name against a suitable authority file so that the data, when extracted from the relevant element of a TEI Header, would satisfy the requirements of a typical MARC record.
For many projects, a translation-based approach, in which MARC records or Dublin Core metadata are generated by (automatically) extracting appropriate information from TEI Headers, appears to be merely an interim solution until it becomes clear how to interface, say, a database of TEI-conformant texts directly with a Z39.50-compliant client/server system. In addition, there is a growing number of cases where the goal is to integrate several TEI-conformant text collections, but this work is hampered by the fact that different practices have been adopted when creating valid TEI Headers.
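To make the extraction step concrete, the following Python sketch pulls a handful of Dublin Core-style fields out of a TEI Header. It assumes the header has already been converted from SGML to XML (Python's standard library has no SGML parser) and that element names follow TEI P3 (`teiHeader`, `fileDesc`, `titleStmt`, and so on). The `CROSSWALK` table and the function name `tei_header_to_dc` are hypothetical illustrations rather than an agreed mapping, and a real crosswalk would also need the authority-file checking discussed above.

```python
import xml.etree.ElementTree as ET

# Hypothetical crosswalk: element path (relative to <teiHeader>) -> Dublin Core
# element name. An illustrative subset only, not an official or agreed mapping.
CROSSWALK = {
    "fileDesc/titleStmt/title": "title",
    "fileDesc/titleStmt/author": "creator",
    "fileDesc/publicationStmt/publisher": "publisher",
    "fileDesc/publicationStmt/date": "date",
    "fileDesc/publicationStmt/idno": "identifier",
    "profileDesc/langUsage/language": "language",
}

def tei_header_to_dc(header_xml: str) -> dict[str, list[str]]:
    """Extract Dublin Core-style fields from an XML-serialized TEI Header."""
    root = ET.fromstring(header_xml)  # root is assumed to be <teiHeader>
    record: dict[str, list[str]] = {}
    for path, dc_element in CROSSWALK.items():
        for elem in root.findall(path):
            # itertext() flattens nested markup; split/join collapses whitespace.
            text = " ".join("".join(elem.itertext()).split())
            if text:
                record.setdefault(dc_element, []).append(text)
    return record

if __name__ == "__main__":
    sample = """<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>A sample text</title>
      <author>Doe, Jane</author>
    </titleStmt>
    <publicationStmt>
      <publisher>Example Text Centre</publisher>
      <idno>example-001</idno>
    </publicationStmt>
    <sourceDesc><p>Sample source.</p></sourceDesc>
  </fileDesc>
</teiHeader>"""
    print(tei_header_to_dc(sample))
```

Anything not listed in the crosswalk (here, for instance, the source description) is silently ignored, which is precisely why such translation tends to be seen as an interim measure.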
The Oxford Text Archive (OTA), established by Lou Burnard in 1976, has one of the world's largest collections of TEI-conformant electronic texts. Over the long life of the Archive, five different "flavours" of TEI-conformant Headers have been used to document electronic resources, to say nothing of the many texts in its holdings for which little or no metadata of any kind is available. Following its recent appointment as a Service Provider for the UK-based Arts and Humanities Data Service (AHDS), the OTA is now required to standardize its current practice relating to the creation of TEI Headers as a means of integrating the Archive's holdings with those of the four other AHDS Service Providers. (More detailed information on this topic is provided in the paper by Alan Morrison and Jakob Fix, which will also be given at this conference.) The Oxford Text Archive is also keen to strengthen its associations with other electronic text centres worldwide, not least to provide a better service for users of the AHDS, by agreeing to the mutual exchange of TEI-conformant texts for integration into our respective collections.
Since the Oxford Text Archive was in the process of reviewing its own policies and practices with regard to the creation of TEI Headers, and with an eye to the forthcoming TEI Tenth Anniversary User Conference, we decided to invite representatives from a number of electronic text centres and text creation projects to a dedicated TEI Header meeting. The attendees were: John Price-Wilkin (Humanities Text Initiative, Michigan), David Seaman (Electronic Text Center, Virginia), Perry Willett (Indiana Library Electronic Text Resources), Laurent Romary (Silfide, INRIA), Julia Flanders (WWP, Brown), Richard Gartner (Bodleian Library, Oxford), Michael Sperberg-McQueen (TEI, UIC), Peter Flynn (CELT, University College Cork), and Nick Finke (CETL, University of Cincinnati College of Law).
The main objective of the meeting was to make some progress towards facilitating the interchange of TEI Headers (as a minimum), between some of the major producers and distributors of scholarly, electronic, TEI-conformant texts. Prior to the meeting a number of steps by which such progress could be achieved were identified, and these are listed below:
This paper will present a summary of the findings of that meeting, reporting on the degree of consensus achieved, and any major problem areas identified.
Each of the participants was asked to provide the following:
We hoped that this approach would prove to be a simple but effective mechanism for gathering feedback from the participants, regarding the relative merits and usefulness of the various elements available in the TEI Header. It also provided a rudimentary indication of whether or not any consensus exists amongst the participants, and helped to identify any elements in the TEI Header for which usage is widely divergent.
The one thing that this meeting did not set out to do was to attempt a review of the TEI Header as a whole. Whilst there is every hope that the outcome of the meeting might prove useful to any future review of TEI P3, and to a reconsideration of the TEI Header in particular, the group of invitees was not sufficiently broad for this to be regarded as a possible objective.
In some respects, the worst possible outcome of this meeting would have been for no consensus to emerge. The expressive power and flexibility of the TEI Header in enabling the encoding of an immense range of metadata within a controlled framework is simultaneously both its greatest strength and its greatest weakness, in so far as it provides users with an extremely powerful gun with which to shoot themselves in the foot. So it was pleasantly surprising to discover just how much consensus there was amongst the participants, with much of the discussion focusing on best practice with regard to the possible data content of particular header elements, rather than on the use of those elements per se.
If the range and complexity of TEI Headers produced at the large (and growing) number of electronic text creation centres continue to expand, this may become an inhibiting factor in the simple interchange of Header information. The ability of the TEI Header to encode metadata is beyond question, but the value of this information is somewhat diminished if it generally has to be thrown away in order to facilitate the interchange and sharing of electronic texts. For example, had the participants at the meeting identified only a small set of common mandatory elements, this would have facilitated the interchange of electronic texts at the expense of recipients losing a great deal of potentially useful metadata. Moreover, had the participants been unable to agree upon sets of required, recommended, and optional TEI Header elements, this would have suggested that some sort of intervention is always likely to be required if one wishes to integrate two or more collections of electronic texts produced by different projects. Whilst this is not, of itself, a barrier to the interchange of electronic texts, it might be felt to be an additional discouragement because of the costs involved.
Fortunately, the participants at the meeting were able to reach a consensus, which should prove valuable to the rest of the TEI user community. A draft report on the meeting has already been circulated to all the attendees and will be made widely available as soon as possible, to be followed, if approved, by the proposed "Guide to Good Practice". Of more lasting value will be the advantages that greater interchange of TEI Header information between electronic text creators and providers should bring to the whole community.
At the time of writing, any future work plan remains somewhat speculative. It is easy to envisage how an emerging consensus with regard to the creation and content of TEI Headers would facilitate the training of users in this hitherto rather difficult and specialized area (with the proviso, of course, that the TEI Header should not be seen as limited to whatever was agreed at this meeting). Similarly, if a model interchange TEI Header can be agreed upon, one might reasonably expect to see the rapid development of simple SGML transformation tools or scripts to map between the interchange form and the TEI Header structure favoured locally by any particular project. The definition of such a minimal header (a header-Lite?) would also greatly speed and simplify the creation of effective metadata packages for use by the next generation of XML-aware web browsers. In the short term, we would also expect to see the rapid creation and take-up of simple tools effecting translations between the minimal TEI Header and other metadata schemes (e.g. MARC and Dublin Core).
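As a rough illustration of the kind of transformation script envisaged here, the Python sketch below reduces a locally encoded TEI Header to a minimal interchange form and reports which elements were discarded along the way. The `LITE_PATHS` profile and the function `to_header_lite` are hypothetical: they do not represent the element set agreed at the meeting, and the code assumes an XML serialization of the header rather than SGML.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical "header-Lite" profile: paths (relative to <teiHeader>) whose
# whole subtree survives in the interchange form. Illustrative only.
LITE_PATHS = {
    "fileDesc/titleStmt/title",
    "fileDesc/titleStmt/author",
    "fileDesc/publicationStmt",
    "fileDesc/sourceDesc",
}

def _filter(src: ET.Element, dst: ET.Element, prefix: str, dropped: list[str]) -> None:
    """Copy the children of src into dst, keeping only paths in the profile."""
    for child in src:
        path = prefix + child.tag
        if path in LITE_PATHS:
            # Listed in the profile: keep the element and everything inside it.
            dst.append(copy.deepcopy(child))
        elif any(p.startswith(path + "/") for p in LITE_PATHS):
            # Container of a listed path: keep an empty shell and recurse into it.
            shell = ET.SubElement(dst, child.tag, child.attrib)
            _filter(child, shell, path + "/", dropped)
        else:
            # Not in the profile: discard it, but record what was lost.
            dropped.append(path)

def to_header_lite(header_xml: str) -> tuple[str, list[str]]:
    """Return (reduced header serialized as XML, list of discarded element paths)."""
    root = ET.fromstring(header_xml)  # root is assumed to be <teiHeader>
    lite = ET.Element(root.tag, root.attrib)
    dropped: list[str] = []
    _filter(root, lite, "", dropped)
    return ET.tostring(lite, encoding="unicode"), dropped
```

Calling `to_header_lite` on a header that also carries, say, an `encodingDesc` and a `revisionDesc` returns the reduced header together with those element paths in the discard list. Reporting the losses explicitly keeps the trade-off between a small interchange profile and the metadata a recipient gives up visible to whoever runs the conversion.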