Text Encoding Initiative
Tenth Anniversary User Conference


TEI and XML

Steven J. DeRose
Inso Corporation
[sjd@eps.inso.com]

SGML (the syntactic basis for the TEI) is clearly the most useful system for very general document processing available today, and the TEI is generally acknowledged to be the most fully worked out application of SGML to the needs of academic projects, far better for research purposes than (say) HTML.

On the other hand, the ubiquity of HTML leaves TEI users in the position of having to translate into HTML to deliver texts. This is more useful than not distributing texts at all, but imposes serious limitations on all kinds of processing and use.

Similar problems have also been encountered by others with data that doesn't fit HTML well. Fortunately, XML, the Extensible Markup Language, promises to bring the Web a level of functionality that will overcome much of this problem. XML, a project of the World Wide Web Consortium (W3C), is in many ways a computing humanist's dream: it creates a stripped-down version of SGML suitable for use on the Web. XML is simpler to parse than SGML, enabling lightweight, inexpensive software of all kinds. At the same time, XML documents are fully SGML-compatible: all valid XML documents are also valid SGML documents and can be processed by conventional SGML systems. The reverse, of course, is not true; if it were, there would be no net simplification.
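As a rough illustration of the "lightweight software" point, the sketch below reads a small fragment of descriptive markup with the XML parser that ships in Python's standard library. The element names (text, body, div, lg, l) are borrowed from TEI conventions, but the fragment itself and the helper verse_lines are invented for this example; it is not a complete or valid TEI document, and no DTD validation is performed.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-style fragment, invented for illustration only.
SAMPLE = """<text>
  <body>
    <div type="poem">
      <head>Example</head>
      <lg>
        <l n="1">First line of verse</l>
        <l n="2">Second line of verse</l>
      </lg>
    </div>
  </body>
</text>"""


def verse_lines(xml_source):
    """Return (line number, text) pairs for every <l> element in the source."""
    root = ET.fromstring(xml_source)  # well-formedness check only, no DTD
    return [(line.get("n"), line.text) for line in root.iter("l")]


if __name__ == "__main__":
    for number, text in verse_lines(SAMPLE):
        print(number, text)
```

The point of the sketch is only that the verse lines remain addressable as verse lines, which is exactly the information that would be lost in a translation to HTML.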

This talk will discuss the background and design characteristics of XML, including its related parts XLL and XSL. It will also discuss some implications of XML for TEI users and for the TEI itself, and some common threads linking XML to the TEI, including many specific ways in which TEI work has influenced XML.

