Text Encoding Initiative
Writers have been referring to other writings since the beginning of writing. The practice existed even in man's oral literary history, though we cannot recover any of it. But in written works inspired by oral history, like Homer and the Bible, we can see great complexes of intertwined texts. The relating of texts to other texts and occurrences is also characteristic of electronic documents. However, the possibilities of using references in electronic texts differ from those in written texts, as do the problems related to linking.
One important aspect of a reference is reliability: the reader should be able to find the source of the reference. The reliability of my references will decline because of mistakes and errors I make. If I quote a verse of text in my printed document, it may be corrupted because I didn't recheck the source. Further corruption may occur through a typing error or because my spell-checker modernised the spelling. Finally, the editor or printer may corrupt it. Each of these errors makes it more difficult for the reader to find the source of the document.
In electronic references, the problem is to create reliable links of various types. Quoting a verse can be done by a reference to the original, which will be fetched and inserted at the correct point in the text. The problems mentioned in the previous paragraph are no longer significant, because I am not manually inserting the actual text of the quote. The quote itself will be generated from the information I provide about its source. Therefore, the major issue is to establish the links. Even more important, however, is maintaining them. The page referred to might move or disappear, causing my document to contain omissions, dead-ends, or even wrong quotes.
Maintenance is made more difficult when there are many versions of the document referred to, and when many people are using and revising it as well. This dynamic aspect of electronic texts requires additional maintenance. The reader of an electronic text will experience broken references if maintenance is lacking or not frequent enough. But sometimes mending broken links is a difficult, if not impossible, task. This task is even more difficult if the text referred to resides on a machine on the other side of the ocean.
This paper will focus on so-called independent links (ilinks). These are links whose definitions are located separately from the document in which their link-ends reside. Independent links are attractive because they might solve current maintenance problems, in particular the problem of maintaining reliable links in complex, evolving documents. Some of the problems they introduce will be described as well. The central thesis of this paper is that independent links can improve the maintenance of a hypertext, and therefore its reliability over time.
There are many rival linking-systems available which allow an author to create electronic references. However, these linking systems only provide a way to specify a reference, not to maintain it. The following list shows four important specifications:
The first specification, TEI, and the second one, HyTime, are generic linking systems. They provide for both contextual and independent links for connecting two or more points in multiple directions. The main difference between the contextual and the independent link is the location of the link. The latter separates the links from the actual text of the document, whereas the former is an integral part of the text. Therefore, TEI and HyTime allow different webs to apply to a single, possibly read-only, source. This is an important feature: it adds hypertext functionality to a document without changing the original document in the process.
Traditional use of Uniform Resource Locators (URLs) allows for the well-known contextual, unidirectional links by means of the HTML anchor element. Historically, a reference made by using a URL is located in the referring document. URLs are designed to be resource specifiers. The major problem of URLs is that one URL might identify different resources over time. This makes references based on URLs less reliable. Moreover, a document can be split into several smaller units to facilitate Internet transfer. Each unit will have a unique URL, but as a result the complete document cannot be referred to.
Formal Public Identifiers (FPIs) are an alternative way to identify resources. A resource can have multiple FPIs, but an FPI is always associated with one unique resource. To be able to locate a document referred to by an FPI, one needs a catalog, which provides a mapping between the FPI and a system-specific resource descriptor. This descriptor could be a URL. Using this scheme, references do not have to be modified if the physical location of the resource, and as a result its URL, changes. However, this scheme requires additional maintenance of the catalog.
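The catalog indirection can be illustrated with a minimal Python sketch. It is not taken from any of the specifications discussed here: the FPI string, the URLs, and the function name `resolve` are invented for illustration, and a real catalog would follow a defined catalog format rather than an in-memory dictionary.

```python
# Hypothetical catalog: maps an FPI to a system-specific locator (here a URL).
# The identifier and both URLs are invented for this example.
FPI = "-//Example//DOCUMENT Sample Play//EN"
catalog = {FPI: "http://old.example.org/plays/sample.sgml"}


def resolve(fpi, catalog):
    """Return the locator currently catalogued for an FPI."""
    try:
        return catalog[fpi]
    except KeyError:
        raise LookupError(f"no catalog entry for {fpi!r}") from None


# A reference stores only the FPI, never the URL.
reference = {"label": "see the sample play", "target": FPI}
before = resolve(reference["target"], catalog)

# The resource moves: only the catalog entry is edited.
catalog[FPI] = "http://new.example.org/archive/sample.sgml"
after = resolve(reference["target"], catalog)

print(before)
print(after)
```

The reference itself is never touched when the resource moves. The cost, as noted above, is that the catalog becomes one more thing to maintain.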
Unfortunately, the final linking system mentioned, XLL, does not support FPIs. XLL is mostly based on TEI, HyTime and current HTML usage. Therefore, the resource identifier in XLL is a URL instead of an FPI. XLL will provide a means to make more advanced references available on the internet. Both the contextual link (in-line link in XLL terminology) and the independent link (out-of-line ilink) are catered for.
Independent links might implement a mechanism described more than half a century ago by Vannevar Bush, a device he called the Memex. He described the Memex as an electronic device, which allowed its user to store and read documents. A more interesting feature was the possibility to store references to documents read. It should also be possible to create links between several documents. Furthermore, if someone was interested in your research, you were able to give him your set of links and references, which Bush called a trail.
Bush's ideas can be found in the hypertext systems developed years later. The Internet can be seen as a more advanced version of his Memex: a Memex which connects to the Memexes of others. Many documents are available, and those documents can be found using indexes. Ideally, all these machines are continually available, which eliminates the necessity of maintaining personal physical copies. Unfortunately, the Internet is not as fast as his machine is supposed to be, nor as reliable. The trails Bush describes should not fade over time. However, if I were to save a path through Internet documents today, chances are that within the next couple of weeks at least one of the references would be broken. Therefore, important documents are often still saved locally, with a reference to the source of the document.
What do I need to do to mend the broken links in my personal web? The first, and probably most difficult, task is to find the document I was referring to. If I had been lucky, and extremely careful, I would have made a local copy of the document I was referring to, just in case the author moved it or removed it from the Internet. But whether or not the document can be found again, the original reference needs to be either changed or removed.
If an in-line link has been used, mending this broken link has some side effects. The actual document has to be changed, creating a new version of this document. As a result of this change, links to later points in my document might be broken too. This domino effect occurs because, if I remove a reference from a document, the actual structure of the document changes. All references based on simple element counts, or byte offsets, will now point to a different part of my document.
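The domino effect can be made concrete with a small Python sketch. The document, its element names, and the helper `child_at` are all invented for illustration; the external reference is assumed to address its target purely by element count.

```python
import xml.etree.ElementTree as ET

# Invented sample document containing one broken in-line link.
doc = ET.fromstring(
    "<doc>"
    "<p>First paragraph.</p>"
    "<ref target='http://gone.example.org/'/>"
    "<p>Second paragraph.</p>"
    "<p>Third paragraph.</p>"
    "</doc>"
)


def child_at(root, index):
    """Resolve a reference expressed as a zero-based child count."""
    return list(root)[index]


# Someone else's link addresses "child number 2 of doc".
target_index = 2
before = child_at(doc, target_index).text

# Mending the broken in-line link by deleting it changes the structure.
doc.remove(doc.find("ref"))
after = child_at(doc, target_index).text

print(before)  # Second paragraph.
print(after)   # Third paragraph.
```

The external reference was never edited, yet it now resolves to a different paragraph. A reference based on byte offsets would shift in the same way.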
However, if I use out-of-line links, I can update the web only, and leave the original document intact. This action only solves the initial problem, a broken link, without introducing other problems. Furthermore, I think this is the only theoretically just action. The document itself does not change, but one of the views on the document changes.
But HTML, the document format most commonly used on the Internet, does not allow independent links, only contextual ones. However, several new technologies prepare the Internet for independent links.
Do we need out-of-line links on the Internet? The previous section gives one argument in favour of independent links: it would facilitate mending broken references. There are other important advantages though. Independent links will allow links between two or more end-points. These links can either be uni-directional, bi-directional or even multi-directional. Additionally, independent links allow users to create new links from documents they cannot change themselves, or which are in a format which does not provide for embedded links (e.g. an image or video).
To use independent links on the Internet, several programs need to work together. On the server side, the required part of the document needs to be extracted: if the link specifies an element, or a span of elements, only this part should be returned. On the client side, however, access to the complete structure of the document is necessary. Therefore the server has to return information about the context of the fragment returned, and the identity of the originating document. This information is required to resolve relative links.
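As a rough illustration of what the server side might return, the following Python sketch extracts one element and sends along its ancestry and the identity of its source. The record layout, the function name `extract_fragment`, the sample book, and its URL are assumptions made for this example. It does not implement the SGML Open fragment interchange format or any other standard.

```python
import xml.etree.ElementTree as ET

SOURCE_ID = "http://example.org/book.xml"  # hypothetical document identity
BOOK = (
    "<book>"
    "<chapter id='c1'><p id='p1'>One.</p></chapter>"
    "<chapter id='c2'><p id='p2'>Two.</p><p id='p3'>Three.</p></chapter>"
    "</book>"
)


def extract_fragment(xml_text, element_id, source_id):
    """Return one element plus the context a client needs to place it."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child-to-parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    target = next((e for e in root.iter() if e.get("id") == element_id), None)
    if target is None:
        raise LookupError(f"no element with id {element_id!r}")
    # Walk upwards, recording each tag and its 1-based position among siblings.
    context = []
    node = target
    while node in parent_of:
        parent = parent_of[node]
        context.append((node.tag, list(parent).index(node) + 1))
        node = parent
    context.append((root.tag, 1))
    context.reverse()
    return {
        "source": source_id,
        "context": context,
        "fragment": ET.tostring(target, encoding="unicode"),
    }


result = extract_fragment(BOOK, "p3", SOURCE_ID)
print(result["context"])  # [('book', 1), ('chapter', 2), ('p', 2)]
```

With the context path and the source identity, a client can tell that the fragment is the second child of the second chapter, and can therefore resolve a link that addresses a position in the full document.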
Furthermore, the sets of independent links need to be collected both from the Internet and from the local system. Not all available link sets should be activated, though. An important document may have many different link sets acting on it. Each of them will provide a different perspective on the document. A user can activate the webs which he needs for his research. This cannot be done using in-line links. If I wanted to study one of Shakespeare's plays without ilinks, every reference would have to be embedded in the text, and the information would be too dense to be of any use. The original document would be cluttered with references, whereas out-of-line links keep the source document clean.
Finally, the collected webs need to be merged and applied to the document being displayed. The merging process should keep track of which web generated a specific link. This information is very useful to the user, because it may inform him about the type of the link. Two different webs on a Shakespearean play might be a set of links to Biblical references and a set of links to Ancient Greek references. Seeing which set generated a specific link will aid the user in determining whether or not to follow it.
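The selection and merging of webs can be sketched in a few lines of Python. The `Link` record, the web names, and all anchor and target strings are invented placeholders, not real references. The only point is that each merged link remembers which web produced it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    anchor: str  # hypothetical address of the link-end in the play
    target: str  # hypothetical address of the other link-end


# Three invented webs acting on the same, unchanged source document.
webs = {
    "biblical": [Link("act1/line10", "bible-ref-A")],
    "greek": [Link("act1/line10", "greek-ref-B"), Link("act3/line56", "greek-ref-C")],
    "criticism": [Link("act1/line10", "essay-ref-D")],
}


def merge_webs(webs, active):
    """Merge the active webs, keeping the name of the web behind each link."""
    merged = {}
    for name in active:
        for link in webs[name]:
            merged.setdefault(link.anchor, []).append((name, link.target))
    return merged


# The user activates only the two webs needed for the current research.
merged = merge_webs(webs, ["biblical", "greek"])
for anchor, links in merged.items():
    print(anchor, links)
```

Because each entry carries the name of its web, a browser could label or colour links by origin, and the inactive web contributes nothing to the display.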
Another important aspect when using independent links is the transmission of fragments of electronic documents. To provide for faster access to Internet pages, most large articles are split into short physical documents. For the links to work on both the full version and the split version, a browser needs information about the complete structure. For instance, if I am sending my document by chapters, the browser needs to know which chapter I am currently reading. This is necessary because a link could apply to the third chapter of my book. The transmission of fragments, for which a standard has been developed at SGML Open, will provide for the partial document transfer necessary for the Internet. At the same time, it will preserve all information about the structure of the fragment and the identity of its source.
When an independent solution is chosen for representing links, both the author and the reader can add references to and from documents located anywhere on the Internet. These documents need not be changed to facilitate this extra functionality. Furthermore, the webs themselves can also be located anywhere on the Internet. Once the required webs are loaded, full linking functionality is available between the documents.
But how does the scheme described in the previous section improve reliability over time? By removing most contextual links, the document will be more general. Several views on the document can be merged, by just adding several webs at once, without making the document look cluttered to the casual reader. This also removes the need of maintaining different versions of a document for different audiences, and might even add possibilities to a document by allowing different views to be open at once. Most importantly, whatever new cross-references are added, the basic structure of the document will not change, and neither will its version number. Not changing the base document to add or modify links, but only changing its webs, which are stored separately, will protect every link made to the document, and thus facilitate maintenance of all documents related to this document.
One important issue is whether it is fair to treat the document as unchanged when only one of its webs has changed. I think it is. The purpose of the document itself is to convey some information to its reader. That current electronic documents allow the author to provide the reader with references which can be resolved instantly is a useful feature of the document. However, independent links provide an overlay of links on the document, which supplies additional information. This overlay should be kept separate from the original document, and be maintained separately. The web itself is an informational unit, with its own message, and therefore it should have its own version. The documents which are referred to should not experience modifications if, for whatever reason, this separate informational unit is to change.
One could argue that treating the set of links as another separate unit, adds complexity, and thus maintenance overhead to an electronic document. Furthermore, if contextual links are broken, at least one end will always be present. This loose end might help the author in finding the lost link-end, because the context is clear. If independent links are used, this context might not be clear, and thus make recovery more difficult.
As a reply to the first argument, I think that structuring one's documents into functional parts will only enhance the author's comprehension of the project. References are not equivalent to the original text, and should not be treated as such. Of course, not every reference in a document calls for an independent solution. If I return to the quote example of the introduction, the use of an independent link is not necessary. If I would like to include a piece of text from another document at a certain point in the text, the reference will become an integral part of my text. However, if I merely allude to the quote, I would like the quote to be available to the reader. This is an added functionality of my electronic document, and thus requires an independent solution.
I do not consider the lack of context of independent links a real disadvantage, because meta-information should be part of the web. This information will provide the author with the crucial information about the purpose of each web, and thus improve readability if changes need to be made, or broken links fixed.
Therefore, I think making the Internet ready for independent links is important. It will provide hypertext authors with more functional tools to convey their message. Furthermore, separating text and webs, which are informational units on their own, will make the hypertext easier to understand, and easier to maintain. Even maintenance of documents, which will refer to the created hypertext, will be positively influenced. This makes independent links an important tool in making hypertext on the Internet more reliable.
Berners-Lee, T., L. Masinter, M. McCahill. "Uniform Resource Locators (URL) RFC 1738," W3C December 1994. [3.1]
Bray, Tim, Steve DeRose. "Extensible Markup Language (XML) W3C Working Draft 07-Aug-97," W3C 7 August 1997. [3.2] [3.3]
Bray, Tim, Jean Paoli, C. M. Sperberg-McQueen. "Extensible Markup Language (XML): Part 2. Linking W3C Working Draft April-06-97," W3C 6 April 1997. [3.3]
Bush, Vannevar. "As We May Think The Atlantic Monthly," 176:1 101-108. July, 1945. 0004-6795.
DeRose, Steve, Paul Grosso. "Fragment Interchange SGML Open Technical Resolution 9601:1996," Chicago, Oxford. SGML Open 1996 November 7 9601:1996. [3.4]
DeRose, Steven J., David G. Durand. Making Hypermedia Work. A User's Guide to HyTime, Boston/Dordrecht/London Kluwer Academic Publishers. 1994. 0-7923-9432-1
Sperberg-McQueen, C.M., Lou Burnard. Guidelines for Electronic Text encoding and Interchange TEI P3, Chicago, Oxford Text Encoding Initiative April 9, 1994.
W3C Activity: SGML, XML, and Structured Document Interchange last accessed at 19 October 1997 from http://www.w3.org/XML/Activity
*I would like to thank Dr. Steven J. DeRose (INSO Corporation) and Dr. Harry E. Gaylord (Groningen University) for their valuable feedback on previous versions of this paper.
3.1. Internet document retrieved at 19 October 1997 from http://www.w3.org/adressing/rfc1738.txt
3.2. Internet document retrieved at 19 October 1997 from http://www.w3.org/TR/WD-xml-lang.html
3.3. Internet document retrieved at 19 October 1997 from http://www.w3.org/TR/WD-xml-link-970406.html
3.4. Internet document retrieved at 19 October 1997 from http://www.sgmlopen.org/sgml/docs/a601.htm