Annotating the individual language versions of the
66 books of the Bible (or, in some cases, of the New Testament alone) requires
only a simple three-level hierarchy of text elements (book, chapter,
verse). In our initial pass through the annotation process (see
below), we are labeling elements as b (book), c (chapter),
and v (verse), producing an intermediate representation that
captures the major structural levels without conforming to any
particular DTD. The following examples show a single verse, Matthew
1:7, in 9 languages:
In all these cases, the intermediate encoding of the book and chapter elements is identical:
<b id="MAT"> <c id="MAT:1"> ... </c> </b>

The labels (id attributes) on the elements make it possible to identify verses in a context-independent way, because each label includes the book and chapter, e.g. "GEN:1:1" for Genesis, chapter 1, verse 1. This allows users to rely on simple tools such as Unix grep for day-to-day tasks (for example, looking up a particular verse) while still being able to use more powerful SGML-based tools.
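As a rough illustration of why context-independent labels make line-oriented tools sufficient, the following Python sketch serializes one book into the b/c/v encoding and then finds a verse by scanning lines, much as grep would. It assumes, hypothetically, that each verse is written as a single-line `<v id="...">...</v>` element; the exact verse markup is not shown above, and the function names `encode_book` and `lookup_verse` are illustrative only, not part of any existing tool.

```python
from __future__ import annotations

import re


def encode_book(book_id: str, chapters: dict[int, dict[int, str]]) -> str:
    """Serialize one book into the b/c/v intermediate encoding.

    Every id carries its full context (book, chapter, verse), so a verse
    line can be located without knowing where it sits in the file.
    """
    lines = [f'<b id="{book_id}">']
    for ch_num in sorted(chapters):
        ch_id = f"{book_id}:{ch_num}"
        lines.append(f'<c id="{ch_id}">')
        for v_num in sorted(chapters[ch_num]):
            text = chapters[ch_num][v_num]
            # Assumed (hypothetical) verse markup: one <v> element per line.
            lines.append(f'<v id="{ch_id}:{v_num}">{text}</v>')
        lines.append("</c>")
    lines.append("</b>")
    return "\n".join(lines)


def lookup_verse(encoded: str, verse_id: str) -> str | None:
    """Return the text of the verse with the given id, scanning line by line."""
    # Anchoring on the closing quote keeps "MAT:1:7" from matching "MAT:1:70".
    pattern = re.compile(r'<v id="' + re.escape(verse_id) + r'">(.*)</v>')
    for line in encoded.splitlines():
        match = pattern.search(line)
        if match:
            return match.group(1)
    return None


# Placeholder verse texts, not real translations.
doc = encode_book("MAT", {1: {6: "placeholder text 6", 7: "placeholder text 7"}})
print(lookup_verse(doc, "MAT:1:7"))
```

Matching on the full attribute, including the closing quote, is what keeps the lookup unambiguous; the shell equivalent would be something like `grep 'id="MAT:1:7"'` run over the encoded file.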