
2.4 Standard structure and verse alignment

One difficulty with parallel corpora is that they are usually not explicitly aligned; for example, a considerable amount of work has gone into automatically aligning the Canadian Hansards (parliamentary proceedings in English and French) at the sentence level, and sometimes at the word level. Because the Bible's structure is fully standardized in terms of books, chapters, and verses, alignment at the verse level comes essentially for free; indeed, the main aim of this project is to represent that standardized structure in a consistent format.
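Because every verse carries a standard reference, aligning two translations reduces to matching on that reference. The following is a minimal Python sketch, assuming each translation has already been parsed into a dictionary keyed by a hypothetical (book, chapter, verse) tuple; the name `align_verses` is illustrative and not part of this project. The English text is from the KJV and the French from the Louis Segond translation.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a verse reference is (book, chapter, verse).
VerseRef = Tuple[str, int, int]


def align_verses(
    source: Dict[VerseRef, str], target: Dict[VerseRef, str]
) -> List[Tuple[VerseRef, str, str]]:
    """Pair each source verse with the target verse bearing the same reference.

    Verses missing from either side are skipped; output follows the
    insertion order of `source`.
    """
    return [(ref, text, target[ref]) for ref, text in source.items() if ref in target]


english = {
    ("Genesis", 1, 1): "In the beginning God created the heaven and the earth.",
    ("John", 11, 35): "Jesus wept.",
}
french = {
    ("Genesis", 1, 1): "Au commencement, Dieu créa les cieux et la terre.",
    ("John", 11, 35): "Jésus pleura.",
}

for ref, en, fr in align_verses(english, french):
    print(ref, en, "|", fr)
```

No search or scoring is involved: the shared reference scheme does all the work that a sentence aligner would otherwise have to estimate.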

Within verses, of course, there is considerable variation across translations, so research that needs finer-grained alignments (e.g. word-aligned parallel corpora) will still require further work. However, consistent verse-level alignments provide appropriate training material for algorithms that learn lower-level alignments from correctly aligned text, e.g. [Melamed1996a], and can also serve as test material for algorithms that attempt to produce sentence-level alignments.
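To illustrate how verse-aligned text can feed a lower-level aligner, here is a sketch of a simple co-occurrence statistic, the Dice coefficient, computed over aligned verse pairs. It is a deliberately simple stand-in, not the method of [Melamed1996a]; lowercasing, whitespace tokenization, and the toy phrase pairs in the example are assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple


def dice_scores(verse_pairs: List[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Score (source word, target word) pairs by co-occurrence in aligned verses.

    Dice(s, t) = 2 * joint(s, t) / (count(s) + count(t)), where counts are
    the number of verses containing each word (or both words).
    """
    src_counts: Counter = Counter()
    tgt_counts: Counter = Counter()
    joint_counts: Counter = Counter()
    for src, tgt in verse_pairs:
        # Count each word type at most once per verse (naive whitespace tokens).
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_counts.update(src_words)
        tgt_counts.update(tgt_words)
        joint_counts.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_counts[s] + tgt_counts[t])
        for (s, t), n in joint_counts.items()
    }


# Toy aligned "verses" (hypothetical data, not real verse text).
pairs = [("god created", "dieu créa"), ("god said", "dieu dit"), ("the earth", "la terre")]
scores = dice_scores(pairs)
best = max((t for (s, t) in scores if s == "god"), key=lambda t: scores[("god", t)])
print(best)  # dieu
```

In this toy data "the" scores 1.0 with both "la" and "terre"; resolving such ties requires much more text and a proper alignment model, which is exactly the further work the paragraph above anticipates.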

The structure inherent in the Bible also mitigates some problems caused by omissions in parallel text. Later translations omit, relocate, or footnote some passages found in the King James Version (KJV), as well as in its contemporaries and descendants. For example, the end of Mark 16 (verses 9-20) and John 7:53-8:11 are contested: both are attested in the majority of manuscripts, but not in the oldest texts, which were discovered after the KJV translation. Verse alignment limits the impact of such omissions, since they simply result in null pairings rather than disturbing the alignment of neighboring verses. (See [Melamed1996c] for discussion of automatic methods for detecting omissions in translations.)
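A null pairing can be made explicit with an outer join on verse references: a verse present in one translation but absent from the other is paired with `None`. This Python sketch assumes the same hypothetical (book, chapter, verse) dictionary layout as above; `align_with_nulls` is an illustrative name, the `<text of ...>` strings are placeholders, and the "later translation" is a hypothetical one that omits John 7:53.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a verse reference is (book, chapter, verse).
VerseRef = Tuple[str, int, int]


def align_with_nulls(
    a: Dict[VerseRef, str], b: Dict[VerseRef, str]
) -> List[Tuple[VerseRef, Optional[str], Optional[str]]]:
    """Outer-join two translations on verse reference.

    A verse absent from one side is paired with None (a null pairing),
    so an omission never shifts the pairing of neighboring verses.
    References are listed in the order of `a`, then any found only in `b`.
    """
    refs = list(a) + [ref for ref in b if ref not in a]
    return [(ref, a.get(ref), b.get(ref)) for ref in refs]


kjv = {
    ("John", 7, 52): "<text of John 7:52>",
    ("John", 7, 53): "And every man went unto his own house.",
    ("John", 8, 12): "<text of John 8:12>",
}
# Hypothetical later translation that omits John 7:53.
later = {
    ("John", 7, 52): "<text of John 7:52>",
    ("John", 8, 12): "<text of John 8:12>",
}

for ref, left, right in align_with_nulls(kjv, later):
    print(ref, left, "|", right)
```

With sentence-level alignment, by contrast, an omitted passage must first be detected before the text on either side of it can be paired correctly.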



Philip Resnik
Tue Oct 21 19:23:13 EDT 1997