Parallel corpora are increasingly of interest in natural language processing, with applications in cross-language information retrieval [Hull and Oard1997], machine translation (e.g. [Brown et al.1990]), word sense disambiguation [Brown et al.1991], and computational lexicography [Melamed1996b]. However, corpora reliably aligned at the word or even the sentence level are difficult to obtain even for commonly found language pairs, and for ``low density'' languages - those for which few resources exist - parallel corpora are even more difficult to find.
The Bible is an interesting alternative to investigate: as discussed above, it can potentially yield a multi-way parallel corpus with representation from every language family, with the content carefully translated and nearly sentence-level alignment included. Although it is not the largest of corpora, parallel corpora of significantly smaller size have yielded useful results, e.g. [Resnik and Melamed1997], and although its content is more specialized than, say, contemporary newspaper text, it does cover a very wide range of linguistic phenomena and domains of world knowledge; for example, see the range of conceptual categories in the Louw-Nida thesaurus for the New Testament [Louw and Nida1989].
We plan to investigate parallel versions of the Bible as a possible resource for bootstrapping natural language resources, especially for work in machine translation, first by applying the techniques described by Resnik and Melamed for extracting and assessing word correspondences, and then using techniques for identifying multi-word units [Melamed1997]. We also plan to evaluate the coverage of the Bible with respect to vocabulary and conceptual content by comparing it with existing lexicons for interlingual machine translation [DorrTo appear, Dorr and Olsen1997] and thesauri such as WordNet [Miller1990].
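As a rough illustration of the kind of vocabulary coverage comparison described above, the following minimal sketch computes the fraction of entries in a reference lexicon that also occur in a corpus. The function names, the regular-expression tokenizer, and the toy lexicon are all assumptions made for illustration; an actual evaluation would use a real Bible text, proper tokenization and morphological analysis, and an existing resource such as WordNet.

```python
import re


def vocabulary(text):
    """Return the set of lowercased word types in `text`.

    Uses a crude regular-expression tokenizer (an assumption for this
    sketch); real work would need language-specific tokenization.
    """
    return set(re.findall(r"[a-z]+", text.lower()))


def coverage(corpus_vocab, lexicon):
    """Fraction of lexicon entries that also occur in the corpus vocabulary."""
    lexicon = set(lexicon)
    if not lexicon:
        return 0.0
    return len(corpus_vocab & lexicon) / len(lexicon)


# Toy data, hypothetical and for illustration only.
sample_text = "In the beginning God created the heaven and the earth."
toy_lexicon = {"beginning", "earth", "heaven", "computer"}

vocab = vocabulary(sample_text)
print(coverage(vocab, toy_lexicon))  # 3 of the 4 entries are covered: 0.75
```

The same measure could be applied in the other direction (the fraction of corpus word types found in the lexicon) to estimate how much of the Bible's vocabulary an existing lexicon already accounts for.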
In particular, we would like to compare the translation lexicons we create to bilingual lexicons automatically acquired by other means. (We have available to us Spanish-English and Arabic-English lexicons, with Korean-English in progress and other projects planned [DorrTo appear].) We would like to investigate (i) the extent to which the biblically-based lexicons could be considered ``core'' or ``seed'' lexicons, and (ii) what would be needed (in terms of coverage and resources) to scale up the biblically-based lexicons.