The National Institute of Standards and Technology, US Department
of Commerce, has awarded the Scholarly Technology Group a one-year grant to
continue STG's ongoing research and development in the area of
Open Electronic Book
standards.
The specific purpose of this project is to support STG's investigation into
the problem of "semantic heterogeneity" across Extensible Markup Language
(XML) Document Type Definitions (DTDs) and XML schemas, focusing
particularly on schemas for "extended" OEB electronic books.
Elements and attributes in different XML document schemas typically have
related -- but also possibly different -- "meanings". At
present, however, there is no way to represent
these relationships in a formal, machine-processible way.
That is, there is no way to say, for instance, that two
element types from different schemas are exact equivalents
(e.g., <xx:h1> and <yy:heading1>), or that a
combination of two or more elements from one schema is
a more detailed treatment of a feature represented by one
element in another schema (e.g., <xx:firstname> and
<xx:lastname>, vs. <yy:name>), or that an
element from one schema is more specific than an element in another (e.g.,
<xx:person_name> vs. <yy:name>). Even more
challenging, we have no way to say that two elements have
some sort of partial equivalence, i.e. that they are similar
in certain respects although perhaps neither
identical nor related in any of the specific ways
described above (compare <xx:chapter> and
<yy:division> for instance).
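As a purely illustrative sketch (not STG's notation, and not part of any existing standard), the four kinds of relationship described above could be recorded as simple data. The names `Relation`, `Mapping`, `MAPPINGS`, and `related_to` are hypothetical, and the `xx:` and `yy:` element names are the examples from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Relation(Enum):
    """Hypothetical labels for the relationships discussed in the text."""
    EQUIVALENT = "equivalent"  # exact equivalents
    DECOMPOSES = "decomposes"  # several source elements jointly detail one target
    NARROWER = "narrower"      # source is more specific than target
    PARTIAL = "partial"        # similar in some respects only


@dataclass(frozen=True)
class Mapping:
    sources: Tuple[str, ...]   # qualified element names from one schema
    target: str                # qualified element name from the other schema
    relation: Relation


# The examples given in the text, written as explicit assertions.
MAPPINGS = [
    Mapping(("xx:h1",), "yy:heading1", Relation.EQUIVALENT),
    Mapping(("xx:firstname", "xx:lastname"), "yy:name", Relation.DECOMPOSES),
    Mapping(("xx:person_name",), "yy:name", Relation.NARROWER),
    Mapping(("xx:chapter",), "yy:division", Relation.PARTIAL),
]


def related_to(element: str, mappings: List[Mapping] = MAPPINGS):
    """Return (relation, elements on the other side) for each mapping that mentions element."""
    results = []
    for m in mappings:
        if element in m.sources:
            results.append((m.relation, (m.target,)))
        elif element == m.target:
            results.append((m.relation, m.sources))
    return results


if __name__ == "__main__":
    for relation, others in related_to("yy:name"):
        print(relation.value, others)
```

A table like this only names the relationships; it says nothing about what "partial" equivalence actually licenses a program to do, which is the harder problem.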
These problems currently present very serious practical obstacles to
high-performance interoperable electronic publishing in general, and
eBook publishing in particular. Tools and techniques for
information retrieval, navigation, viewing, and rendering all
require some common representation of data to be
effective. Without a common representation across
diverse XML schemas, software developers are forced to make an
unfortunate choice between low functionality on the
one hand and low interoperability on the other. This is
particularly troubling given the increasing importance of
diverse specialized and domain-specific XML schemas used in
extended OEB electronic books.
As noted above, there is currently no principled way to compare and relate XML
schemas and their component elements and attributes, and,
therefore, no way to express their semantic relationships in
a formal machine-processible language. Ultimately the
issues here are very deep: we have no good theory of "markup
semantics" and even our empirical knowledge of actual
practices is slight.
Nevertheless, some useful practical measures should be within reach.
This grant funds the addition of data collection features to STG's
XHub project and the start of work with our industry partners to
study systematically the problems of semantic heterogeneity
raised by XML documents, and to develop and apply technology
for addressing these problems. XHub provides a publicly
available infrastructure for systematically converting
documents among diverse tag-sets. Instrumentation for
collecting and analyzing data about the structural nature of
the syntactic transformations involved in these conversions
will be a valuable source of practical insights into
perceived semantic relationships and, in addition, will help
create a testbed for evaluating emerging methodologies for
managing semantic interoperability.
Lead investigators are Allen Renear and Steve DeRose.
For more information, contact Allen Renear
(Allen_Renear@Brown.Edu or 401 863-7312).