Indiana University Bloomington

Implementing TEI Collections with XTF in Victoria, B.C.

John Walsh

On October 28, 2006, SLIS faculty member John Walsh presented a poster and discussion on "Implementing TEI Collections with the Extensible Text Framework (XTF)" at the Text Encoding Initiative (TEI) Members Meeting in Victoria, B.C., Canada.

At the meeting, Walsh was elected to a second two-year term on the TEI Technical Council, an international group of TEI/XML experts that works with the TEI editors to oversee the technical development of the TEI standard.

ABSTRACT - Implementing TEI Collections with XTF

XTF, or the eXtensible Text Framework, is an open source system, developed by the California Digital Library, for searching and retrieving electronic documents, including TEI/XML collections. The XTF system, which is being used in an increasing number of digital library and digital humanities projects, incorporates a number of widely used open source technologies, such as the Apache Lucene search engine and the Saxon8 XSLT 2.0 processor. Although XTF and its underlying search engine, Lucene, are not "native" XML systems, XTF nonetheless exploits the structure of a TEI/XML document and provides advanced searching of encoded text and metadata. Virtually every aspect of XTF can be configured by editing XSLT stylesheets; therefore, users intimidated by writing or editing Java code can still customize the system while staying in the XML-based realm of XSLT.

My poster will provide an overview of XTF and discuss the details of implementing a TEI/XML collection. The discussion will include information about:

  • organization of the system
  • software components and the relationships shared among the various servlets and tools
  • installation and configuration
  • preparation of TEI files for indexing
  • indexing of metadata elements
  • XSLT stylesheets that drive the system
  • possibilities for customization
  • example customizations
  • example projects using XTF

The Swinburne Archive, a long-standing digital humanities project that was recently migrated from a proprietary system to XTF, and a newer project, The Chymistry of Isaac Newton, will be used as illustrative examples.

XTF is a new and promising free, open source solution for searching and publishing TEI/XML collections on the Web. It is based upon well-established and stable technologies, such as Apache Lucene, and provides a good balance between functionality and performance. This proposed poster will introduce the TEI community at the Members Meeting to XTF, a robust and easy-to-use system for sharing TEI collections.

Posted October 31, 2006