Encoding Overview

Indiana Authors and Their Books (Indiana Authors) is a project funded by a grant under the Library Services and Technology Act (LSTA). The project is based on the digitization and encoding of the three-volume reference work Indiana Authors and Their Books, published by Wabash College in 1949, 1974, and 1981. The encyclopedia covers nearly two hundred years of Indiana's literary history (1816–1980) and contains approximately 7,000 author entries. Each author entry includes a bibliography; together, the bibliographies contain close to 21,000 citations.

Of the roughly 21,000 titles listed in Indiana Authors and Their Books, approximately 150 monographs were identified as representative of Indiana's Golden Age of Literature (1880–1920). Since its original conception, the project has grown in scope, serving as a test bed for "productionizing" electronic text workflows in partnership with the Indiana University Bloomington Libraries Technical Services and Arts and Humanities departments. As a result, another 250 monographs published before 1923 were selected for electronic conversion.

The approximately 400 monographs currently in the Indiana Authors project are encoded in Extensible Markup Language (XML) following the Text Encoding Initiative (TEI) Guidelines, TEI Lite version P4, at Level 3 as described in Best Practices for TEI in Libraries.1 Our most recent encoding workflow generates the full text with optical character recognition (OCR) software. The text is then automatically ported into a TEI template that contains page breaks and a header pre-populated with bibliographic metadata from MARC records and other boilerplate metadata. This "shell" TEI is then ready for additional structural markup by the Technical Services team.
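The shell-generation step described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the project's actual tooling: the function name build_tei_shell, the metadata keys (title, author, source), and the choice of a single div1 wrapper are all hypothetical, and the metadata dictionary merely stands in for fields that would be drawn from a MARC record. The element names (TEI.2, teiHeader, fileDesc, pb, p) are those of TEI P4.

```python
import xml.etree.ElementTree as ET


def build_tei_shell(metadata, pages):
    """Return a TEI P4 'shell': a pre-populated header plus page-broken OCR text.

    metadata: dict with hypothetical keys 'title', 'author', 'source'
              (a stand-in for values taken from a MARC record).
    pages:    list of OCR text strings, one per page image.
    """
    root = ET.Element("TEI.2")

    # Header pre-populated with bibliographic and boilerplate metadata.
    header = ET.SubElement(root, "teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = metadata["title"]
    ET.SubElement(title_stmt, "author").text = metadata["author"]
    pub_stmt = ET.SubElement(file_desc, "publicationStmt")
    ET.SubElement(pub_stmt, "p").text = "Boilerplate publication statement."
    source_desc = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source_desc, "bibl").text = metadata["source"]

    # Body: one page break per OCR page, followed by that page's paragraphs.
    text = ET.SubElement(root, "text")
    body = ET.SubElement(text, "body")
    div = ET.SubElement(body, "div1")
    for number, page_text in enumerate(pages, start=1):
        ET.SubElement(div, "pb", n=str(number))
        for paragraph in page_text.split("\n\n"):
            if paragraph.strip():
                # Collapse OCR line breaks inside a paragraph.
                ET.SubElement(div, "p").text = " ".join(paragraph.split())
    return root


if __name__ == "__main__":
    shell = build_tei_shell(
        {"title": "Sample Title", "author": "Sample Author",
         "source": "Sample source citation"},
        ["First paragraph.\n\nSecond paragraph.", "Text of page two."],
    )
    print(ET.tostring(shell, encoding="unicode"))
```

A shell built this way carries only page breaks and paragraphs in the body; the further structural markup (chapters, headings, and so on) would still be added by hand, as in the workflow described above.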

The three volumes of the encyclopedia, which shares its title, Indiana Authors and Their Books, with the project web site, are encoded following the Text Encoding Initiative (TEI) Guidelines, TEI P4, at Level 3 as described in Best Practices for TEI in Libraries. The encyclopedia is organized by author entry, each containing a biographical sketch and a bibliography. For monographs that have been encoded, the titles in the bibliographies link to the encoded texts via an author entry identification scheme. Because only a small fraction of the cited books have been encoded, title searches are also passed to several external services, such as Google Books, HathiTrust, and OCLC WorldCat, and to our local catalog, IUCAT, to facilitate access.
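Passing a title search to several services amounts to building one query URL per service. The Python sketch below shows the idea only: the endpoint URLs use example.org placeholders and the query parameter names are assumptions, not the real addresses or parameters of Google Books, HathiTrust, OCLC WorldCat, or IUCAT, and the function name title_search_links is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoints (assumptions for illustration only); a real deployment
# would substitute each service's documented search URL and parameter name.
SEARCH_ENDPOINTS = {
    "Google Books": ("https://google-books.example.org/search", "q"),
    "HathiTrust": ("https://hathitrust.example.org/search", "q"),
    "OCLC WorldCat": ("https://worldcat.example.org/search", "q"),
    "IUCAT": ("https://iucat.example.org/search", "q"),
}


def title_search_links(title, author=None):
    """Return a dict mapping each service name to a search URL for the title."""
    query = f"{title} {author}" if author else title
    return {
        name: f"{base}?{urlencode({param: query})}"
        for name, (base, param) in SEARCH_ENDPOINTS.items()
    }


if __name__ == "__main__":
    for service, url in title_search_links("The Gentleman from Indiana").items():
        print(f"{service}: {url}")
```

Keeping the endpoints in a single table means a service can be added or retired without touching the code that renders the links beside each citation.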


1 The encoding guidelines for this project were established many years before the most recent version of Best Practices for TEI in Libraries, which was released in October 2012. The encoding of both the books and the encyclopedia loosely follows a Level 3 approach.