Encoding Overview

The online collection of nearly 3,000 volumes consists of two groups of texts. The larger group, approximately 1,800 electronic texts, was created with Prime Recognition Optical Character Recognition (OCR) software. These texts are minimally encoded and largely unedited, and rely on the facsimile page images as the main access point. The other group, approximately 1,200 texts, has been fully edited and encoded, and also includes facsimile page images. In addition to being corrected, these files allow for better document-centric navigation: chapter or story divisions are identified within each work, and each work has a hypertext-linked "Table of Contents." Both groups of texts are available for bibliographic and full-text searching as well as browsing.
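As an illustration of how encoded divisions can support a linked table of contents, the following is a minimal Python sketch, not the project's actual tooling. It assumes each chapter or story is a TEI P5 `div` element with a `head` child and an `xml:id` attribute; the sample document, the `type` values, and the `build_toc` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"          # TEI P5 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:id

# Hypothetical sample; real files in the collection may be structured differently.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="chapter" xml:id="ch1"><head>Chapter I</head><p>...</p></div>
      <div type="chapter" xml:id="ch2"><head>Chapter II</head><p>...</p></div>
    </body>
  </text>
</TEI>"""


def build_toc(tei_xml):
    """Return (division type, title, link target) for each titled division."""
    root = ET.fromstring(tei_xml)
    entries = []
    for div in root.iter(f"{{{TEI_NS}}}div"):
        head = div.find(f"{{{TEI_NS}}}head")
        div_id = div.get(f"{{{XML_NS}}}id")
        # Skip divisions that lack a heading or an identifier to link to.
        if head is None or div_id is None:
            continue
        title = "".join(head.itertext()).strip()
        entries.append((div.get("type", ""), title, "#" + div_id))
    return entries


toc = build_toc(SAMPLE)
for div_type, title, link in toc:
    print(f"{title} ({div_type}) -> {link}")
```

Each entry pairs a division's heading with a fragment link, which is the information a hypertext table of contents needs. The minimally encoded OCR texts would yield no entries under this sketch, since their divisions are not marked up.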

Electronic texts in this project were originally encoded by AEL Data, Pacific Data Conversion Corp (now SPI Content Sciences), and Techbooks in the Standard Generalized Markup Language (SGML) following the Text Encoding Initiative (TEI) Guidelines, version P3, using the TEI Lite DTD (version 1.6). In an effort to bring the encoding up to date, the original SGML TEI P3 files were transformed to XML conforming to TEI P4. Aspects of the encoding that were not conducive to automatic mapping and transformation were updated manually. In 2012, the TEI P4 versions of the files were transformed again to TEI P5, the most current version of the Guidelines. This step also required manual intervention, both to address aspects lost in translation and to conform more closely with the Best Practices for TEI in Libraries. The texts now rely on a custom TEI P5 W3C schema.