Project Information

When the project was conceived in 2000, the Wright American Fiction collection, covering the 25-year period from 1851 to 1875, presented a significant opportunity for a cooperative project. Its size made it attractive, and it complemented other large text collections of the time, including Chadwyck-Healey's Early American Fiction, which ends at 1850, and the University of Michigan and Cornell's Making of America, which covers 1850–77 but at the time excluded fiction.

The project was originally developed in phases as follows:

Phase One

Digitize Page Images. Preservation Resources digitized each frame of the microfilm set; most frames contain two pages. The conversion produced almost 400,000 bitonal TIFF image files, representing over 750,000 pages. The following CIC libraries contributed to this phase:

  • Indiana University
  • Michigan State University
  • Ohio State University
  • University of Illinois, Chicago
  • University of Illinois, Urbana-Champaign
  • University of Iowa
  • University of Michigan
  • University of Minnesota
  • University of Wisconsin

Phase Two

Convert to Text Files. Each TIFF file was processed with Optical Character Recognition (OCR) software from Prime Recognition. This work was conducted at the Digital Library Program offices at Indiana University.

Phase Three

Edit and Encode the Text Files. The OCR process is seldom perfect, and the resulting text files contained many errors that needed to be corrected. Three universities committed to contributing $17,000 per year for three years to support the editing and encoding work:

  • Indiana University
  • University of Iowa
  • University of Michigan

In addition, eight other CIC libraries contributed to this effort:

  • University of Chicago
  • University of Illinois, Urbana-Champaign
  • Michigan State University
  • University of Minnesota
  • Northwestern University
  • Ohio State University
  • Pennsylvania State University
  • University of Wisconsin

Phase Four

Create Search and Display Functionality. Indiana University originally licensed the Digital Library Extension Service software from the University of Michigan for this project. Considerable modification was needed to meet the project's demands.

Migration Phase

In 2012, the Indiana University Libraries began migrating the Wright American Fiction project. Because resources were limited, improvements to functionality, facsimile page images, and text encoding were not actively pursued, except for original files that lacked the full text. For those files, OCR software was run against the existing facsimile page images to generate uncorrected text. The Wright corpus is now full-text searchable in its entirety and consists of edited, mid-level encoded texts and unedited, minimally encoded texts. For more information about the current state of the encoding and publishing platform, consult the Encoding Overview and Technical Implementation Overview.