eMOP is funded by a grant from the Andrew W. Mellon Foundation, and as such all code produced by and for eMOP is available under the Apache License, Version 2.0.
The code that implements the entire eMOP workflow.
The online dashboard that powers the eMOP workflow.
A tool created for eMOP that allows users to create Tesseract training data from their own typeface samples.
- hOCR deNoising
A tool created for eMOP post-processing that removes noise from Tesseract's hOCR output.
A command-line version of Juxta that compares OCR output to groundtruth files.
- Page Corrector
A tool created for eMOP that uses dictionary files and a Google 3-gram database to correct Tesseract output.
- Page Evaluator
A tool created for eMOP that evaluates OCR output to determine how correctable it is.
A tool created for eMOP that compares OCR output to groundtruth files.
A collection of Tesseract training data created by eMOP using Franken+.
- Publishing Imprint DB
Printer, seller, and location information culled from the imprint lines of the entire eMOP dataset. These XML files (separate files for EEBO and ECCO) contain only those entries for which we have an ESTC number.
A robust image comparison environment that presents versions of texts alongside each other in filmstrip view and collates these images of different texts while allowing users to adjust the collation.
- Aletheia Web Layout Editor
A tool for identifying and transcribing paratext on a page image in TypeWright.
Copyright 2014 Initiative for Digital Humanities, Media, and Culture at Texas A&M University
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.