Install

Kragen Sitaker did amazing work back in 2005/2006 'liberating' the first edition of the OED, which is now (mostly) in the public domain, and posted fairly good scans of volumes 1-6 on archive.org (see 1, 2).

The home pages for the six volumes are:

However, at the time he was unable to do much on the OCR front, no doubt because of the poor performance of open source OCR, particularly on a text as complex as the OED, which has lots of non-standard English and font changes. With a better open source OCR engine it should be possible to convert the OED back into text, and perhaps wikify it to allow for gradual proofreading and correction.

Install

Download the source PDFs:

    mkdir -p cache/pdf
    cd cache/pdf
    wget http://www.archive.org/download/oed01arch/oed01arch.pdf

Note that jp2 images are also available: http://www.archive.org/download/oed01arch/oed01arch_jp2.tar
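
The download step can also be scripted. The following is a minimal sketch, not part of this repository: the helper names `archive_pdf_url` and `download_pdf` are hypothetical, and the URL pattern is only inferred from the single `oed01arch` link above, so it may not hold for the other volumes.

```python
import os
import urllib.request

# Base URL taken from the wget command above.
ARCHIVE_DOWNLOAD_BASE = "http://www.archive.org/download"


def archive_pdf_url(identifier):
    """Build the PDF URL for an archive.org item.

    Assumes the pattern <base>/<identifier>/<identifier>.pdf, which is
    inferred from the oed01arch link only.
    """
    return "{0}/{1}/{1}.pdf".format(ARCHIVE_DOWNLOAD_BASE, identifier)


def download_pdf(identifier, dest_dir=os.path.join("cache", "pdf")):
    """Download one PDF into dest_dir and return the local path.

    An existing file is not downloaded again. No attempt is made to
    detect a partial download left behind by an interrupted run.
    """
    os.makedirs(dest_dir, exist_ok=True)
    dest_path = os.path.join(dest_dir, identifier + ".pdf")
    if not os.path.exists(dest_path):
        urllib.request.urlretrieve(archive_pdf_url(identifier), dest_path)
    return dest_path
```

Calling `download_pdf("oed01arch")` should place the file at `cache/pdf/oed01arch.pdf`, the same location the shell commands use.
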

Install requirements:

- tesseract for doing OCR. On Debian/Ubuntu:
```
 apt-get install tesseract-ocr
```
- Python and pypdf
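
Before running anything, it can help to confirm that both requirements are visible. This is a rough sketch under stated assumptions: the function names are hypothetical, the `tesseract` executable is assumed to be on `PATH` under that name, and the list of module names tried for pypdf is a guess, since the name the script actually imports is not stated here.

```python
import importlib.util
import shutil


def find_module(candidates):
    """Return the first importable top-level module name, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def check_requirements(command="tesseract",
                       module_candidates=("pypdf", "PyPDF2", "pyPdf")):
    """Return a list of human-readable problems; empty if all is well.

    module_candidates is an assumption: pypdf has been published under
    several names, and the one oed.py imports is not documented here.
    """
    problems = []
    if shutil.which(command) is None:
        problems.append("command not found on PATH: " + command)
    if find_module(module_candidates) is None:
        problems.append("no PDF module found; tried: "
                        + ", ".join(module_candidates))
    return problems
```

An empty list from `check_requirements()` suggests both tesseract and a PDF library are installed; otherwise each entry names what is missing.
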

Run the script:

    python oed.py
