########################################
# #
# wikicaptcha #
# #
########################################
wikicaptcha is an attempt to build a ReCAPTCHA-like[1] CAPTCHA system[2] to
help correct errors found in the digitization and collaborative transcription
of texts on Wikisource[3], a sister project of Wikipedia.
Wikisource aims to build a free (as in freedom) digital library: books in
Wikisource are released under a Creative Commons BY-SA license or are otherwise
free to redistribute, copy, and modify (e.g. books in the public domain).
== History ==
The idea of using CAPTCHAs (and ReCAPTCHA in particular) on Wikipedia to aid
the digitization of books has a long history, as shown by its inclusion on the
"Perennial proposals" page[4].
In the case of Wikisource, however, the large number of digitized books led
some members of the Italian-language Wikisource community[5] to wonder whether
a similar system could help with the steps involved in adding a book to
Wikisource, especially the proofreading phase.
A user of it.wikisource, Alex brollo[6], discovered that OCRed DjVu files
contain a text layer which can be extracted and analyzed, and that characters
the OCR engine was unsure about are marked with a caret ("^").
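The following is a minimal sketch of that extraction step, not wikicaptcha's
own code. It assumes the text layer is dumped with DjVuLibre's 'djvutxt' tool
(which must be installed) and that the caret sits inside or next to the word
containing the dubious character; the exact marking convention may differ
between OCR engines. The function names are hypothetical.

```python
import re
import subprocess

# Assumption: a caret ("^") appears inside or adjacent to any word in which
# the OCR engine was unsure about a character.
DUBIOUS_WORD = re.compile(r"\S*\^\S*")


def extract_text(djvu_path, page=None):
    """Dump the hidden text layer of a DjVu file using DjVuLibre's djvutxt."""
    cmd = ["djvutxt"]
    if page is not None:
        cmd.append(f"--page={page}")  # djvutxt accepts --page=PAGESPEC
    cmd.append(djvu_path)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def dubious_words(text):
    """Return every whitespace-delimited token that contains a caret."""
    return DUBIOUS_WORD.findall(text)
```

For example, dubious_words(extract_text("test/Horse.djvu", page=1)) would list
the caret-marked words on the first page, ready to be shown to users.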
The idea, first proposed here[7], is to exploit these features to build a
ReCAPTCHA-like system that helps correct these dubious recognitions.
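ReCAPTCHA is commonly described as pairing a word whose transcription is
already known (the control) with a word the OCR could not read reliably. The
sketch below illustrates that general scheme applied to caret-marked words; it
is an assumption about how such a system could work, not a description of
wikicaptcha's implementation. The class names and the agreement threshold are
hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

# Hypothetical: number of matching answers required before a transcription
# of a dubious word is accepted.
AGREEMENT_THRESHOLD = 3


@dataclass(frozen=True)
class Challenge:
    control_id: str  # word whose correct transcription is already known
    unknown_id: str  # caret-marked word the OCR was unsure about


class Verifier:
    def __init__(self, known):
        self.known = known                 # control_id -> correct text
        self.votes = defaultdict(Counter)  # unknown_id -> answer counts
        self.accepted = {}                 # unknown_id -> agreed text

    def submit(self, challenge, control_answer, unknown_answer):
        """Return True if the user passes; if so, record their other answer."""
        if control_answer.strip() != self.known[challenge.control_id]:
            return False  # control word wrong: discard both answers
        votes = self.votes[challenge.unknown_id]
        votes[unknown_answer.strip()] += 1
        text, count = votes.most_common(1)[0]
        if count >= AGREEMENT_THRESHOLD:
            self.accepted[challenge.unknown_id] = text
        return True
```

Only answers from users who solved the control word are counted, so a single
careless or malicious answer cannot change a page; a correction is accepted
once enough independent answers agree.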
== Syntax & options ==
Usage: wikicaptcha [-h] [-v] [-p PAGE] [--debug] infile

positional arguments:
  infile                the input (.djvu) file

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -p PAGE, --page PAGE  first page to process
  --debug               turn on debug messages
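The help text above has the layout produced by Python's argparse module. The
sketch below shows one way to declare the same interface; it is a
reconstruction, not the project's source. The version string is a
placeholder, and treating PAGE as an integer with no default is an assumption.

```python
import argparse


def build_parser():
    """Declare the documented wikicaptcha command-line interface."""
    parser = argparse.ArgumentParser(prog="wikicaptcha")
    # Placeholder version string: the real one is not documented here.
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s 0.0")
    # Assumption: pages are numbered with integers; None means "from the start".
    parser.add_argument("-p", "--page", type=int, default=None,
                        help="first page to process")
    parser.add_argument("--debug", action="store_true",
                        help="turn on debug messages")
    parser.add_argument("infile", help="the input (.djvu) file")
    return parser
```

For instance, build_parser().parse_args(["-p", "3", "test/Horse.djvu"]) yields
page=3, debug=False, infile="test/Horse.djvu". Note that Python 3.10 and later
print "options:" instead of "optional arguments:" in the help output.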
== Examples ==
The 'test' directory contains a couple of DjVu files suitable for testing
wikicaptcha. Try, for example:

    ./wikicaptcha test/Horse.djvu
== Authors ==
See AUTHORS file for details.
== License ==
See COPYING file for details.
== Further reading ==
* http://lists.wikimedia.org/pipermail/wikisource-l/2011-February/000939.html
== References ==
[1] http://en.wikipedia.org/wiki/reCAPTCHA
[2] http://en.wikipedia.org/wiki/Captcha
[3] http://wikisource.org/wiki/Main_Page
[4] http://en.wikipedia.org/wiki/Wikipedia:Perennial_proposals#Use_reCAPTCHA
[5] http://it.wikisource.org/wiki/Pagina_principale
[6] http://it.wikisource.org/wiki/Utente:Alex_brollo
[7] http://lists.wikimedia.org/pipermail/wikitech-l/2011-November/056078.html