History: (Changes,ChangeLog)
0.51 in progress
0.50 Sep10-May12,Mar13
just release it to avoid questions about old problems, give a sign of life ;)
fix 4 parfait problems against 0.48 (thanks to Rich Burridge)
adding qrcode detection and decoding (no error correction, no skewing)
spacing slightly improved
context correction of hex codes (e.g. hex fingerprints)
some threshold value adaptions (not finished)
try to fix double output of XML code <...> and removed additional \n
improved quotation detection ,, ''
improved monospaced spacing (video text)
0.49 Aug09-Sep10
fix dot handling for ':' and ';' (vector code)
fix '@' for 7x9 and 5x8 fonts
fix double counting of subboxes (affects "0" (zero) with dot in it)
character "l" of width 1 improved
bug fix gluing chars ij of width=1
bug fix thresholding (small gray images)
return error code -1 on ERROR pnm.c unexpected EOF
fix conflicts with unicode_defs.h TRUE definition on gcc/alphaev7-osf/3.4
further fixes for lib by D. Katsubo
fix #3039007 "struct list" in list.h conflicts with STL (ocr_object_list)
fix #3039006 INFINITY macro in unicode.h conflicts with math.h
bugfix barcode 128, switch from mode mC to mA (":1")
bugfix: MultiPNM + database - ID: 2957140
improved barcode recognition - ID: 2859644 (bars wider than spaces)
quality test-script bin/gocr_chk.sh added
initial datamatrix support (ASCII + ASCII numeric only, no ErrCorrection)
0.48 Jul09
fix buffer overflow introduced in 0.46 for filenames
add codabar barcode
fix bug, removing melted serifs
add patch by Chris Lee, i25 barcode recognition + modifications
fix some false positive numbers "34" (video, gas meter)
fix problems with 2zZ4 for 10x10 screen font
better debug output for :;,.
remove examples, doc and libs part from configure (see below)
remove doc and examples from the (make install) part to reduce
dependencies (gs and transfig are not needed for rpm/ebuild)
gocr may depend only on netpbm, but can live without it too
this will help to install gocr on "exotic" (non-linux) platforms
fix gentoo app-text/gocr Bug 243250 src/Makefile: $(CC) $(LDFLAGS) ...
0.47 fix database recognition for certainty 100 (-a 100)
insert spaces with certainty 100 (old: 99) to let -a 100 work
new option -u string for unrecognized chars
fix: No contrast in image causes division by zero
reduced false positive recognition of scanned "a496" (Gutenberg Project)
"d as a" patch ID: 1556112
add "Windows Pipe Fix", but I hate extra code for bad environments
improve 7x10, sample 0811qemu1.png (ToDo: not finished)
change black:white from >4:1 to >3.5:1 as criteria of inversion
reintroduce static library libPgm2asc.a (make libs) for OSRA project
add dynamic library (make libs), unused but may help other projects
0.46 improved context correction (especially helvetica "Il")
improved recognition of tiny chars "$1", fat "s", "rw" ","
fix blank spaces problem in filenames
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=316511)
!!! please check on other platforms and report to me !!!
there are still problems with special chars (double quotes, backslash)
better use this way: djpeg -gray -pnm strangefilename.jpg | gocr -
fix possible problem with database and UTF8 input
fix hidden bug in pitch/spacing initialization
reactivate code for output of glued chars and strings
fix wrong close() call
remove creation of pgm2asc.a for simplicity (see SF-patch 1827477)
0.45 minor corrections for c and k
minus sign is filtered by option -C "--" now, ("\-" was parsed badly)
clean up old unused code for simplicity (api, frontend)
fix problem with low height barcodes and barcode removing
fix problem with readpgm (for multiple images) and database
PACKAGE_VERSION defined by configure.in AC_INIT + gocr.spec
0.44 add volume to boxes (negative means white areas inside black areas)
Fix overflow in despeckling routine (verbose mode, dust removing)
reactivate composed chars, fix merge_boxes
fix problems with uncertain line detection and not recognized "7"
option -a has an effect now for the output
adaptions to MICR E13-B font (see GnuMICR), ToDo: 4 extra-chars
fix num_boxes in merge_boxes (affects line detection)
reduce 2 prompts to one per char in database mode, ^A for skip all
fix problem with smaller headlines
fix problems with tall font (4)
fix includes for non-linux-platforms
0.43 fix problem with dark frame around image
support multiple images, ex: giftopnm -image=all a.gif | gocr -
invert if obviously white on black (black_mass>=4*white_mass)
improve thresholding for discrete histograms
(note: this can particularly lead to bad results, will be fixed later)
speedup for big boxes (especially dark background)
fix memory leak (setas(same string) + detect_barcode)
fix uninitialized variables after insert spaces (num_frames)
fix frame_vector for single pixels (twice + ERROR idx out of range)
0.42 further parts of recognition engine replaced by vector version
changed colored debug output for out??.png
division of glued chars replaced (slower but more accurate)
fix framing of small font
fix problem with uninitialized pnm_readpaminit call (CPS 21Nov06)
better progress output (see progress.[ch]), new image debug output
switch to the new improved rotation detection
0.41 (buggy if --with-netpbm=no, apply the pgm-patch!)
otsu.c concentrates now only on high contrast regions
fix pnm reads for 2 byte pixels (--with-libpbm=no)
update man-page (mail me your suggestions)
fix g++ warnings, float-OPs replaced by int-OPs
spacing reviewed; make distance() more sensitive
xml-objects (barcode, melted chars) now also handled with weights
fix division by zero bug for vertical positioned characters
default output is UTF8 now, UTF-encoding bug fixed
added certainty option
added uninstall to Makefile
debug image format changed to png (using pipe) or ppm (fall-back)
much better word spacing (line-by-line based)
better DOT_ABOVE recognition
fix output of char groups or strings stored in database, utf8 input
fix buffer overflow in barcode decode39
fix lost comma on end of line
internal vector format added for future use (faster, scalable, rotatable)
line detection extended
internal list management rewritten to fix memory leaks and segfaults
0.40 update PNM file reader to maxval > 255
(make rpm) updated
barcode-patch UPC_addon by Michael van Rooyen
CAPITAL_LETTER_A_WITH_OGONEK added
no "(PICTURE)" output for UTF8+ASCII (better for Mobile OCR project)
smooth_borders() bug fixed and reworked
5x7 and prop10 font adaptions
objects now detected by flood-fill algorithm (better?)
XML-output changed
changed auto dust detection (not final)
0.39 XML output added (subject to change, suggestions are welcome)
netpbm-link-error fixed in gocr.c and configure.in:
gocr.c: <config.h> changed to "config.h"
configure-option --with-netpbm=PATH and --without-netpbm added
update configure.in according to autoconf 2.57
wchar_h misconfiguration fixed in pgm2asc.c
fix compiler warnings
char filter accepts abbreviations now, like "0-9A-F" (but slow)
update READMEde.txt
output barcode tags (also improved recognition)
fix pnm.c for files like example.eps.pbm
fix detect.c for barcodes
fix ocr0n.c 0<->8g
0.38 move UTF/HTML/TeX decoding to getTextLine, return (char *) now
out_format HTML step towards detailed XML output
correct line detection for footnotes (detect.c)
"y" now seen as vowel (pgm2asc.c), I<vowel> substituted by l<vowel>
é-detection, á-output fixed
default dust_size is -1 now (auto detection = mean_size/10)
char filter added
ex: -C 0123456789ABCDEF - recognize only hexcodes
man page updated (hopefully correct syntax)
database bug fixed (small fonts, example by Chris)
several bugs fixed by W. Webber (thanks)
speed improved by 3rd-pass matrix filter in pixel() (pixel.c) (code from W. Webber)
bug in remove_dust (remove.c) fixed
for fonts bigger than 20x40 smooth_borders() changed (b/w-scans)
bug in O0-detection fixed
0.37 best-fit generates probability, not perfect but better results
bug in line detection removed (happens for a lot of small boxes)
progress output (option -x <fileID|fname>)
counting version number as floating point now
MACRON and DOT_ABOVE (not complete) defined (latin2)
adaptions for 5x7 and 6x12 screen font
doc/ocr.tex changed to doc/gocr.html (now independent of LaTeX)
symbols {} added
OCR-B font tested successfully
better headline/picture distinction
bug removed (struct box.modifier is wchar_t now)
known bugs: too many newlines
0.3.6
CARON and Omega defined,
output of not defined chars (HTML="&#xxx;", TeX="\symbol{xxx}")
system-dependent bug: isupper(>255) SIGSEGV fixed
better line detection for lines with lowercase chars only
a lot of possible SIGSEGVs in list_del() fixed
barcode recognition (UPC,code128)
.ps .eps via pstopnm supported
-m 256 switches off the main ocr engine (useful together with -m 2 for identical chars)
strings added to database ("ff","ft","special-symbol")
gocr.tcl adapted to gocr v0.3
internal detection probability introduced
0.3.5
minor and major fixes (string\0 bugs)
memory leak fixes by Duncan Edwards
layout analysis or zoning (-m 4) improved,
now it detects pictures and columns much better
the behavior of setting threshold (-l) is slightly changed
wcsdup defined for non-gnu-systems (BSD), further problems?
better context correction for 10 (IO,lO)
Fixes for S.Koledin examples "GlS"
Euro-currency-sign detection added
better pitch estimation for proportional font (needs to be improved)
make install DESTDIR= instead of configure --prefix= (better?)
use wchar_t by default, more simple code and -f works with nonLinuxOS
line detection more robust against vertical glued chars (js)
-f UTF8 added (useful for xterm -u8), should be default?
handle vertical glued boxes (ex: g over T)
0.3.4
some BSD adaptions (no WCHAR?), tell me if there are still problems
use unicode in database (4-8 hex digits)
new option: -p database_path/
TILDE fixed, #, Æ, Å, etc. added (swedish,norwegian)
layout analysis improved
0.3.3
database (-m 2) bug fixed and interactive mode (-m 130) added
it's not finished, but you can test it
result should be ok for machine generated images (no scans)
engine improved a bit
0.3.2
ocr-engine improved for screen fonts (thanks for examples)
option -f [HTML,TeX,...] added
0.3.1
make install updated
0.3.0 some parts of the code reviewed (most work done by Bruno Barberi Gnecco)
tkispell patch from David Pinson (exec bug fixed)
gnome frontend added (Dany De Bontrider)
acute, grave, circumflex ... detection
C++ parts rewritten into C, and much more (see REVIEW)
0.2.7 lib-patch from Klaas Freitag inserted, engine improved
option -n 1 detect only numbers, get threshold value by otsu.cc
xxx.pnm.bz2 can be used on linux systems with bzip2 installed
0.2.6 pipes used on POSIX2-systems for easier use of jpg,gif,tiff,pnm.gz-files
example: gocr text.jpg; gocr text.pnm.gz
verbose output on stderr, text output on stdout,
redirection of output possible (-e, -o, example: -e /dev/stdout)
engine upgraded a bit (thx for the new sample files)
gocr.tcl upgraded (save options, save text)
DOS/WIN95-EXE created, download GOCREXE.ZIP (v0.2.5)
0.2.5 program convert renamed to jconv
you can choose stdin as input now, for using conversion tools
example: djpeg -pnm -gray text.jpg | gocr -i -
option "--help" added, some bugs removed
amiga.h added for SAS/C under AmigaOS (suggested by Uffe Holst)
line detection changed (faster?)
importing gocr in your C++ application is easier now (see function pgm2asc)
argument can be given instead of option -i (this is more natural)
some reorganization of code (not finished)
2000 downloads counted !!! Jun2000
SourceForge.net used for gocr (project: jocr, other gocr exist there)
bugs in dust removing, line detection and zoning fixed (rewritten)
first version of tcl/tk-GUI, test it!
recursive function frame_nn() replaced by labyrinth algorithm (no extensive stack use)
gluing of broken chars added, removing glued serifs (on small fonts)
new bugs added :;
0.2.4a2 some details are added (better dust removing and char division)
0.2.4 three char division (connected chars), dust removing
0.2.3 add layout analysis (very slow, try -m 4), engine modified
better distance function, engine updated, database added for testing
1000 downloads counted !!! May2000
0.2.2 gocr_0_2.tgz expands into gocr_0_2 directory (thanks to zz99zz)
engine upgraded a bit, some bugs fixed (umlaut, thin lines)
short documentation added (ocr.tex)
colored output (out30.bmp) for test/development-mode
bug: read ASC-PBM and PCX (1 bit) fixed
0.2.1 first official release on freshmeat.net March 2000
0.2 line scanning added
0.1 project started (not documented), autumn 1998 - summer 1999