This is a Go program to fetch Wiktionary page content from their API, (optionally) intersect it with a frequency wordlist (as the database is probably too big otherwise), and then produce an HTML file that, together with an .opf file, can be converted to mobi and used on your Kindle for in-book lookups.
Note: My target lang is Finnish, so that’s what I wrote this program in mind with. Hopefully it’ll work out of the box for your TL too, but there’s always the possibility that it does something wonky in the inflection table. Fear not though, goquery is easy to work with!
-
Download the necessities
-
Install Go. Probably 1.12.
-
Download a frequency wordlist for your language from here if possible.
Otherwise the file might be too big for
kindlegen
to handle, as it’s a 32-bit program. Finnish, with its 98184 lemmata, proved too big to process without a freq list, but your obscure language might be fine.
If you can’t find one, just omit the-freqlist
flag below. -
Download
kindlegen
for your platform You’ll use this to convert the .opf + .html files into .mobi.
-
-
Edit the metadata in
dict.opf
.-
Don’t forget to modify
<DictionaryInLanguage>
!
Set it to the ISO 639-1 code from here. -
You can replace
cover.png
too, but it matter much as the dictionary won’t show up as a book by default.
-
-
To generate
dict.html
, whichdict.opf
references, run this, with the name of the frequency list you downloaded instead offi.txt
:go run kindlewick.go -freqlist fi.txt
If it’s still too big, you can just take the first 50k lines or whatever from the file (in bash/zsh/etc) like so:
go run kindlewick.go -freqlist <(head -n 50000 fi.txt)
-
Finally, generate the .mobi file and put it on your Kindle!
kindlegen dict.opf -verbose -c2 -o my_dict.mobi
- How are inflections acquired?
-
Basically it just takes every span inside a table cell, and if it consists of multiple words, takes the last one (
olen odottanut
→odottanut
, since you can only look up a single word at a time on Kindle), and filters out duplicates. - Why not consult the frequency list before downloading every single page?
-
Because frequency lists usually have inflected forms of words, and if you only see
olen
in the list you won’t know you have to download the lemma formolla
. Ergo, download everything and keep entries where at least some form of the word shows up in the frequency list.