feat: dictionary as language module (#1185)
* feat: SpellChecker Dictionary as a language module
* feat: morfologik speller: use dictionary from language module without copy to a file system
* feat: update ISpellCheckerDictionary to support legacy hunspell checker
  - Add API installHunspellDictionary(Path dictionaryDir)
  - Support it on AR, DA and FR
  - Provide bundled dictionary on FR
* feat: hunspell checker use language-module's dictionary
* feat: add abstract impl classes for spell dictionary
* feat: implement spell checker dictionaries
* fix: test case expectation for language-module-km spell dict
* fix: dutch spell dictionary class
* fix: typo of dutch spell dictionary class path
* fix: typo of russian spell dictionary language
* fix: Abstract*Dictionary to detect supported language
  - Accept a given language code when it is contained in a supported one, e.g., "sk" is contained in "sk_SK"
* Fix: update supported short language code
* Fix: slovenian LT dependency and language code
* Fix: Swedish module LT language dependency
* fix: sv: test with full sv_SE language specifier
* fix: Tagalog and Tamil spell dictionary
* style: copyright header of tagalog module
* fix: improve test for hunspell and de modules
* docs: introduce developer manual to create spell-check dictionary plugin
* docs: update user manual
  - explain the folder for user CUSTOM spelling dictionary
  - explain the language module will install the spelling dictionary when necessary.
  - update explanation of the spelling preference
* feat: update preferences view of SpellChecker
  - Remove URL box and install/uninstall buttons
  - Remove DictionaryInstallerDialog
  - clean bundles
* feat: list spelling dictionary from language modules
  - Extend SpellCheckerManager to return supported languages
  - Update DictionaryManager#getLocalDictionaryCodeList to return languages from language modules
* Update OmegaT_Preferences.xml
  @Kazephil could you check my modifications please? @miurahr I don’t think we should refer to the developer manual here. We have not done so for other parts of OmegaT. I don’t oppose that, of course, but we need to think about how to do that.
* Rewording of the paragraph on spelling dictionaries
  - I tried to reword the paragraph to flow more smoothly. Let me know if anything seems off.
  - I agree with Jean-Christophe about the developer manual reference, so I simply deleted it here. We will have to give some thought about how and where we can best make that information available.
* docs: update developer manual
  - fix section levels
  - update overview section

---------

Signed-off-by: Hiroshi Miura <[email protected]>
Co-authored-by: Jean-Christophe Helary <[email protected]>
Co-authored-by: kazephil <[email protected]>
1 parent 6523a16, commit 4175085
Showing 164 changed files with 1,261,739 additions and 1,064 deletions.
@@ -0,0 +1,239 @@
# How to publish a spell-check dictionary as a plugin

## Overview

OmegaT lets translators check their translations against spell-check dictionaries.
Developers can extend this feature by publishing custom spell-check dictionaries as plugins.
Such a plugin must implement the `ISpellCheckerDictionary` interface, which defines the methods
OmegaT calls to obtain a dictionary. To reduce the work, OmegaT also provides two abstract base classes:
`AbstractHunspellDictionary` and `AbstractMorfologikDictionary`.

This document explains the purpose of the interface methods and of the abstract methods,
and gives practical tips for implementation.

## `ISpellCheckerDictionary` Interface

The `ISpellCheckerDictionary` interface defines the methods for integrating custom spell-check dictionaries.
A plugin can supply either dictionary type, Hunspell or Morfologik. Every getter has a default implementation
that returns `null`, so a plugin overrides only the methods it supports.

```java
import java.io.Closeable;
import java.nio.file.Path;

public interface ISpellCheckerDictionary extends Closeable {
    /**
     * Get the Hunspell dictionary.
     *
     * @param language the language code of the requested dictionary.
     * @return the Dictionary object if the language module provides one; otherwise, null.
     */
    default org.apache.lucene.analysis.hunspell.Dictionary getHunspellDictionary(String language) {
        return null;
    }

    /**
     * Get the Morfologik dictionary.
     *
     * @param language the language code of the requested dictionary.
     * @return the Dictionary object if the language module provides one; otherwise, null.
     */
    default morfologik.stemming.Dictionary getMorfologikDictionary(String language) {
        return null;
    }

    // Installs the Hunspell dictionary files into dictionaryDir for the legacy Hunspell checker.
    // Returns the path to the installed dictionary, or null if installation is not supported.
    default Path installHunspellDictionary(Path dictionaryDir, String language) {
        return null;
    }

    /**
     * Get the dictionary type.
     *
     * @return the type of dictionary. If the module provides nothing, return null.
     */
    SpellCheckDictionaryType getDictionaryType();
}
```
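
Because each getter defaults to `null`, a caller has to look at `getDictionaryType()` to know which getter is worth calling. The sketch below illustrates that contract only; it is not OmegaT code. `DictionaryKind`, `DictionarySource` and `DictionaryResolver` are hypothetical stand-ins, and a plain `String` takes the place of the Lucene and Morfologik `Dictionary` classes so that the example compiles without those libraries.

```java
import java.util.Optional;

/** Stand-in for SpellCheckDictionaryType, limited to the two values this document names. */
enum DictionaryKind { HUNSPELL, MORFOLOGIK }

/** Stand-in for ISpellCheckerDictionary; a String replaces each library Dictionary class. */
interface DictionarySource {
    default String getHunspellDictionary(String language) {
        return null;
    }

    default String getMorfologikDictionary(String language) {
        return null;
    }

    DictionaryKind getDictionaryType();
}

/** Hypothetical caller that honours the "null means not provided" contract. */
final class DictionaryResolver {
    private DictionaryResolver() {
    }

    /** Calls only the getter that matches the declared type; empty if nothing is provided. */
    static Optional<String> resolve(DictionarySource source, String language) {
        DictionaryKind kind = source.getDictionaryType();
        if (kind == null) {
            return Optional.empty();
        }
        switch (kind) {
        case HUNSPELL:
            return Optional.ofNullable(source.getHunspellDictionary(language));
        case MORFOLOGIK:
            return Optional.ofNullable(source.getMorfologikDictionary(language));
        default:
            return Optional.empty();
        }
    }
}
```

Wrapping the result in `Optional` is a choice made for this sketch; the interface itself reports "not provided" with `null`.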

### Method Descriptions

1. **`getHunspellDictionary(String language)`**

   - **Purpose:** Provides access to a Hunspell dictionary for the specified language.
   - **Return Value:**
     - A `Dictionary` object if the language module supports Hunspell.
     - `null` if Hunspell is not supported.

   **Example Usage:**
   ```java
   @Override
   public org.apache.lucene.analysis.hunspell.Dictionary getHunspellDictionary(String language) {
       // Load and return the Hunspell dictionary for the specified language.
       // Returning null, as this stub does, signals that none is available.
       return null;
   }
   ```

2. **`getMorfologikDictionary(String language)`**

   - **Purpose:** Provides access to a Morfologik dictionary for the specified language.
   - **Return Value:**
     - A `Dictionary` object if the language module supports Morfologik.
     - `null` if Morfologik is not supported.

   **Example Usage:**
   ```java
   @Override
   public morfologik.stemming.Dictionary getMorfologikDictionary(String language) {
       // Load and return the Morfologik dictionary for the specified language.
       // Returning null, as this stub does, signals that none is available.
       return null;
   }
   ```

3. **`installHunspellDictionary(Path dictionaryDir, String language)`**

   - **Purpose:** Installs a Hunspell dictionary for the specified language in a given directory.
   - **Parameters:**
     - `dictionaryDir`: The directory where the dictionary will be installed.
     - `language`: The language code for the dictionary.
   - **Return Value:**
     - The path to the installed dictionary.
     - `null` if installation is not supported.

   **Example Usage:**
   ```java
   @Override
   public Path installHunspellDictionary(Path dictionaryDir, String language) {
       // Download or copy the Hunspell dictionary into dictionaryDir and return its path.
       // Returning null, as this stub does, signals that installation is not supported.
       return null;
   }
   ```

4. **`getDictionaryType()`**

   - **Purpose:** Specifies the type of dictionary supported by the plugin.
   - **Return Value:**
     - A `SpellCheckDictionaryType` enum value, e.g., `HUNSPELL`, `MORFOLOGIK`.
     - `null` if no dictionary type is provided.

   **Example Usage:**
   ```java
   @Override
   public SpellCheckDictionaryType getDictionaryType() {
       return SpellCheckDictionaryType.HUNSPELL;
   }
   ```
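
The copy step behind `installHunspellDictionary` could look roughly like the sketch below, which uses only the JDK. `HunspellInstaller`, its `install` method and the `opener` parameter are hypothetical names introduced for illustration, and so is the `<language>.aff` / `<language>.dic` naming. Returning the path of the `.dic` file is likewise an assumption, since the description above only says "the path to the installed dictionary".

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Function;

/** Hypothetical helper; not part of OmegaT. */
final class HunspellInstaller {
    private HunspellInstaller() {
    }

    /**
     * Copies language.aff and language.dic into dictionaryDir.
     *
     * @param opener maps a resource name such as "ca.aff" to a stream, or to null if it is missing
     * @return the path of the installed .dic file, or null if either resource is missing
     */
    static Path install(Path dictionaryDir, String language,
            Function<String, InputStream> opener) throws IOException {
        Files.createDirectories(dictionaryDir);
        Path installed = null;
        for (String extension : new String[] {".aff", ".dic"}) {
            String name = language + extension;
            // try-with-resources tolerates a null resource and simply skips close().
            try (InputStream in = opener.apply(name)) {
                if (in == null) {
                    return null;
                }
                Path target = dictionaryDir.resolve(name);
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                installed = target;
            }
        }
        return installed;
    }
}
```

Inside a plugin, the `opener` argument could simply be the plugin's own resource lookup, for example `name -> getClass().getResourceAsStream(name)`. Note that this sketch leaves an already copied `.aff` file in place when the `.dic` resource turns out to be missing.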

### Example Implementation

```java
public class MyHunspellDictionaryPlugin implements ISpellCheckerDictionary {

    @Override
    public org.apache.lucene.analysis.hunspell.Dictionary getHunspellDictionary(String language) {
        // Load and return the Hunspell dictionary for the specified language.
        // The constructor arguments are elided in this example.
        return new org.apache.lucene.analysis.hunspell.Dictionary(...);
    }

    @Override
    public SpellCheckDictionaryType getDictionaryType() {
        return SpellCheckDictionaryType.HUNSPELL;
    }

    @Override
    public void close() {
        // Cleanup resources if needed.
    }
}
```

## Creating a Hunspell Spell-Check Dictionary Plugin

OmegaT provides an abstract class, `AbstractHunspellDictionary`, to simplify the implementation of
a Hunspell-based spell-check dictionary. Developers can use this class to create plugins that support specific
languages by implementing a minimal set of methods.

This section explains how to use `AbstractHunspellDictionary`, including method descriptions,
implementation steps, and a complete example for a Catalan Hunspell dictionary.

### Abstract Class: `AbstractHunspellDictionary`

The `AbstractHunspellDictionary` class implements the `ISpellCheckerDictionary` interface and
includes additional utilities for managing Hunspell dictionaries. Developers subclass this abstract class
and implement its two abstract methods to provide language-specific dictionary support.

### Key Features of `AbstractHunspellDictionary`

1. **Dictionary Management**
   - Locates and loads Hunspell `.aff` and `.dic` files.
   - Provides access to the Hunspell dictionary for a given language.

2. **Helper Methods**
   - **`protected abstract String[] getDictionaries()`**
     - Returns the list of supported language codes for the dictionary.
   - **`protected String getDictionary(String language)`**
     - Finds the appropriate dictionary for a given language. A short code such as `sk` is accepted
       when a supported entry such as `sk_SK` contains it.
   - **`protected abstract InputStream getResourceAsStream(String resource)`**
     - Retrieves the dictionary resource stream.

3. **Predefined Implementation of `ISpellCheckerDictionary` Methods**
   - **`getHunspellDictionary(String language)`**
     - Loads the Hunspell dictionary for the specified language.
   - **`installHunspellDictionary(Path dictionaryDir, String language)`**
     - Installs the Hunspell dictionary files in a specified directory.
   - **`getDictionaryType()`**
     - Returns `SpellCheckDictionaryType.HUNSPELL`.
   - **`close()`**
     - Closes any open streams to release resources.
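
The language lookup done by `getDictionary(String language)` can be pictured with the following sketch. It is an assumption about the behaviour, based on the commit note that a given code such as "sk" is accepted when a supported code such as "sk_SK" contains it; it is not the actual OmegaT implementation. `LanguageMatcher` and `findDictionary` are hypothetical names, and treating `-` like `_` and preferring exact matches are choices made for this sketch.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper; not part of OmegaT. */
final class LanguageMatcher {
    private LanguageMatcher() {
    }

    /**
     * Returns the first supported code that equals the requested code, or failing that,
     * the first one that extends it with a region suffix ("sk" matches "sk_SK").
     */
    static Optional<String> findDictionary(String[] supported, String requested) {
        if (requested == null || requested.isEmpty()) {
            return Optional.empty();
        }
        // Treat "sv-SE" and "sv_SE" alike.
        String wanted = requested.replace('-', '_');
        // Prefer an exact match, so "de" resolves to "de" even if "de_AT" is listed first.
        Optional<String> exact = Arrays.stream(supported)
                .filter(code -> code.equalsIgnoreCase(wanted))
                .findFirst();
        if (exact.isPresent()) {
            return exact;
        }
        // Fall back to a supported code that starts with the requested language plus "_".
        String prefix = wanted.toLowerCase(Locale.ROOT) + "_";
        return Arrays.stream(supported)
                .filter(code -> code.toLowerCase(Locale.ROOT).startsWith(prefix))
                .findFirst();
    }
}
```

With this sketch, `findDictionary(new String[] {"sk_SK"}, "sk")` returns an `Optional` holding `sk_SK`, while a request for `fr` against `{"ca"}` returns an empty `Optional`.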

### Implementation Steps

1. **Subclass `AbstractHunspellDictionary`**
   - Create a new class extending `AbstractHunspellDictionary`.

2. **Implement Required Methods**
   - Define the supported language codes in `getDictionaries()`.
   - Provide logic to retrieve resource streams for dictionary files in `getResourceAsStream(String resource)`.

3. **Package the Implementation**
   - Include your dictionary files (`.aff` and `.dic`) in the project resources directory.
   - Package the implementation class as a plugin (e.g., a JAR file).

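## Example: Catalan Hunspell Dictionary

Below is a complete implementation of a Catalan Hunspell dictionary plugin using `AbstractHunspellDictionary`.

### Dictionary Files

Ensure the following files are placed in the `resources` directory:
- `ca.aff`
- `ca.dic`

The implementation below loads them with `getClass().getResourceAsStream(resource)`. Assuming the base class
passes a bare file name such as `ca.aff`, a name without a leading `/` is resolved relative to the package of
the implementing class, so the files belong in the folder under `resources` that matches that package.

### Implementation
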
```java
import java.io.InputStream;

import org.omegat.core.spellchecker.AbstractHunspellDictionary;

public class CatalanHunspellDictionary extends AbstractHunspellDictionary {

    // Supported language codes
    private static final String[] HUNSPELL = { "ca" };

    /**
     * Provides the list of supported languages.
     * @return an array of language codes.
     */
    @Override
    protected String[] getDictionaries() {
        return HUNSPELL;
    }

    /**
     * Retrieves the resource stream for a given dictionary file.
     * @param resource the resource file name.
     * @return an InputStream for the resource.
     */
    @Override
    protected InputStream getResourceAsStream(final String resource) {
        return getClass().getResourceAsStream(resource);
    }
}
```

## Conclusion

The `AbstractHunspellDictionary` class reduces the work of publishing a Hunspell dictionary to two short methods.
By following the steps and the Catalan example above, developers can quickly create plugins for specific languages.
For other cases, implementing the `ISpellCheckerDictionary` interface directly lets developers extend OmegaT
with additional spell-checking languages or dictionary types.
47 changes: 47 additions & 0 deletions
language-modules/ar/src/main/java/org/omegat/languages/ar/ArabicHunspellDictionary.java
@@ -0,0 +1,47 @@
/*
 * OmegaT - Computer Assisted Translation (CAT) tool
 * with fuzzy matching, translation memory, keyword search,
 * glossaries, and translation leveraging into updated projects.
 *
 * Copyright (C) 2023-2024 Hiroshi Miura
 * Home page: https://www.omegat.org/
 * Support center: https://omegat.org/support
 *
 * This file is part of OmegaT.
 *
 * OmegaT is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * OmegaT is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program. If not, see <https://www.gnu.org/licenses/>.
 */
package org.omegat.languages.ar;

import java.io.InputStream;

import org.languagetool.JLanguageTool;

import org.omegat.core.spellchecker.AbstractHunspellDictionary;

public class ArabicHunspellDictionary extends AbstractHunspellDictionary {

    private static final String DICTIONARY_BASE = "/org/languagetool/resource/ar/hunspell/";
    private static final String[] LANG = {"ar"};

    @Override
    protected String[] getDictionaries() {
        return LANG;
    }

    @Override
    protected InputStream getResourceAsStream(final String resource) {
        return JLanguageTool.getDataBroker().getAsStream(DICTIONARY_BASE + resource);
    }
}