
Add names for 2 letter codes #572

Conversation

tovmasharrison
Contributor

Description

I have added the names for two-letter language codes. Afterward, I modified how the unlisted_languages list is populated and resolved the bug with duplicate articles for 3-letter codes (#567).

Fixes #571

Checklist:

"ie"
],
"names": [
"Interlingue"
Collaborator


"Interlingue", "Occidental"

generate.py Outdated
@@ -213,7 +213,7 @@ def normalize(code):
         base_code = base_language_code(normalized_code)
         if base_code in language['codes']:
             language_name = language.get('names', [None])[0]
-            language_slug = slugify(language_name) if language_name else code
+            language_slug = slugify(language_name) if language_name else base_code
             break
     if api_id not in [ 'alibaba', 'baidu', 'niutrans' ] and len(base_code) == 2 and not language_name:
Collaborator


Can we make this stricter now?

e.g.

if len(base_code) == 2 and not language_name:

@@ -213,7 +213,7 @@ def normalize(code):
         base_code = base_language_code(normalized_code)
         if base_code in language['codes']:
             language_name = language.get('names', [None])[0]
-            language_slug = slugify(language_name) if language_name else code
+            language_slug = slugify(language_name) if language_name else base_code
Collaborator


What does this change mean? For which languages does it change something? e.g. Chinese?

Contributor Author


At the moment, the change doesn't make any difference, since I have also added the names for the 2-letter codes. However, I changed it to base_code because the validation is done with base_code.
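
For illustration, a hypothetical example of what the fallback change would do for an entry without a name (base_language_code here is a stand-in, not the real implementation in generate.py, and the slugify step is omitted):

```python
# Hypothetical stand-in; the real base_language_code in generate.py may differ.
def base_language_code(code):
    # e.g. 'zh-CN' -> 'zh', 'pt_BR' -> 'pt'
    return code.replace('_', '-').split('-')[0].lower()

code = 'zh-CN'
base_code = base_language_code(code)   # 'zh'
language_name = None                   # entry with no name listed

# The old fallback used the full code, the new one uses the base code,
# so an unnamed variant would get the slug 'zh' rather than 'zh-CN'.
old_slug = language_name or code       # 'zh-CN'
new_slug = language_name or base_code  # 'zh'
```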

@@ -6707,272 +6714,317 @@
},
{
"codes": [
"huyu"
"ch"
Collaborator

@bittlingmayer Nov 4, 2023


By the way, we could easily add the 3-letter codes for every language - we have the mapping from 3-letter codes to 2-letter codes in api_languages.json and could reverse it to add all of these.
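
A minimal sketch of that reversal, assuming api_languages.json is a flat {3-letter: 2-letter} mapping and each entry in languages.json (filename assumed) has a "codes" list - both layouts are assumptions, not confirmed by this PR:

```python
import json

# Assumed layout: {"deu": "de", "fra": "fr", ...}
with open('api_languages.json') as f:
    three_to_two = json.load(f)

# Reverse the mapping: 2-letter code -> 3-letter code.
two_to_three = {two: three for three, two in three_to_two.items()}

with open('languages.json') as f:
    languages = json.load(f)

# Add the matching 3-letter code to every entry that only lists the 2-letter one.
for language in languages:
    for code in list(language['codes']):
        three = two_to_three.get(code)
        if three and three not in language['codes']:
            language['codes'].append(three)
```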

@@ -5,10 +5,12 @@ nav_order: 998
 nav_exclude: true
 parent: Languages
 layout: language
-title: <code>kl</code>
-description: Machine translation for <code>kl</code>
+title: Kalaallisut,
Collaborator


The comma is somehow in the .json?

@@ -5,17 +5,19 @@ nav_order: 999
 nav_exclude: true
 parent: Languages
 layout: language
-title: <code>vo</code>
-description: Machine translation for <code>vo</code>
+title: "Volap\xFCk"
Collaborator


It's not like this in the .json file. Maybe something with your Python setup? We should just use Unicode.
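
One likely cause, assuming the front matter is written with PyYAML (an assumption about generate.py): yaml.dump escapes non-ASCII characters by default, and allow_unicode=True keeps them literal.

```python
import yaml  # PyYAML; assumed to be what writes the front matter

data = {'title': 'Volapük'}

print(yaml.dump(data))                      # title: "Volap\xFCk"
print(yaml.dump(data, allow_unicode=True))  # title: Volapük
```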

Collaborator


And our slugify function for the page name should probably remove diacritics - umlauts, accent marks, etc. - so languages/volapuk.md, not languages/volapük.md.
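
A minimal sketch of a diacritic-stripping slugify (the existing slugify in generate.py may look different; this is only illustrative):

```python
import re
import unicodedata

def slugify(name):
    # Decompose accented characters and drop the combining marks,
    # so 'Volapük' becomes 'Volapuk'.
    ascii_name = unicodedata.normalize('NFKD', name).encode('ascii', 'ignore').decode('ascii')
    # Lowercase and collapse anything non-alphanumeric into hyphens.
    return re.sub(r'[^a-z0-9]+', '-', ascii_name.lower()).strip('-')

print(slugify('Volapük'))  # -> 'volapuk'
```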

Contributor Author


> It's not like this in the .json file. Maybe something with your Python setup? We should just use Unicode.

I'll open a separate issue for handling such characters in .md files properly.

Collaborator

@bittlingmayer left a comment


Looks good overall, thanks! Just a few small comments and questions.

@tovmasharrison
Contributor Author

@bittlingmayer jan, all done.

@cefoo
Collaborator

cefoo commented Nov 13, 2023

Hey @tovmasharrison!
Works well on my local copy. Thank you for your work!!

@tovmasharrison
Contributor Author

> Hey @tovmasharrison!
> Works well on my local copy. Thank you for your work!!

Hi @cefoo!

It's great to hear that it's working as expected!

@bittlingmayer
Collaborator

This seems wrong.

[Screenshot 2023-11-17 at 23:06:18]

@tovmasharrison
Contributor Author

> This seems wrong.
>
> [Screenshot 2023-11-17 at 23:06:18]

Fixed @bittlingmayer

@bittlingmayer merged commit d2e2e09 into machinetranslate:master on Dec 5, 2023
Development

Successfully merging this pull request may close these issues.

Feature: Add the names of the unlisted languages that have 2 letter codes.
3 participants