Add names for 2 letter codes #572
_data/languages.json
Outdated
"ie"
],
"names": [
"Interlingue"
"Interlingue", "Occidental"
generate.py
Outdated
@@ -213,7 +213,7 @@ def normalize(code):
  base_code = base_language_code(normalized_code)
  if base_code in language['codes']:
  language_name = language.get('names', [None])[0]
- language_slug = slugify(language_name) if language_name else code
+ language_slug = slugify(language_name) if language_name else base_code
  break
  if api_id not in [ 'alibaba', 'baidu', 'niutrans' ] and len(base_code) == 2 and not language_name:
Can we make this stricter now?
e.g.
if len(base_code) == 2 and not language_name:
@@ -213,7 +213,7 @@ def normalize(code):
  base_code = base_language_code(normalized_code)
  if base_code in language['codes']:
  language_name = language.get('names', [None])[0]
- language_slug = slugify(language_name) if language_name else code
+ language_slug = slugify(language_name) if language_name else base_code
What does this change mean? For which languages does it change something, e.g. Chinese?
At the moment, the change doesn't make any difference, since I have also added the names for 2-letter codes. However, I changed it to `base_code` since the validation was done with `base_code`.
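To make the effect of the fallback concrete, here is a minimal sketch of the lookup under discussion. The helpers `base_language_code` and `slugify` below are hypothetical stand-ins, not the versions in generate.py, and `language_slug_for` is a name invented for illustration. It only shows that, for an entry without a name, falling back to `base_code` yields the bare code rather than the full regional code.

```python
import re


def base_language_code(code):
    # Hypothetical stand-in: keep only the part before the first '-' or '_'.
    return re.split(r"[-_]", code)[0].lower()


def slugify(name):
    # Hypothetical stand-in: lowercase and collapse non-alphanumerics into hyphens.
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def language_slug_for(code, languages):
    """Return the page slug for `code`, or None if no entry lists its base code."""
    base_code = base_language_code(code)
    for language in languages:
        if base_code in language["codes"]:
            names = language.get("names") or [None]
            language_name = names[0]
            # Fall back to the base code (not the full code) when no name exists,
            # so the slug matches the value the membership check was done with.
            return slugify(language_name) if language_name else base_code
    return None


# Illustrative data only; not taken from _data/languages.json.
languages = [
    {"codes": ["zh"], "names": ["Chinese"]},
    {"codes": ["kl"]},
]
```

With this data, a named entry such as `zh-TW` resolves to the slug of its name, while an unnamed entry such as `kl-GL` falls back to `kl` instead of `kl-GL`. Once every 2-letter code has a name, the fallback branch is not reached for those codes, which matches the remark that the change currently makes no difference.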
_data/languages.json
Outdated
@@ -6707,272 +6714,317 @@
},
{
"codes": [
"huyu"
"ch"
By the way, we could easily add the 3-letter codes for every language - we have the mapping from 3-letter codes to 2-letter codes in api_languages.json, could reverse it to add all these.
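A rough sketch of that idea follows. It assumes the mapping can be loaded as a plain dict from 3-letter code to 2-letter code; the actual layout of api_languages.json may differ, and `add_three_letter_codes` is a hypothetical helper name, not an existing function in the repository.

```python
def add_three_letter_codes(languages, three_to_two):
    """Append 3-letter codes to each language entry that lists the matching 2-letter code.

    `three_to_two` is assumed to be a dict such as {"deu": "de"}.
    """
    # Reverse the mapping; several 3-letter codes may point at one 2-letter code.
    two_to_three = {}
    for three, two in three_to_two.items():
        two_to_three.setdefault(two, []).append(three)

    for language in languages:
        # Iterate over a copy because the list is extended inside the loop.
        for code in list(language["codes"]):
            for three in sorted(two_to_three.get(code, [])):
                if three not in language["codes"]:
                    language["codes"].append(three)
    return languages


# Illustrative data only.
languages = [
    {"codes": ["de"], "names": ["German"]},
    {"codes": ["kl"]},
]
mapping = {"deu": "de", "ger": "de", "kal": "kl"}
add_three_letter_codes(languages, mapping)
```

The membership check keeps the function idempotent, so running the generator twice would not duplicate codes.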
languages/kalaallisut,.md
Outdated
@@ -5,10 +5,12 @@ nav_order: 998
nav_exclude: true
parent: Languages
layout: language
title: <code>kl</code>
description: Machine translation for <code>kl</code>
title: Kalaallisut,
The comma is somehow in the .json?
languages/volapük.md
Outdated
@@ -5,17 +5,19 @@ nav_order: 999
nav_exclude: true
parent: Languages
layout: language
title: <code>vo</code>
description: Machine translation for <code>vo</code>
title: "Volap\xFCk"
It's not like this in the .json file. Maybe something with your Python setup? We should just use Unicode.
And our slugify function for the page name should probably remove diacritics - umlauts, accent marks etc. languages/volapuk.md not languages/volapük.md
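One possible shape for this, using only the standard library: decompose each character with `unicodedata.normalize("NFKD", ...)`, drop the combining marks, then slugify the remaining ASCII. The function name `slugify_ascii` is hypothetical and this is a sketch, not the project's actual slugify.

```python
import re
import unicodedata


def slugify_ascii(name):
    """Return a lowercase, hyphen-separated slug with diacritics removed."""
    # NFKD splits 'ü' into 'u' plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Discard anything that still is not ASCII.
    ascii_only = stripped.encode("ascii", "ignore").decode("ascii")
    # Collapse runs of non-alphanumerics into single hyphens and trim the ends.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")
```

Under this sketch "Volapük" becomes `volapuk`, and a stray trailing comma as in "Kalaallisut," is dropped as well. One caveat: letters with no Unicode decomposition (for example "ø") are removed entirely by the ASCII step, so they would need an explicit replacement table if that matters.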
> It's not like this in the .json file. Maybe something with your Python setup? We should just use Unicode.
I'll open a separate issue for handling such characters in .md files properly.
Looks good overall, thanks! Just a few small comments and questions
…masharrison/machinetranslate.org into add-names-for-2-letter-codes
…o return slug without special characters
@bittlingmayer jan, All done.
Hey @tovmasharrison!
Hi @cefoo! It's great to hear that it's working as expected!
Fixed @bittlingmayer
Description
I have added names for the two-letter language codes. Afterward, I modified how the unlisted_langauges list is populated and fixed the bug that created duplicate articles for 3-letter codes (#567).
Fixes #571
Checklist: