This repository has been archived by the owner on Nov 19, 2021. It is now read-only.

Translation

preemeijer edited this page Jun 27, 2015 · 9 revisions

Do you want to help us translate?

Just create an account on https://duck.co/translate and you are set to go.

For background information about Translation.

Please go to the page Translation System.

For issues.

If you experience a problem with translating (or with translations), please go to the following overview of open Translation issues on GitHub.

Tokens

Tokens are the core of the whole translation workflow. Coders create tokens in the code, in the templates, or wherever else they are needed, and those tokens define the texts that have to be translated. The important point to understand is that the token is the text to be translated. The following examples from templates make the system easier to understand.

Simple token

Example of a simple token. This defines a simple token.

Screenshot Simple Token Example

Gettext, our translation storage, has no data file for the tokens themselves, because a gettext data file only ever contains a token together with a translation. To keep the data well normalized, we store the tokens in the database of our community platform, under the same field names that gettext expects.

In the general translation interface of the community platform you normally see a list of those tokens; the interface itself is explained later. The text to translate appears at the top, right next to the label Term(Singular). Below it you see the Translations, from you or from other users.

Token with context

The example above was a simple token: it is just text and does not touch any of our problem cases. A problem case would be, for example, the token Medium. This is a very vague word, and you need some context to find the right translation. Even if it seems perfectly clear in English, a lone Medium can appear in many different kinds of context. For this, gettext offers the option of attaching a so-called context to a token, which lets us provide a bit more "context" without changing the token itself.

Screenshot Token with Context

As you can see, the label Context appears above the word that has to be translated. It should NOT be taken as a real description of the context; on the contrary, it only helps everyone working with the tokens to find this specific token in the code, the templates, or wherever else it has to be coordinated.

A much clearer description is in the notes for this token. More and more notes are filled with a link to a picture that shows you exactly where the token is used on the domain.

So here is the first thing to take care of if you are responsible for working with tokens: you can't give every token a context, because that makes reusing tokens much harder.

Placeholders in tokens

Screenshot Token a Placeholder

Placeholders in tokens offer many options for fine-tuning how the text is displayed. Often the text itself needs a special wrapping for display, such as HTML, and this can be achieved with placeholders. Placeholders also allow translators to move the dynamic text to a position that is more natural in their language (in a right-to-left language, for example, the placeholder might sit further to the left).

Two things happen in the translation system. First, gettext looks up the translation for the given token, in which the translator has kept the placeholder %s. Second, the placeholder is replaced with the dynamic value. Suppose the translation found, for example for Pirate, is:

%s, ahoi! hrrr

Given this translation example, the translated text for a user switched to Pirate, with the username Doe, would be:

Doe, ahoi! hrrr

Spaces are not required around placeholders, so if it is better in your language to turn something into a single word, it is perfectly valid to do so.
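The two steps described above (look up the translation, then fill in the placeholder) can be sketched in a few lines of Python. This is only an illustration, not the platform's actual code: the catalog dictionary, the function names, and the English token "Hello %s!" are assumptions, since the original token is only shown in the screenshot.

```python
# Hypothetical in-memory catalog standing in for the compiled gettext data.
# The English token "Hello %s!" is an assumption made for this sketch.
PIRATE_CATALOG = {
    "Hello %s!": "%s, ahoi! hrrr",
}


def translate(token, catalog):
    """Step 1: return the translation for a token, or the token itself."""
    return catalog.get(token, token)


def render(token, catalog, *values):
    """Step 2: fill the placeholders of the translated text, sprintf-style."""
    return translate(token, catalog) % values


print(render("Hello %s!", PIRATE_CATALOG, "Doe"))
```

Because the placeholder travels with the translation, the translator decides where the username ends up in the sentence.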

Placeholders and grammatical numbers

In addition to placeholders for text, gettext also covers numbered cases: the proper grammatical number form for the language has to be chosen based on a count, and the number placeholder is then replaced with that count. This is used to define a token whose text depends on a number. In the gettext storage, the definition looks like this:

msgid "You have %d message."
msgid_plural "You have %d messages."

Some languages have more than 2 forms. Whatever the case, gettext handles this with the translation data file for the given language. After gettext has picked the correct translated text, the translation is passed to sprintf, which replaces the placeholders with the proper values. If $messages is 3 in the above example, the output would be:

You have 3 messages.

Sadly, the combination of gettext and sprintf has the disadvantage that the amount which determines the plural form has to be the first placeholder. This causes problems with combined tokens, as explained in the section "Combined tokens".
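The plural selection followed by placeholder substitution can be tried out with Python's standard gettext module, used here purely as a stand-in for the project's own translation library. The helper name message_count is an assumption. NullTranslations carries no catalog, so it falls back to the English rule (singular only when the count is 1); a real catalog would apply the plural rule of the target language.

```python
import gettext

# No catalog is loaded, so ngettext falls back to the English rule:
# the singular form when n == 1, the plural form otherwise.
translations = gettext.NullTranslations()


def message_count(n):
    """Pick the plural form for n first, then substitute n for %d."""
    template = translations.ngettext(
        "You have %d message.", "You have %d messages.", n
    )
    return template % n


print(message_count(1))
print(message_count(3))
```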

Combined tokens

We are able to make tokens specific for special visual needs, like if they need additional HTML. An example could be:

l('%s for more info!', '<a href="...">' ~ l('Click here') ~ '</a>')

This would output:

<a href="...">Click here</a> for more info!

This allows us to keep the HTML out of the translation while still giving translators the freedom to decide which part of their text should be clickable (or colored differently, or whatever). A bigger problem arises when we combine those placeholders with number placeholders. As explained in the section "Placeholders and grammatical numbers", the amount that determines the plural form has to be in the first position. This can lead to a problem, as in this case:

$username has $count message.

$username has $count messages.

If we tried to convert this to "%s has %d messages.", we would run into the problem that the first placeholder would be $username and not $count. To avoid this, we can use sprintf's positional arguments to change the order in which the values are assigned to the placeholders. The right solution would be:

ln("%2$s has %1$d message.","%2$s has %1$d messages.",$count,$username)
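To make the reordering concrete, here is a small Python sketch of how the positional placeholders resolve. Python's own % operator does not understand the %N$ syntax, so the helper sprintf_positional below is a hypothetical re-implementation that handles only %N$s and %N$d, and this ln is a simplified stand-in that applies the English plural rule instead of consulting a gettext catalog.

```python
import re

# Matches placeholders such as %1$d or %2$s (1-based argument index).
_POSITIONAL = re.compile(r"%(\d+)\$([sd])")


def sprintf_positional(template, *args):
    """Fill %N$s / %N$d placeholders with the N-th argument."""
    def substitute(match):
        value = args[int(match.group(1)) - 1]
        if match.group(2) == "d":
            return str(int(value))
        return str(value)

    return _POSITIONAL.sub(substitute, template)


def ln(singular, plural, count, *rest):
    """Hypothetical stand-in for ln(): the count is always the first value."""
    template = singular if count == 1 else plural  # English rule only
    return sprintf_positional(template, count, *rest)


print(ln("%2$s has %1$d message.", "%2$s has %1$d messages.", 3, "Doe"))
```

The count stays first in the argument list, where the plural logic needs it, while the text is free to show the username first.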

The very big disadvantage here is the organizational part. It is really complex to keep all those tokens in the database and still keep track of which ones belong together. It always requires lots of comments and further information. In some very awkward cases the tokens can cascade to an extreme degree, and in those cases it is essential to provide context. Most combined tokens are gathered under one specific msgctxt. In the translation interface, you can click on the context to reach a page with all tokens of that specific context. We still try to add to every token either a comment that describes the complete text in which the token is used, or a screenshot.

Voting

On the community platform, you can vote for an existing translation instead of writing your own. You can even start a wizard on the community platform that walks you through the tokens you have not voted on yet. First go to the specific domain and select the language. There you will see the button: Vote on translated tokens.

Used translations

The system that generates the translation po files for all languages picks, for each token, the translation with the most votes. If several translations have the same number of votes, the one whose translator has the highest grade in this language is used. We will explain this in more depth in the community platform section ("The community platform"). This selection happens at the release of the translations.
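The selection rule can be sketched as follows. The Candidate class, its field names, and the numeric grade are assumptions made for illustration; the real platform's data model is not described here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    votes: int
    translator_grade: int  # hypothetical numeric grade; higher is better


def pick_translation(candidates):
    """Most votes wins; a tie goes to the translator with the highest grade."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: (c.votes, c.translator_grade))
    return best.text


candidates = [
    Candidate("Translation A", votes=2, translator_grade=1),
    Candidate("Translation B", votes=2, translator_grade=5),
    Candidate("Translation C", votes=1, translator_grade=9),
]
print(pick_translation(candidates))
```

Sorting on the pair (votes, grade) means the grade only matters when the vote counts are equal.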

Discussing translations

Discuss the tokens of a domain in your language. You can talk about how to handle the tokens of a domain in your language, or about other linguistic problems you encounter. Discussions can cover a whole domain or a specific token within a domain.

Domains

For organizational reasons, but also for technical ones, we have to group the tokens into so-called token domains. The terminology is taken from gettext, where one po file corresponds to one specific domain. We normally define a default domain when our translation library is initialized, so we never need to specify the domain in calls to the translation library. The release of the translations is also executed for every token domain individually. On the live systems we install the latest releases of all token domains at once.

Token domains in the community platform

In the community platform, the different token domains are independent lists of tokens. This allows translators to pick the block of tokens they want to work on. It is very important to understand that a half-finished translation is hardly more useful than no translation at all.

Overlapping tokens

The definition of a token is bound to its token domain. This means that 2 identical tokens in 2 domains are still 2 independent tokens, and both need to be translated. Sadly, in most cases this leads to a redundant translation with exactly the same meaning, probably even in exactly the same context. This is very hard to avoid. Still, there are cases where the context given by the token domain slightly changes the interpretation. Just because a token looks identical in English does not mean it has the same interpretation in other languages.
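A minimal way to picture this is a store that is keyed by the domain as well as by the token text. The domain names "site-a" and "site-b", the store layout, and the lookup function are all hypothetical and only illustrate the independence of the two entries.

```python
# Hypothetical store: each entry is keyed by (domain, token), so the same
# English text in two domains is two separate entries.
translations = {
    ("site-a", "Settings"): "Einstellungen",
    # ("site-b", "Settings") has not been translated yet.
}


def lookup(domain, token):
    """Return the translation within one domain, or the token if missing."""
    return translations.get((domain, token), token)


print(lookup("site-a", "Settings"))
print(lookup("site-b", "Settings"))
```

The second lookup falls back to the English token, because translating it in one domain does not carry over to the other.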

Standard agreements

This short list covers the basic rules we have agreed upon that you, as a translator, need to know.

  • All translations are informal