CLDR-17566 Converting Updating Codes P1 (#4005)
Showing 7 changed files with 296 additions and 0 deletions.
29 changes: 29 additions & 0 deletions
docs/site/development/updating-codes/external-version-metadata.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
--- | ||
title: Updating External Version Metadata | ||
--- | ||
|
||
# Updating External Version Metadata | ||
|
||
## Updating Metadata | ||
|
||
[CLDR\-15005](https://unicode-org.atlassian.net/browse/CLDR-15005) tracks updating the process for external metadata versions. The following table is out of date relative to [common/properties/external\_data\_versions.tsv](https://github.com/unicode-org/cldr/blob/main/common/properties/external_data_versions.tsv). | ||
|
||
### TODO: Need to add instructions for updating external metadata | ||
|
||
~~The following tells how to get the version info for imported data used in a CLDR release.~~ | ||
|
||
| Data | File | Version Info | Date | | ||
|---|---|---|---| | ||
| UN literacy data | [un_literacy.csv](https://github.com/unicode-org/cldr/blob/master/tools/java/org/unicode/cldr/util/data/external/un_literacy.csv) | Date at top | 2012-08 | | ||
| World Bank data | [world_bank_data.csv](https://github.com/unicode-org/cldr/blob/master/tools/java/org/unicode/cldr/util/data/external/world_bank_data.csv) | Date at bottom | 2020-12-16 | | ||
| Factbook data | [factbook_population.txt](https://github.com/unicode-org/cldr/blob/master/tools/java/org/unicode/cldr/util/data/external/factbook_population.txt) | Record when downloaded in TBD | | | ||
| ISO 639 (language) data | [iso-639-3-version.tab](https://github.com/unicode-org/cldr/blob/master/tools/java/org/unicode/cldr/util/data/iso-639-3-version.tab) | Date in YYYYMMDD format | 2021-02-02 | | ||
| ISO subdivision codes | iso subdivision codes | Record when downloaded in TBD | | | ||
| ISO subdivision names | iso subdivision names | Record when downloaded in TBD | | | ||
| ISO currency data | iso currency data | Record when downloaded in TBD | | | ||
| Timezone IDs (tzdb) | timezones (tz) | Release date on [IANA time zone DB](https://www.iana.org/time-zones) | 2021-01-24 (2021a) | | ||
| Top level domains | [tlds-alpha-by-domain.txt](https://github.com/unicode-org/cldr/blob/master/tools/java/org/unicode/cldr/util/data/tlds-alpha-by-domain.txt) | Date at top | 2021-02-17 | | ||
| Language Groups | TBD | Record when downloaded in TBD | | | ||
| UN / EU Codes | TBD | Record when downloaded in TBD | | | ||
|
||
![Unicode copyright](https://www.unicode.org/img/hb_notice.gif) |
24 changes: 24 additions & 0 deletions
docs/site/development/updating-codes/likelysubtags-and-default-content.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
--- | ||
title: LikelySubtags and Default Content | ||
--- | ||
|
||
# LikelySubtags and Default Content | ||
|
||
1. First, complete [Update Language/Script/Region Subtags](https://cldr.unicode.org/development/updating-codes/update-languagescriptregion-subtags). | ||
2. Run GenerateMaximalLocales with VM argument ```-DCLDR_DIR``` set to your cldr directory to generate the likely subtag data **AND** the default content locales. | ||
1. If you are trying to debug, add the VM argument ```-DGenerateMaximalLocalesDebug``` | ||
3. Input data: | ||
1. Data comes from territory/language information in supplemental data. | ||
1. However, it is supplemented by **LANGUAGE\_OVERRIDES** in GenerateMaximalLocales.java | ||
1. If there is no territory/language information in supplemental data for a language, add it to **LANGUAGE\_OVERRIDES**. | ||
2. If the mapping changes when it shouldn't (there are some special cases), add to **LANGUAGE\_OVERRIDES.** | ||
4. Output: | ||
1. Creates {CLDR\_DIR}/../Generated/cldr/supplemental/likelySubtags.xml and {CLDR\_DIR}/../Generated/cldr/supplemental/supplementalMetadata.xml | ||
2. Diff with {CLDR\_DIR}/common/supplemental/likelySubtags.xml and {CLDR\_DIR}/common/supplemental/supplementalMetadata.xml | ||
3. Be very careful to diff everything and check for errors. | ||
1. Watch especially for backwards incompatible changes; that is, changes rather than just additions. | ||
2. Look at the above to handle that with **LANGUAGE\_OVERRIDES.** | ||
4. Run tests, fix input data, and iterate as necessary. | ||
1. Copy into your CLDR workspace and commit. | ||
|
||
![Unicode copyright](https://www.unicode.org/img/hb_notice.gif) |
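The diff step above separates plain additions from changes to existing mappings, since changes may be backwards incompatible. A minimal sketch of that comparison in Python follows, assuming the `<likelySubtag from="…" to="…"/>` element form used in likelySubtags.xml. The function names and the inline sample data are hypothetical and are not part of the CLDR tooling.

```python
import xml.etree.ElementTree as ET


def load_likely_subtags(xml_text):
    """Return {from: to} for every <likelySubtag> element in the XML text."""
    root = ET.fromstring(xml_text)
    return {e.get("from"): e.get("to") for e in root.iter("likelySubtag")}


def classify_changes(old, new):
    """Split the differences into additions, removals, and changed mappings."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return added, removed, changed


# Hypothetical sample data standing in for the committed and generated files.
OLD_XML = """<supplementalData><likelySubtags>
  <likelySubtag from="aa" to="aa_Latn_ET"/>
  <likelySubtag from="az" to="az_Latn_AZ"/>
</likelySubtags></supplementalData>"""

NEW_XML = """<supplementalData><likelySubtags>
  <likelySubtag from="aa" to="aa_Latn_ET"/>
  <likelySubtag from="ab" to="ab_Cyrl_GE"/>
  <likelySubtag from="az" to="az_Cyrl_AZ"/>
</likelySubtags></supplementalData>"""

added, removed, changed = classify_changes(
    load_likely_subtags(OLD_XML), load_likely_subtags(NEW_XML)
)
print("added:", added)
print("removed:", removed)
print("changed (review carefully):", changed)
```

Entries reported as changed or removed are the ones to review against **LANGUAGE\_OVERRIDES**; additions are usually safe. For files on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`.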
62 changes: 62 additions & 0 deletions
docs/site/development/updating-codes/update-currency-codes.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
--- | ||
title: Update Currency Codes | ||
--- | ||
|
||
# Update Currency Codes | ||
|
||
- Go to https://www.six-group.com/en/products-services/financial-information/data-standards.html#scrollTo=currency-codes | ||
- Take the link for "Current Currency and Funds": ["List one (XML)"](https://www.six-group.com/dam/download/financial-information/data-center/iso-currrency/amendments/lists/list_one.xml) | ||
- Save the page as {cldr}/tools/cldr\-code/src/main/resources/org/unicode/cldr/util/data/dl\_iso\_table\_a1\.xml | ||
- ```curl 'https://www.six-group.com/dam/download/financial-information/data-center/iso-currrency/lists/list_one.xml' > tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/dl_iso_table_a1.xml``` | ||
- Take the link for "Historic denominations": "[List three (XML)](https://www.six-group.com/dam/download/financial-information/data-center/iso-currrency/amendments/lists/list_three.xml)" | ||
- Save the page as {cldr}/tools/cldr\-code/src/main/resources/org/unicode/cldr/util/data/dl\_iso\_table\_a3\.xml | ||
- ```curl 'https://www.six-group.com/dam/download/financial-information/data-center/iso-currrency/lists/list_three.xml' > tools/cldr-code/src/main/resources/org/unicode/cldr/util/data/dl_iso_table_a3.xml``` | ||
- **Use git diff to sanity check the two XML files against the old, and check them in.** | ||
- **"git diff \-w" is helpful to ignore whitespace. If there are only whitespace changes, there's no need to check them in.** | ||
- **Check the** [**ISO amendments**](https://www.six-group.com/en/products-services/financial-information/data-standards.html#scrollTo=amendments) **to get changes that will happen during the current cycle.** | ||
- Example: https://www.six-group.com/dam/download/financial-information/data-center/iso-currrency/amendments/dl_currency_iso_amendment_170.pdf | ||
- There currently appears to be no good way to collect all applicable amendments other than incrementing "170" in the above link until a 404 error results. So: | ||
- *Review all amendments that are dated after the previous update, and patch the XML files and the* ```supplementalData.xml``` *as below.* | ||
- *Record the last number viewed in the URL above.* | ||
- *(There is a "download all amendments" link now that has a spreadsheet summary.)* | ||
- **Record the version: See** [**Updating External Metadata**](https://cldr.unicode.org/development/updating-codes/external-version-metadata) | ||
- If there are no diffs in the two iso tables, and no relevant changes in the amendments, you are done. | ||
- Run ```CountItems -Dmethod=generateCurrencyItems``` to generate the new currency list. | ||
- If any currency is missing from ISO4217\.txt, the program will throw an exception and will print a list of items at the end that need to be added to the ISO4217\.txt file. Add as described below. | ||
- Once the necessary codes are added to ISO4217\.txt, rerun ```CountItems -Dmethod=generateCurrencyItems``` until it runs cleanly. | ||
- If any country changes the use of a currency, verify that there is a corresponding entry in SupplementalData. | ||
- Since ISO doesn't publish the exact date of a change (usually just the month), you may need to do additional research to determine the exact date when a new currency becomes active or an old currency becomes inactive. If you can't find the exact date, use the last day of the month that ISO publishes for the old currency's expiration. | ||
- For new currencies, see below. | ||
- Adding a currency: | ||
- Make sure the new code exists in common/bcp47/currency.xml. The currency code should be in lower case, and make sure the "since" release corresponds to the next release of CLDR that will publish using this data. | ||
- In SupplementalData: | ||
- If it has unusual rounding or number of digits, add to: | ||
- \<fractions\> | ||
- \<info iso4217\="ADP" digits\="0" rounding\="0"/\> | ||
- ... | ||
- For each country in which it comes into use, add a line for when it becomes valid | ||
- \<region iso3166\="TR"\> | ||
- \<currency iso4217\="TRY" from\="2005\-01\-01"/\> | ||
- Add the code to the file java/org/unicode/cldr/util/data/ISO4217\.txt. This is important, since it is used to get the valid codes for the survey tool. | ||
- Example: | ||
- currency \| TRY \| new Turkish Lira \| TR \| TURKEY \| C | ||
- Mark the old code in java/org/unicode/cldr/util/data/ISO4217\.txt as deprecated. | ||
- currency \| TRL \| Old Turkish Lira \| TR \| TURKEY \| O | ||
- Changing a currency: | ||
- If the currency goes out of use in a country, then add the last day of use, such as: | ||
- \<region iso3166\="TR"\> | ||
- \<currency iso4217\="TRL" from\="1922\-11\-01"/\> | ||
- \=\> | ||
- \<region iso3166\="TR"\> | ||
- \<currency iso4217\="TRL" from\="1922\-11\-01" to\="2005\-12\-31"/\> | ||
- Edit common/main/en.xml to add the new names (or change old ones) based on the descriptions. | ||
- If there is a collision between a new and old name, the old one typically changes to the currency name with the date range | ||
- "currency\_name (1983\-2003\)". | ||
- Check in your changes | ||
- common/bcp47/currency.xml | ||
- tools/java/org/unicode/cldr/util/data/ISO4217\.txt | ||
- common/main/en.xml | ||
- common/supplemental/supplementalData.xml | ||
- ***Note: We no longer maintain the list of currencies in supplementalMetadata.xml (***[***\#4298***](http://unicode.org/cldr/trac/ticket/4298)***). The list is currently maintained in bcp47/currency.xml. We need to move the code used for checking the list of ISO currencies (and its numeric code mapping), which is currently in the ICU tools repository (http://source.icu-project.org/repos/icu/tools/trunk/currency/).*** | ||
|
||
![Unicode copyright](https://www.unicode.org/img/hb_notice.gif) |
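The amendment search described above (incrementing the number in the URL until a 404 results) can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the CLDR tools: `amendment_url`, `url_exists`, and `last_available_amendment` are hypothetical names, the URL pattern is copied from the example link above, and the sketch assumes the server answers HEAD requests.

```python
import urllib.error
import urllib.request

# URL pattern taken from the example amendment link; {n} is the amendment number.
BASE = (
    "https://www.six-group.com/dam/download/financial-information/"
    "data-center/iso-currrency/amendments/dl_currency_iso_amendment_{n}.pdf"
)


def amendment_url(n):
    """Build the URL for amendment number n."""
    return BASE.format(n=n)


def url_exists(url):
    """Return True if a HEAD request succeeds, False on HTTP 404."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=30):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # any other HTTP error needs a human to look at it


def last_available_amendment(start, exists=url_exists, limit=50):
    """Return the highest consecutive amendment number >= start that exists.

    Returns None if amendment `start` itself is missing. `limit` caps the
    number of probes so that a misbehaving server cannot cause an endless loop.
    """
    last = None
    for n in range(start, start + limit):
        if not exists(amendment_url(n)):
            break
        last = n
    return last
```

Calling `last_available_amendment(170)` would probe the live site starting from the amendment number recorded at the previous update. The `exists` parameter is injectable so that the loop can be exercised without network access.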
41 changes: 41 additions & 0 deletions
docs/site/development/updating-codes/update-language-script-info.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
--- | ||
title: Update Language Script Info | ||
--- | ||
|
||
# Update Language Script Info | ||
|
||
### Main | ||
|
||
1. https://github.com/unicode-org/cldr/tree/main/tools/cldr-code/src/main/resources/org/unicode/cldr/util/data has files with this form: | ||
1. **country\_language\_population.tsv** | ||
2. **language\_script.tsv** | ||
3. For a description of the contents, see [Language Script Guidelines](https://cldr.unicode.org/development/updating-codes/update-language-script-info/language-script-description) | ||
1. Do not edit the above files with a plain text editor; they are tab\-delimited UTF\-8 with many fields and should be imported/edited with a spreadsheet editor. Excel or Google Sheets should work fine. | ||
2. The World Bank, UN, and Factbook data should be updated as per [Updating Population, GDP, Literacy](https://cldr.unicode.org/development/updating-codes/updating-population-gdp-literacy) | ||
3. Note that there is an auxiliary file **util/data/external/other\_country\_data.txt**, which contains data that supplements the others. If there are errors below because the country population is less than the language population, then that file may need updating. | ||
1. Run the tool **ConvertLanguageData**. | ||
1. Set \-DADD\_POP\=**true** for error messages. | ||
1. If there are any different country names, you'll get an error: edit external/alternate\_country\_names.txt to add them. | ||
2. Look for failures in the language vs script data, following the line: | ||
- Problems in **language\_script.tsv** | ||
3. Look for Territory Language data, following the line: | ||
- **Possible Failures ...** | ||
- In Basic Data but not Population \> 20% | ||
- and the reverse. | ||
4. Look for general problems, following the line: | ||
- **Failures in Output.** | ||
- It will also warn if a country doesn't have an official or de facto official language. | ||
5. Work until resolved. | ||
2. *The tool updates in place* **{cldrdata}/common/supplemental/supplementalData.xml** | ||
3. Carefully diff the result. | ||
4. Then run QuickCheck to verify that the DTD is in order, and commit. | ||
|
||
### Update the supplementalData.xml \<territoryContainment\> | ||
|
||
1. For UN M.49 codes, see [Updating UN Codes](https://cldr.unicode.org/development/updating-codes/updating-un-codes) | ||
2. For the UN, go to https://www.un.org/en/member-states/index.html. Copy the table, and paste into util/data/external/un\_member\_states\_raw.txt. Diff with old. **BROKEN LINK** | ||
3. For the EU, see instructions on [Updating UN Codes](https://cldr.unicode.org/development/updating-codes/updating-un-codes) | ||
4. For the EZ, do the same with <http://ec.europa.eu/economy_finance/euro/adoption/euro_area/index_en.htm>, into util/data/external/ez\_member\_states\_raw.txt **BROKEN LINK** | ||
1. If there are changes, update \<territoryContainment\> | ||
|
||
![Unicode copyright](https://www.unicode.org/img/hb_notice.gif) |
23 changes: 23 additions & 0 deletions
...pment/updating-codes/update-language-script-info/language-script-description.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
--- | ||
title: Language Script Description | ||
--- | ||
|
||
# Language Script Description | ||
|
||
The language\_script spreadsheet should list all of the language / script combinations that are in common modern use. The countries are not important, since their function has been overtaken by the country\_language\_population spreadsheet. | ||
|
||
1. If the language and script are both modern, and the script is a major way to write the language in some country, then we should see that line marked as **primary**. | ||
2. Otherwise it should be marked **secondary**. | ||
|
||
Every language that is in official use in any country according to country\_language\_population should have at least one primary script in the language\_script spreadsheet. | ||
|
||
If a language has multiple primary scripts, then it should not appear without the script tag in the country\_language\_population.tsv. For example, we should not see "az", but rather "az\_Cyrl", "az\_Latn", and so on. For each country where the language is used, we should see figures on the script\-specific values. The values may overlap, that is, we may see az\_Cyrl at 60% and az\_Latn at 55%. However, the combination with the predominantly used script **must** have a larger figure than the others. | ||
|
||
This is also reflected in CLDR main: languages with multiple scripts will have that reflected in their structure (e.g., sr\-Cyrl\-RS), with aliases for the language\-region combinations. | ||
|
||
Files in https://github.com/unicode-org/cldr/tree/main/tools/cldr-code/src/main/resources/org/unicode/cldr/util/data | ||
|
||
1. country\_language\_population.tsv | ||
2. language\_script.tsv | ||
|
||
![Unicode copyright](https://www.unicode.org/img/hb_notice.gif) |
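The rule above, that a language with multiple primary scripts should not appear without a script tag in country\_language\_population.tsv, can be checked mechanically. The following Python sketch is only an illustration: the function name is hypothetical, and the in-memory inputs stand in for data extracted from the two spreadsheets. It does not reflect the actual column layout of the TSV files.

```python
def bare_codes_needing_script(primary_scripts, population_codes):
    """Return bare language codes that should carry a script tag.

    primary_scripts: {language code: set of primary script codes}
    population_codes: language codes as they appear in the population data,
        e.g. "az", "az_Cyrl", "fr"
    """
    # Languages with more than one primary script must always be script-tagged.
    multi_script = {
        lang for lang, scripts in primary_scripts.items() if len(scripts) > 1
    }
    # A script-tagged code such as "az_Cyrl" is never a key of primary_scripts,
    # so only bare codes can match here.
    return sorted(code for code in set(population_codes) if code in multi_script)


# Hypothetical sample data.
primary = {"az": {"Cyrl", "Latn"}, "fr": {"Latn"}}
print(bare_codes_needing_script(primary, ["az", "az_Cyrl", "az_Latn", "fr"]))
print(bare_codes_needing_script(primary, ["az_Cyrl", "az_Latn", "fr"]))
```

The first call reports "az" because it appears bare despite having two primary scripts; the second call reports nothing.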