Merge with internal master - 2024-08-05 #1026
Closed
Fix Docker URL security: use the Microsoft Container Registry instead of public Docker Hub
…able specific jobs
This PR adds a first working version of ALIBI with algorithmic shifts for encoder-decoder models. It also adds trainable ALIBI slopes and biases, and ALIBI support in general to the **new** layer framework. This is still experimental.
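As a minimal pure-Python sketch of the ALiBi idea (illustrative only, not Marian's implementation): each attention head gets a slope that scales a linear distance penalty added to the attention logits, and both the slope and an additive bias term can be made trainable parameters.

```python
def alibi_slopes(n_heads):
    # Standard ALiBi slopes: the geometric sequence 2^(-8/n), 2^(-16/n), ...
    # These become learnable parameters when slopes are trainable.
    return [2 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)]

def alibi_bias(slope, q_len, k_len, bias=0.0):
    # Bias matrix added to the attention logits: slope * -(distance),
    # plus an optional per-head additive bias, penalizing distant keys.
    return [[slope * -abs(j - i) + bias for j in range(k_len)]
            for i in range(q_len)]
```

With 8 heads the first slope is 2^-1 = 0.5, so a key one position away receives a -0.5 logit penalty.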
This adds nucleus and epsilon sampling to the output-sampling options.
* This required the implementation of a sorting algorithm; tested thrust and CUB.
* Implementation of cumsum and logcumsumexp (no gradient for now) operators.
* Various minor improvements.
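To illustrate why a sort and a cumulative sum are needed (function names here are hypothetical, not Marian's API): nucleus (top-p) sampling keeps the smallest high-probability set whose cumulative mass reaches p, while epsilon sampling keeps every token above a fixed probability threshold.

```python
def nucleus_filter(probs, p):
    # Keep the smallest set of highest-probability tokens whose
    # cumulative probability reaches p (hence sort + cumsum).
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    return kept

def epsilon_filter(probs, eps):
    # Keep only tokens with probability >= eps.
    return [i for i, q in enumerate(probs) if q >= eps]
```

Sampling then proceeds from the kept set after renormalizing its probabilities.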
This adjusts the logmask computation to match the implementation in the COMET-QE model after the ALIBI refactoring.
This adds a sparsemax function and support for the COMET-22 ref-based metric. Worth adding a regression test for the Unbabel/wmt22-comet-da model later. Scores seem to be pretty much identical to the PyTorch implementation when running as float32.
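For reference, a minimal pure-Python sparsemax (Martins & Astudillo, 2016), the function this change adds: it projects the logits onto the probability simplex by subtracting a threshold tau, which, unlike softmax, produces exact zeros for low-scoring entries.

```python
def sparsemax(z):
    # Sort logits descending and find the support size k: the largest j
    # with 1 + j * z_(j) > sum of the top-j logits.
    zs = sorted(z, reverse=True)
    cumsum, tau = 0.0, 0.0
    for j, zj in enumerate(zs, start=1):
        cumsum += zj
        if 1 + j * zj > cumsum:
            tau = (cumsum - 1.0) / j  # threshold from the support set
    # Shift by tau and clip at zero; the result sums to 1.
    return [max(zi - tau, 0.0) for zi in z]
```

For well-separated logits the output is fully sparse, e.g. `sparsemax([2.0, 1.0, 0.5])` puts all mass on the first entry.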
This is a rewrite of the graph loading and memory-mapping functionality. We now mmap and share opportunistically, i.e. whenever possible:
* with CPU decoding and *.bin files, everything is automatically mmapped;
* with *.npz files, the model is read only once;
* on the GPU, *.bin files are mmapped but still copied to the GPU, ideally bypassing CPU memory.

This quite drastically reduces unnecessary CPU memory overhead and loading time for things like COMET scoring.
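The CPU-side idea can be sketched with Python's stdlib `mmap` (illustrative only; Marian's loader is C++, and the file layout here is made up): mapping a weight file read-only lets the OS share its page-cache pages across processes, instead of each load copying the file into private heap memory.

```python
import mmap
import os
import struct
import tempfile

# Write a tiny fake "model.bin" containing four float32 weights.
path = os.path.join(tempfile.mkdtemp(), "model.bin")
with open(path, "wb") as f:
    f.write(struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))

# Map it read-only: parameter tensors can be viewed directly in the
# mapping, so the file's pages are shared rather than privately copied.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    weights = struct.unpack_from("<4f", mm, 0)
    mm.close()
```

For GPU decoding the mapped pages still have to be copied to device memory, which is why the .bin case above only saves the intermediate CPU-side copy.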
This PR implements:
* Comet-Kiwi: fully functional.
* xComet-XL and xComet-XXL: scores for the regressor part fully matching; MQM partial scores not implemented yet.
Fixes a small bug for mt-detect models
…marian-evaluate

This PR adds minor fixes to pybindings and pymarian-evaluate:
* The comet2marian.py script correctly handles the wmt23-cometkiwi-da-xl/xxl models.
* pymarian-evaluate now correctly computes scores.
* The evaluator now exposes an interface function to read the model config.
…or Ampere and Turing

Ubuntu CI: ON for Maxwell, Pascal and Volta; OFF for Ampere and Turing
* to fix the disk-space issue on CI VMs
…dels

This PR implements a bunch of missing functionality in the new layer framework. Among others:
* Autoregressive self-attention
* Guided alignment training
* Decode-time alignment

Minor refactoring of previous code to accommodate the above changes. When setting `export TRANSFORMER_FLAVOR=experimental`, all legacy transformer models are internally mapped to the new layer framework. With that enabled, production regression tests all pass. Passes all public regression tests with the exception of:
- tests/factors/test_factors_concat.sh
- tests/factors/test_factors_decoder_concat.sh
- tests/models/wnmt18/test_student_small_aan.sh
- tests/models/wnmt18/test_student_small_aan_intgemm16.sh
- tests/models/wnmt18/test_student_small_aan_intgemm8.sh

and
- tests/interface/input-tsv/test_tsv_train_with_align_and_weights.sh
- tests/interface/input-tsv/test_tsv_train_with_align_and_weights_inputtypes.sh

I could get the first group to work, but it doesn't seem to be worth it; I plan to remove both code paths in the future. The last two are, I think, just divergences due to mild model differences and probably don't need fixing, rather future adaptation.
This PR adds `--input-reorder`, which allows swapping the indices of batch subfields. Currently, this is used for comet-kiwi-style models to accommodate that the mt output comes first and not the source.
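The effect of the option can be sketched as a simple permutation of per-sentence fields (the helper name below is hypothetical, purely for illustration):

```python
def reorder_fields(batch_fields, order):
    # Permute the fields of each batch entry, e.g. order=[1, 0] swaps
    # src and mt so comet-kiwi-style models see the mt output first.
    return [[fields[i] for i in order] for fields in batch_fields]
```

For example, `reorder_fields([["src", "mt"]], [1, 0])` yields `[["mt", "src"]]`.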
It seems there was a shape mismatch for force-decoding with beams larger than 1. This PR fixes the problem.
List of changes/updates/fixes to pymarian:
* Rename model IDs to match Hugging Face (e.g., comet22-da -> wmt22-comet-da).
* Rename the CLI to make it shorter: pymarian-evaluate -> pymarian-eval.
* Rename pymarian.evaluate.py -> pymarian.eval.py to reflect the CLI.
* The functional code from pymarian.eval.py is moved to an Evaluator class (goal: allow reuse of an Evaluator object for scoring many small files, as in the WMT metrics task).
* Use mmapped *.bin files instead of *.npz.
* Download *.bin and *.spm individually instead of a .tgz. Future plan: support quantized/gemm models. Downloading a .tgz is okay, but it would get too expensive since we don't need all variants of a model (.npz, .bin, fp32, fp16, avx512, ...).
* Use a file-locking mechanism (based on `portalocker`) to avoid race conditions between parallel download processes.
* Added an optional `-v/--vocab` argument to pymarian-eval.
* Added a `--fields|-f` argument: supports `src mt ref` or a subsequence of it. Raises an error when missing fields are detected; ignores extra fields.
* pymarian build improvements: strict on Python version match between the package and the native extension. Also removes the custom logic for extension detection and instead uses EXT_SUFFIX from sysconfig.
* Added a `--like` argument for local models.
* Ran black and isort to fix code-formatting issues.
* pypdl: parallel download.
* Regression tests for pymarian.

Other scripts:
* Added `convert-all-models.sh`: converts PyTorch to Marian .npz, converts .npz to .bin, and creates a directory structure compatible with pymarian-eval.
* Added `compare.sh` to compare metrics between the original implementation and pymarian.
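The download-lock idea can be sketched with the standard library alone (the real code uses `portalocker`; this stand-in uses an atomic lock file instead, and the function is hypothetical): `O_CREAT|O_EXCL` creation is atomic, so only one of several parallel downloader processes acquires the lock and performs the download, while the others wait for or reuse the cached file.

```python
import os
import tempfile

def try_lock(lock_path):
    # Atomically create the lock file; exactly one caller succeeds even
    # when several processes race to download the same model.
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except FileExistsError:
        return False

lock = os.path.join(tempfile.mkdtemp(), "model.lock")
first = try_lock(lock)   # winner: downloads the model
second = try_lock(lock)  # loser: waits / reuses the cached file
```

A library like `portalocker` additionally handles releasing the lock and cross-platform advisory locking, which this minimal sketch does not.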
This mostly adds @<Varun Mathur>'s changes from public master to internal. I did an automatic merge and need to go through those changes myself. I think there is an issue in translator.h which I will fix. @<Varun Mathur> can you check if things work for you here?
Support force-decoding in the pymarian Translator API
CUDA seems to have deprecated a whole bunch of its interface, and it seems to interact weirdly with some GCC versions. Disabling warnings for this header via a dummy include.
This PR adds a simple `--no-optimizer-reload` flag that allows skipping the restoration of optimizer state during continued training or divergence fallback.
This PR includes various fixes to the force decoding code to make the LSH and beam search work.
Abort or throw an exception if we try force-decoding with a factored Vocab.
…ed decoding

* Fixes regressions in the new layer framework for ALIBI-based decoding
* Do not mmap files for conversion
…e tcmalloc; huggingface backend for gated COMETs

pymarian upgrades:
* Support building for multiple Python versions at once; borrowed a CMake script from AMD.
* Use "build" instead of "pip wheel"; build is more stable and leaves less junk on the file system.
* Disable tcmalloc for pymarian.
* Added support for the [huggingface backend](https://huggingface.co/collections/Unbabel/marian-comet-metrics-and-qe-664e28c82743db6709d022fc). Currently enabled for gated COMET models only.
* Added a `--cache` argument to the pymarian-eval CLI; useful for accessing a cache from a blob-storage mount path for gated models.
snukky approved these changes on Aug 7, 2024
The workflow fails for the current master as well, so this is unlikely to be related to the sync. I think we can merge. Let's make sure it's not a squash merge, to preserve the commit history.
Merges with the MS-internal master branch. Full sync.