
Merge with internal master - 2024-08-05 #1026

Closed
wants to merge 29 commits
Conversation

@emjotde (Member) commented Aug 5, 2024

Merges with the MS-internal master branch. Full sync.

Thamme Gowda and others added 29 commits November 13, 2023 11:49
Fix Docker URL security: use the Microsoft container registry instead of public Docker Hub
This PR adds a first working version of ALIBI with algorithmic shifts for encoder-decoder models. Also adds trainable ALIBI slopes and biases and ALIBI in general to the **new** layer framework. This is still experimental.
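For readers unfamiliar with ALiBi: the core idea is a per-head linear distance penalty added to the attention logits instead of positional embeddings, with one fixed slope per head. A minimal numpy sketch of the standard formulation (Press et al.), not Marian's implementation; the function name is illustrative:

```python
import numpy as np

def alibi_bias(num_heads: int, seq_len: int) -> np.ndarray:
    """Linear distance penalties added to attention logits (one slope per head)."""
    # Geometric slope schedule from the ALiBi paper: 2^(-8k/h) for head k.
    slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
    # Negative absolute distance between query position i and key position j.
    pos = np.arange(seq_len)
    dist = -np.abs(pos[None, :] - pos[:, None])        # (seq, seq)
    return slopes[:, None, None] * dist[None, :, :]    # (heads, seq, seq)

bias = alibi_bias(num_heads=8, seq_len=4)
```

Trainable slopes, as this PR adds, would simply make `slopes` a learned parameter instead of the fixed schedule above.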
This adds nucleus and epsilon sampling to the output-sampling options.
* This required implementing a sorting algorithm; we tested Thrust and CUB.
* Implementation of cumsum and logcumsumexp (no gradient for now) operators.
* Various minor improvements.
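The cumulative sums over a sorted distribution mentioned above are exactly what nucleus (top-p) filtering needs. A hedged numpy sketch of both filters over a single probability vector; the function names are illustrative, not Marian's output-sampling internals:

```python
import numpy as np

def nucleus_filter(probs: np.ndarray, top_p: float) -> np.ndarray:
    """Keep the smallest set of tokens whose cumulative probability >= top_p."""
    order = np.argsort(probs)[::-1]      # sort descending
    csum = np.cumsum(probs[order])       # cumulative sum over sorted probs
    keep = csum - probs[order] < top_p   # always keeps at least the top token
    mask = np.zeros_like(probs, dtype=bool)
    mask[order[keep]] = True
    out = np.where(mask, probs, 0.0)
    return out / out.sum()

def epsilon_filter(probs: np.ndarray, eps: float) -> np.ndarray:
    """Drop tokens whose probability falls below an absolute threshold eps."""
    out = np.where(probs >= eps, probs, 0.0)
    return out / out.sum()
```

After filtering, sampling proceeds from the renormalized distribution as usual.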
This adjusts the logmask computation to match the implementation in the COMET-QE model after the ALIBI refactoring.
This adds a sparsemax function and support for COMET-22 ref-based metric.

Worth adding a regression test for the Unbabel/wmt22-comet-da model later. Scores seem to be practically identical to the PyTorch implementation when running as float32.
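As background, sparsemax projects scores onto the probability simplex and, unlike softmax, can assign exact zeros to low-scoring entries. A minimal numpy sketch of the closed-form algorithm (Martins & Astudillo, 2016), independent of Marian's C++ operator:

```python
import numpy as np

def sparsemax(z: np.ndarray) -> np.ndarray:
    """Sparsemax: Euclidean projection of scores z onto the simplex."""
    zs = np.sort(z)[::-1]                  # scores sorted descending
    k = np.arange(1, len(z) + 1)
    csum = np.cumsum(zs)
    support = 1 + k * zs > csum            # entries kept in the support
    k_max = k[support][-1]
    tau = (csum[support][-1] - 1) / k_max  # threshold subtracted from scores
    return np.maximum(z - tau, 0.0)
```

The output still sums to 1, but entries outside the support are exactly zero rather than merely small.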
This is a rewrite of the graph loading and memory-mapping functionality. We now mmap and share opportunistically, i.e. whenever possible:
* with CPU decoding and *.bin files, everything is automatically mmapped;
* with *.npz files, the model is read only once;
* on the GPU, *.bin files are mmapped but still copied to the GPU, ideally bypassing CPU memory.

This quite drastically reduces unnecessary CPU memory overhead and loading time for things like COMET scoring.
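To illustrate why mmapping *.bin files helps: the OS pages parameters in lazily on access and can share read-only pages across processes, instead of each process reading its own private copy up front. A generic Python sketch with `numpy.memmap`, not Marian's loader:

```python
import numpy as np
import os, tempfile

# Write a stand-in "model" file of float32 parameters.
path = os.path.join(tempfile.mkdtemp(), "model.bin")
np.arange(6, dtype=np.float32).tofile(path)

# mmap the file read-only: no bulk read happens here; pages are
# faulted in on demand and shared between processes mapping the same file.
params = np.memmap(path, dtype=np.float32, mode="r")
```

For CPU decoding this is the whole story; for GPU decoding the mapped pages still have to be copied to device memory, which is the remaining cost the PR mentions.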
This PR implements
* Comet-Kiwi - fully functional
* xComet-XL and xComet-XXL - scores for the regressor part fully match; MQM partial scores are not implemented yet.
* This code is the same as the [public github repo tg/pybind-new branch](#1013). Git histories seem to differ slightly between the public and private repos, so we are seeing a lot of commits.
* This builds on top of work by Elijah #948
…marian-evaluate

This PR adds minor fixes to pybindings and pymarian-evaluate:
* comet2marian.py script correctly handles the wmt23-cometkiwi-da-xl/xxl models.
* pymarian-evaluate now correctly computes scores
* evaluator now exposes an interface function to read the model config
…or Ampere and Turing

Ubuntu CI: ON to Maxwell, Pascal and Volta; OFF to Ampere and Turing

* to fix a space issue on CI VMs
…dels

This PR implements a bunch of missing functionality in the new layer framework. Among others:

* Autoregressive self-attention
* Guided alignment training
* Decode-time alignment

Minor refactoring of previous code to accommodate above changes.

When setting `export TRANSFORMER_FLAVOR=experimental`, all legacy transformer models are internally mapped to the new layer framework. With that enabled:

Production regression tests all pass.

Passes all public regression tests with the exception of:

- tests/factors/test_factors_concat.sh
- tests/factors/test_factors_decoder_concat.sh
- tests/models/wnmt18/test_student_small_aan.sh
- tests/models/wnmt18/test_student_small_aan_intgemm16.sh
- tests/models/wnmt18/test_student_small_aan_intgemm8.sh

and

- tests/interface/input-tsv/test_tsv_train_with_align_and_weights.sh
- tests/interface/input-tsv/test_tsv_train_with_align_and_weights_inputtypes.sh

I could get these to work, but it doesn't seem to be worth it; I plan to remove both code paths in the future. The last two are, I think, just divergences due to mild model differences and probably need future adaptation of the tests rather than a fix.
This PR adds `--input-reorder`, which allows swapping the indices of batch subfields. Currently, this is used for comet-kiwi-style models to accommodate the fact that the MT output comes first, not the source.
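A toy illustration of what reordering batch subfields means here; the function name and the list encoding of a batch are hypothetical, not the actual `--input-reorder` implementation:

```python
def reorder_fields(batch_fields, order):
    """Return batch subfields permuted so that output field i is
    input field order[i]; e.g. order=[1, 0] swaps the first two fields."""
    return [batch_fields[i] for i in order]

# A kiwi-style batch arrives as [mt, src]; the model expects [src, mt].
batch = [["mt tokens"], ["src tokens"]]
reordered = reorder_fields(batch, [1, 0])
```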
It seems there was a shape mismatch for force-decoding with beams larger than 1. This PR fixes the problem.
List of changes/updates/fixes to pymarian
* Rename model IDs to match Hugging Face (e.g., comet22-da -> wmt22-comet-da)
* Rename the CLI for brevity: pymarian-evaluate -> pymarian-eval
* Rename pymarian.evaluate.py -> pymarian.eval.py to reflect the CLI
* Move the functional code from pymarian.eval.py into an Evaluator class (goal: allow reuse of an Evaluator object for scoring many small files, as in the WMT metrics task)
* Use mmapped *.bin files instead of *.npz
* Download *.bin and *.spm individually instead of a .tgz. Future plan is to support quantized / GEMM models; downloading a .tgz is okay, but it will get too expensive since we don't need all variants of a model (.npz, .bin, fp32, fp16, avx512, ...)
* Use a file-locking mechanism (based on `portalocker`) to avoid race conditions between parallel download processes
* Added an optional `-v/--vocab` argument to pymarian-eval
* Added a `--fields|-f` argument: supports `src mt ref` or a subsequence thereof; raises an error when fields are missing and ignores extra fields
* pymarian build improvements: strict on the Python version match between the package and the native extension. Also removes custom logic for extension detection and instead uses EXT_SUFFIX from sysconfig
* Added a `--like` argument for local models
* Ran black and isort to fix code formatting issues
* Use pypdl for parallel downloads
* Added regression tests for pymarian
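The download-locking pattern above can be sketched as follows. This uses the stdlib `fcntl` (POSIX-only), whereas pymarian uses `portalocker` for the same exclusive-lock pattern cross-platform; `download_once` and the file contents are illustrative, not pymarian's API:

```python
import fcntl, os, tempfile

def download_once(lock_path: str, target_path: str) -> str:
    """Serialize parallel downloaders: only the process holding the
    exclusive lock writes the file; the others block, then reuse it."""
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until exclusive
        try:
            if not os.path.exists(target_path):
                with open(target_path, "w") as f:
                    f.write("model bytes")     # stand-in for the real download
            return target_path
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)

tmp = tempfile.mkdtemp()
path = download_once(os.path.join(tmp, "model.lock"),
                     os.path.join(tmp, "model.bin"))
```

The existence check inside the critical section is what makes the pattern idempotent: late arrivals find the file already written and skip the download.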

--

Other scripts
* Added `convert-all-models.sh`: converts PyTorch models to Marian .npz, converts .npz to .bin, and creates a directory structure compatible with pymarian-eval
* Added `compare.sh` to compare metrics between the original implementation and pymarian
This mostly adds @<Varun Mathur>'s changes from the public master to internal. I did an automatic merge and need to go through those changes myself. I think there is an issue in translator.h, which I will fix.

@<Varun Mathur> can you check if things work for you here?
support force-decoding for pymarian Translator API
CUDA seems to have deprecated a whole bunch of its interface, and this seems to interact weirdly with some GCC versions. Disabling warnings for this header via a dummy include.
This PR adds a simple `--no-optimizer-reload` flag that allows skipping the restore of optimizer state during continued training or divergence fallback.
This PR includes various fixes to the force decoding code to make the LSH and beam search work.
Abort or throw an exception if we try force-decoding with a factored Vocab.
…ed decoding

* Fixes regressions in new layer framework for ALIBI-based decoding
…e tcmalloc; huggingface backend for gated COMETs

pymarian upgrades
* Support building for multiple Python versions at once; borrowed a CMake script from AMD
* Use "build" instead of "pip wheel"; build is more stable and leaves less junk on the file system
* Disable tcmalloc for pymarian
* Added support for a [huggingface backend](https://huggingface.co/collections/Unbabel/marian-comet-metrics-and-qe-664e28c82743db6709d022fc), currently enabled for gated COMET models only
* Added a `--cache` argument to the pymarian-eval CLI; useful for accessing the cache from a blob-storage mount path for gated models
@snukky (Member) commented Aug 7, 2024

The workflow fails for the current master as well, so this is unlikely related to the sync. I think we can merge. Let's make sure it's not a squash merge to preserve the commit history.

@emjotde emjotde closed this Aug 7, 2024