Add oneDNN #937

graemenail · 2022-06-03T15:10:52Z

TODO

Manual check of training / scoring

Description

Replaces MKL with oneDNN. This PR should enable completely open-source releases of Marian.

This PR also includes a caching of boost, and a cleaning of the debug build directory. These can be broken off into a separate PR if necessary. During testing, windows builds would fail from running out of space building both debug and release. Since disabling oneDNN JIT profiling the build sizes are smaller.

Related: marian-nmt/marian-regression-tests#86
Closes: #706

List of changes:

Adds oneDNN as a submodule
Removes MKL related code
Use oneDNN for sgemm
Modified code paths:
- batched GEMM (cblas_sgemm_batch in prod.cpp)
- TransposeFirst3In4 (mkl_somatcopy in tensor_operators.cpp)
- Bias copy-op (mkl_somatcopy in packed_gemm.cpp)

The following have some MKL component, and have not been touched:

azure workflows
src/microsoft code
vs/*

Added dependencies: Intel oneDNN (removing Intel MKL)

How to test

Ran the 1 million sentence testset of WNGT21 through the MKL and oneDNN versions.

Using the wngt19 model from regression tests.

Setup

relative-paths: true
models:
  - wngt19/model.base.npz
vocabs:
  - wngt19/en-de.spm
  - wngt19/en-de.spm
normalize: 0.6
beam-size: 1
mini-batch: 128
maxi-batch: 100
maxi-batch-sort: src
cpu-threads: 16

Ran #cpu tagged regression tests. Non-intgemm tests pass.

Regression test results

Skipped:
- tests/models/wngt19/test_model_base_fbgemm_packed16.sh
- tests/models/wngt19/test_model_base_fbgemm_packed8.sh
- tests/server/test_ende_cpu.sh
Failed:
- tests/decoder/intgemm/test_intgemm_16bit.sh
- tests/decoder/intgemm/test_intgemm_16bit_avx2.sh
- tests/decoder/intgemm/test_intgemm_16bit_sse2.sh
- tests/decoder/intgemm/test_intgemm_8bit.sh
- tests/decoder/intgemm/test_intgemm_8bit_avx2.sh
- tests/decoder/intgemm/test_intgemm_8bit_ssse3.sh
Logs:
- /home/gnail/dev/marian-dev/regression-tests/tests/decoder/intgemm/test_intgemm_16bit.sh.log
- /home/gnail/dev/marian-dev/regression-tests/tests/decoder/intgemm/test_intgemm_16bit_avx2.sh.log
- /home/gnail/dev/marian-dev/regression-tests/tests/decoder/intgemm/test_intgemm_16bit_sse2.sh.log
- /home/gnail/dev/marian-dev/regression-tests/tests/decoder/intgemm/test_intgemm_8bit.sh.log
- /home/gnail/dev/marian-dev/regression-tests/tests/decoder/intgemm/test_intgemm_8bit_avx2.sh.log
- /home/gnail/dev/marian-dev/regression-tests/tests/decoder/intgemm/test_intgemm_8bit_ssse3.sh.log
---------------------
Ran 19 tests in 00:03:6.267s, 10 passed, 3 skipped, 6 failed
FAILED

The static binaries are now larger (merge #938 first for comparison).

Checklist

I have tested the code manually
I have run regression tests
I have read and followed CONTRIBUTING.md
I have updated CHANGELOG.md

…ard)

snukky · 2022-09-12T09:30:24Z

I merged this with the recent master (https://github.com/marian-nmt/marian-dev/tree/graemenail-onednn) and tested with the regression tests that we have. Intgemm tests and one wngt19 8bit test fail, but I guess this is expected and in fact can be also because of #933. In our tests, except tests with quantized 8bit models tested on CPU, it also throws the abort here. Other tests pass.

As for other comments, as discussed earlier, instead of replacing MKL with oneDNN, we would like to have this backward compatible at least for some time and to keep the previous behavior as default. So oneDNN would be enabled via CMake flag and MKL is used by default if detected.

graemenail · 2022-09-12T10:11:55Z

Thanks for the comments @snukky. On the LSH abort, it's a while since I looked at this, I don't believe oneDNN provides all the BLAS functions required. From a quick check, I'd guess this #706 (comment) is still the status of non-GEMM BLAS in oneDNN.

I had started on the oneDNN + MKL implementation, I'll dig it out.

graemenail · 2022-09-14T09:59:22Z

Closing in favour of #967

graemenail added 23 commits June 3, 2022 15:45

Add oneDNN submodule to 3rd_party

87b8d53

Add oneDNN to CMake

e72a0dd

Don't build DNNL examples

d341447

Allow static builds of DNNL

72e0b1a

Remove MKL include from config parser

641a816

Remove MKL from microsoft/quicksand

1e8555f

Remove MKL omatcopy from FBGEMM packed_gemm

2b44b3f

Remove MKL from GitHub Actions

bdc02f6

Remove MKL call from 3in4 transpose, reverting to plain copy (on forw…

49140fc

…ard)

Replace MKL sgemm with oneDNN sgemm

9da225e

Remove MKL from CMake and improve oneDNN CMake

348fa95

Remove MKL guards in prod. This also removes sgemm_batched

6e60006

Use int in loop for ProdBatched

5879c20

oneDNN only use OMP runtime when specified

586d20c

Move MSVC unicode flags out of global flags

3a5d3f3

Disable DNNL JIT Profiling

bfe7146

Cache Boost

e1bba93

Clean up after debug build

201321b

Remove FindMKL.cmake

58a1a44

Update Changelog

2b88d54

Remove MKL from release workflow

4e3572d

Replace MKL reference in docs

8852700

Fix comments mentioning MKL

ce859c4

graemenail mentioned this pull request Jun 3, 2022

Directory listing in Ubuntu and macOS workflows #938

Merged

4 tasks

graemenail mentioned this pull request Sep 14, 2022

oneDNN + MKL #967

Open

4 tasks

graemenail closed this Sep 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add oneDNN #937

Add oneDNN #937

graemenail commented Jun 3, 2022 •

edited

Loading

snukky commented Sep 12, 2022

graemenail commented Sep 12, 2022

graemenail commented Sep 14, 2022

Add oneDNN #937

Add oneDNN #937

Conversation

graemenail commented Jun 3, 2022 • edited Loading

TODO

Description

How to test

Checklist

snukky commented Sep 12, 2022

graemenail commented Sep 12, 2022

graemenail commented Sep 14, 2022

graemenail commented Jun 3, 2022 •

edited

Loading