Returns translated sentences immediately in beam search #858

Open: wants to merge 27 commits into base: master

27 commits
a45c848 Update marian-backend (Jan 20, 2021)
67dfe27 Adds new model task to create tasks with callbacks (rhenry-nv, Jan 21, 2021)
6a2c1a6 Adds support for beam search to take in a callback to run on finished… (rhenry-nv, Jan 21, 2021)
ee73446 Adds support for translators to use callbacks within beamsearch (rhenry-nv, Jan 21, 2021)
0772459 Adds new file containing callbacks for debugging and timing (rhenry-nv, Jan 21, 2021)
f6794db Add timing functor (rhenry-nv, Jan 21, 2021)
b1b7757 Timing for individual sentences working with model. Need to check cor… (rhenry-nv, Jan 22, 2021)
1876004 Adds temporary timing code (rhenry-nv, Jan 22, 2021)
9fb0158 Merge remote-tracking branch 'public/master' into triton_update (rhenry-nv, Mar 11, 2021)
dfbedaa Merge branch 'triton_update' into async (rhenry-nv, Apr 19, 2021)
4517215 Adds sync support into beam search and marian triton backend. The tri… (rhenry-nv, Apr 19, 2021)
9ca8666 Updates docker to pull from async branch (rhenry-nv, Mar 11, 2021)
6ec9abd Triton backend with async backend now compiles. Need to test (rhenry-nv, Mar 12, 2021)
dbe7a2a Attempt 1 at setting async mode in config.pbtxt (rhenry-nv, Mar 25, 2021)
207fa0c Fix async build (rhenry-nv, Mar 25, 2021)
b1541e0 adds cmarian.cpp to static build (rhenry-nv, Mar 25, 2021)
46acae1 Fixes linking issue in build with new version of Marian. Pass callbac… (rhenry-nv, Mar 26, 2021)
41a3925 Remove print (rhenry-nv, Mar 26, 2021)
f52bca3 Builds backend against CUDA 11 (rhenry-nv, Mar 26, 2021)
a52f1fe Build for Volta and Turing by default (rhenry-nv, Mar 30, 2021)
5de8a5e Install tcmalloc (rhenry-nv, Apr 7, 2021)
22d15b0 Updates contrib readme (rhenry-nv, Apr 19, 2021)
cfde994 Removes artifacts from using std::function for callback (rhenry-nv, Apr 19, 2021)
9e1aac3 Fixes compile issues (rhenry-nv, Apr 20, 2021)
c911653 Update change log (rhenry-nv, Apr 20, 2021)
bf12cbc Fix windows compile error (rhenry-nv, Apr 20, 2021)
8c82968 Modified Dockerfile to pull from master on marian repo (rhenry-nv, Apr 20, 2021)

CHANGELOG.md (1 change: 1 addition & 0 deletions)
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
 ## [Unreleased]

 ### Added
+- Add support for returning sentences as soon as translation is done in beam search.
 - Support for RMSNorm as drop-in replace for LayerNorm from `Biao Zhang; Rico Sennrich (2019). Root Mean Square Layer Normalization`. Enabled in Transformer model via `--transformer-postprocess dar` instead of `dan`.
 - Extend suppression of unwanted output symbols, specifically "\n" from default vocabulary if generated by SentencePiece with byte-fallback. Deactivates with --allow-special
 - Allow for fine-grained CPU intrinsics overrides when BUILD_ARCH != native e.g. -DBUILD_ARCH=x86-64 -DCOMPILE_AVX512=off
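The CHANGELOG entry above describes the core behavior: instead of waiting for the whole batch, beam search hands each sentence back as soon as its search has finished. The following C++ sketch illustrates only that control flow. `FinishedCallback`, `Sentence`, `runBeamSearch`, and the `stepsToFinish` field (a stand-in for "every hypothesis of this sentence has ended") are hypothetical names chosen for illustration; they are not Marian's actual beam search interface.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical callback: invoked once per sentence, as soon as it is finished.
using FinishedCallback =
    std::function<void(std::size_t sentenceId, const std::string& translation)>;

// Hypothetical stand-in for one sentence in a batch. `stepsToFinish` models the
// decoder step at which all of this sentence's hypotheses have ended.
struct Sentence {
  std::size_t id;
  std::size_t stepsToFinish;
  std::string translation;
};

// Simplified decoding loop: each iteration represents one decoder step over the
// whole batch. A sentence is reported through `onFinished` in the step where it
// completes, rather than after the slowest sentence in the batch is done.
// Returns the sentence ids in the order they were reported.
std::vector<std::size_t> runBeamSearch(const std::vector<Sentence>& batch,
                                       const FinishedCallback& onFinished) {
  std::vector<bool> done(batch.size(), false);
  std::vector<std::size_t> reportedOrder;
  std::size_t remaining = batch.size();

  for (std::size_t step = 1; remaining > 0; ++step) {
    for (std::size_t i = 0; i < batch.size(); ++i) {
      if (!done[i] && batch[i].stepsToFinish <= step) {
        done[i] = true;
        --remaining;
        reportedOrder.push_back(batch[i].id);
        if (onFinished) {
          onFinished(batch[i].id, batch[i].translation);  // hand it back now
        }
      }
    }
  }
  return reportedOrder;
}
```

With three sentences that need 3, 1, and 2 steps respectively, a collecting callback would receive ids 1, 2, 0: the short sentences become available before the longest one finishes, which is the latency benefit this change targets.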

contrib/triton-aml/Dockerfile (15 changes: 9 additions & 6 deletions)
@@ -1,7 +1,7 @@
 # It is recommended to use a machine which supports CUDA to build this image.
-FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04 AS BUILDER
+FROM nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04 AS BUILDER
 RUN apt-get update --fix-missing
-RUN apt-get install -y curl git autoconf automake libtool curl make g++ unzip cmake build-essential cpio
+RUN apt-get install -y curl git autoconf automake libtool curl make g++ unzip cmake build-essential cpio libgoogle-perftools-dev
 RUN apt-get -y clean && \
 rm -rf /var/lib/apt/lists/*

@@ -42,10 +42,9 @@ RUN ./b2 install --prefix=/usr --with-system --with-thread --with-date_time --wi

 # Marian install
 WORKDIR /
-RUN git clone --no-checkout https://github.com/marian-nmt/marian-dev
+RUN git clone --no-checkout https://github.com/marian-nmt/marian-dev.git
 WORKDIR marian-dev
-RUN git checkout youki/quantize-embedding
-RUN git checkout dad48865fd3b7f1d7b891de81040f7651e824510
+RUN git checkout master
 RUN mkdir src/static
 RUN mkdir build
 COPY src/cmarian.cpp /marian-dev/src/static
@@ -54,7 +53,10 @@ RUN rm src/CMakeLists.txt
 COPY src/CMakeLists.txt /marian-dev/src

 WORKDIR /marian-dev/build
-RUN cmake .. -DCOMPILE_CPU=on -DCOMPILE_CUDA=on -DUSE_SENTENCEPIECE=on -DUSE_STATIC_LIBS=off -DCOMPILE_SERVER=off -DUSE_FBGEMM=on -DCUDA_cublas_device_LIBRARY=/usr/lib/x86_64-linux-gnu/libcublas.so
+RUN cmake .. -DCOMPILE_CPU=on -DCOMPILE_CUDA=on -DUSE_SENTENCEPIECE=on -DUSE_STATIC_LIBS=off -DCOMPILE_SERVER=off -DUSE_FBGEMM=on \
+-DCOMPILE_CUDA_SM35=off -DCOMPILE_CUDA_SM50=off -DCOMPILE_CUDA_SM60=off -DCOMPILE_CUDA_SM70=on -DCOMPILE_CUDA_SM75=on \
+-DCUDA_cublas_device_LIBRARY=/usr/local/cuda/lib64/libcublas.so

 RUN make -j $(grep -c ^processor /proc/cpuinfo)

 # build cmarian static library
@@ -66,6 +68,7 @@ COPY --from=BUILDER /marian-dev/build/src/3rd_party/fbgemm/libfbgemm.a /usr/lib
 COPY --from=BUILDER /marian-dev/build/src/3rd_party/fbgemm/asmjit/libasmjit.a /usr/lib
 COPY --from=BUILDER /marian-dev/build/src/3rd_party/sentencepiece/src/libsentencepiece_train.a /usr/lib
 COPY --from=BUILDER /marian-dev/build/src/3rd_party/sentencepiece/src/libsentencepiece.a /usr/lib
+COPY --from=BUILDER /marian-dev/build/src/3rd_party/intgemm/libintgemm.a /usr/lib
 COPY --from=BUILDER /marian-dev/build/libmarian.a /usr/lib/libcmarian.a
 COPY --from=BUILDER /marian-dev/build/src/libmarian_cuda.a /usr/lib/libcmarian_cuda.a

contrib/triton-aml/README.md (11 changes: 10 additions & 1 deletion)
@@ -27,9 +27,18 @@ For the AzureML Inference team members, you can put it into the following place

 Where <backend_directory> is by default /opt/tritonserver/backends.

+By default, this backend returns each sentence as soon as its translation is finished. To return only when the
+entire batch has finished translating, set the `async` parameter to `false` by adding the following to your config.pbtxt file:
+
+    parameters [
+      {
+        key: "async"
+        value: { string_value : "false" }
+      }
+    ]
 ## Make changes

-If you want to compile with another version of Marian, you need to replace `RUN git checkout youki/quantize-embedding` in the Dockerfile, then copy the new CMakeLists.txt replace the old one, add src/cmarian.cpp into CMakeLists.txt and make some changes to make sure it will build a static library of Marian.
+If you want to compile with another version of Marian, replace `RUN git checkout master` in the Dockerfile, copy the new CMakeLists.txt over the old one, add src/cmarian.cpp to CMakeLists.txt, and adjust CMakeLists.txt so that it builds a static library of Marian.

 ## Limitation

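The README above describes returning sentences early as the default, switched off only by an `async` parameter whose `string_value` is `"false"`. The following C++ sketch shows one way a backend could interpret that setting. It assumes the `parameters` entries have already been extracted from the model configuration into a map from name to `string_value`; that extraction step and the `asyncModeEnabled` helper are hypothetical and are not the Marian Triton backend's actual code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper. `params` maps each config.pbtxt `parameters` key to its
// `string_value`, e.g. {"async": "false"}; extracting it from the model
// configuration is assumed to have happened already.
bool asyncModeEnabled(const std::map<std::string, std::string>& params) {
  const auto it = params.find("async");
  if (it == params.end()) {
    return true;  // no "async" entry: return each sentence as soon as it is done
  }
  // Only an explicit "false" switches to returning the whole batch at once.
  return it->second != "false";
}
```

Treating every value other than `"false"` as enabled keeps early return as the default, matching the README's wording; a stricter backend might instead reject unrecognized values.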

contrib/triton-aml/marian_backend/CMakeLists.txt (1 change: 1 addition & 0 deletions)
@@ -94,6 +94,7 @@ target_link_libraries(
 fbgemm
 asmjit
 protobuf
+intgemm
 )

