Release v0.10.0-beta

enmortensen released this 01 Aug 22:18
· 17 commits to main since this release
669197a

v0.10.0-beta

UPDATE 08/08:

The CV-CUDA v0.10.0-beta release introduced a bug in the color conversion optimizations: the R and B channels are swapped in the 'YUV to BGR' and 'BGR to YUV' conversions (see, for instance, cvt_color.cu).

We recommend using CV-CUDA v0.10.1-beta, which reverts these optimizations. They will be reintroduced, with consolidated testing, in a future release.
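To check whether a given build is affected, one option is to convert pure-blue and pure-red test pixels and compare the resulting luma against a reference. The sketch below is pure Python and does not call CV-CUDA. The BT.601 full-range luma weights are an assumption made for illustration (the release notes do not state which coefficients CV-CUDA uses), and all function names are hypothetical.

```python
# BT.601 full-range luma weights. Assumed for illustration only; the
# release notes do not state which coefficients CV-CUDA uses.
WEIGHT_R, WEIGHT_G, WEIGHT_B = 0.299, 0.587, 0.114


def luma_from_bgr(pixel):
    """Return the 8-bit luma (Y) of a pixel stored in B, G, R order."""
    b, g, r = pixel
    return round(WEIGHT_R * r + WEIGHT_G * g + WEIGHT_B * b)


def luma_from_bgr_swapped(pixel):
    """Same formula, but misreading the pixel as R, G, B (the swap)."""
    r, g, b = pixel
    return round(WEIGHT_R * r + WEIGHT_G * g + WEIGHT_B * b)


def looks_channel_swapped(convert, tolerance=1):
    """Heuristic: True if `convert` gives blue the luma expected for red
    and red the luma expected for blue."""
    pure_blue = (255, 0, 0)  # B, G, R
    pure_red = (0, 0, 255)   # B, G, R
    expected_blue = luma_from_bgr(pure_blue)
    expected_red = luma_from_bgr(pure_red)
    return (
        abs(convert(pure_blue) - expected_red) <= tolerance
        and abs(convert(pure_red) - expected_blue) <= tolerance
    )
```

In practice, `convert` would wrap the conversion under test (for example, a call into the installed library followed by reading back the Y value of a single pixel). Pure blue and pure red are useful probes because their luma values differ widely under these weights, so a swap is easy to spot.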

Release Highlights

CV-CUDA v0.10.0 includes a critical bug fix (cache growth management) alongside the following changes:

  • New Features:
    • Added a mechanism to limit and manage cache memory consumption (includes new "Best Practices" documentation).
    • Improved performance of color conversion operators (e.g., 2x faster RGB2YUV). Known bug (R and B channels swapped in BGR/YUV conversions): see issue
    • Refactored codebase to allow independent build of NVCV library (data structures).
  • Bug Fixes:
    • Fixed unbounded cache memory consumption issue.
    • Improved management of Python-created object lifetimes, decoupled from cache management.
    • Fixed potential crash in Resize operator's linear and nearest neighbor interpolation from non-aligned vectorized writes.
    • Fixed Python CvtColor operator to correctly handle NV12 and NV21 outputs.
    • Fixed Resize and RandomResizedCrop linear interpolation weight for border rows and columns.
    • Fixed missing parameter in C API for fused ResizeCropConvertReformat.
    • Fixed several minor documentation and error output issues.
    • Fixed minor compiler warning while building Resize operator.
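As a rough illustration of what limiting cache memory consumption means, the sketch below shows a generic byte-budgeted cache in pure Python. It is not CV-CUDA's implementation or API: the class name, the methods, and the policy of clearing the whole cache when an insertion would exceed the limit are all assumptions chosen to keep the example short. The "Best Practices" documentation describes the actual mechanism.

```python
class BoundedCache:
    """Hypothetical byte-budgeted cache; not CV-CUDA's implementation."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.current_bytes = 0
        self._items = {}  # key -> size in bytes

    def put(self, key, size_bytes):
        """Cache an item. Return False if the item was not cached."""
        if size_bytes > self.limit_bytes:
            # Larger than the whole budget: never cache it.
            return False
        if key in self._items:
            # Replacing an entry: release its old size first.
            self.current_bytes -= self._items.pop(key)
        if self.current_bytes + size_bytes > self.limit_bytes:
            # Assumed policy: drop everything rather than grow unbounded.
            self.clear()
        self._items[key] = size_bytes
        self.current_bytes += size_bytes
        return True

    def __contains__(self, key):
        return key in self._items

    def clear(self):
        """Release all cached entries."""
        self._items.clear()
        self.current_bytes = 0
```

The point of the sketch is the invariant: `current_bytes` never exceeds `limit_bytes`, whatever sequence of insertions occurs. Without such a bound, a workload with many distinct sizes can grow the cache indefinitely.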

Compatibility and Known Limitations

  • New limitations:
    • Cache/resource management introduced in v0.10 adds microsecond-level overhead to Python operator calls. Based on the performance analysis of our Python samples, we expect the production- and pipeline-level impact to be negligible. CUDA kernel and C++ call performance is not affected. We aim to investigate and reduce this overhead further in a future release.
    • Sporadic pybind11 deallocation crashes have been reported in long-running multi-threaded Python pipelines with externally allocated memory (e.g., wrapped PyTorch buffers). We are evaluating an upgrade of pybind11 (currently using 2.10) as a potential fix in an upcoming release.
    • Erroneous BGR -> YUV and YUV -> BGR color conversions (R and B channels swapped); see issue

For the full list, see the main README on CV-CUDA GitHub.
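To judge whether the per-call Python overhead noted in the limitations above matters for a particular pipeline, it can be measured directly. The sketch below is a minimal host-side timer using only the standard library. `placeholder_operator` is a hypothetical stand-in, not a CV-CUDA function, and would be replaced by the operator call of interest.

```python
import time


def per_call_overhead_us(func, calls=10_000):
    """Average wall-clock time per call of func(), in microseconds."""
    start = time.perf_counter()
    for _ in range(calls):
        func()
    elapsed = time.perf_counter() - start
    return elapsed / calls * 1e6


def placeholder_operator():
    """Hypothetical stand-in for a Python operator call."""
    return None


if __name__ == "__main__":
    print(f"{per_call_overhead_us(placeholder_operator):.3f} us per call")
```

Because GPU work is typically launched asynchronously, a loop like this measures the host-side cost of issuing each call rather than kernel execution time. Comparing the figure across two library versions gives a rough estimate of the added overhead.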

License

CV-CUDA is licensed under the Apache 2.0 license.

Resources

  1. CV-CUDA GitHub
  2. Increasing Throughput and Reducing Costs for AI-Based Computer Vision with CV-CUDA (https://developer.nvidia.com/blog/increasing-throughput-and-reducing-costs-for-computer-vision-with-cv-cuda/)
  3. NVIDIA Announces Microsoft, Tencent, Baidu Adopting CV-CUDA for Computer Vision AI (https://blogs.nvidia.com/blog/2023/03/21/cv-cuda-ai-computer-vision/)
  4. CV-CUDA helps Tencent Cloud audio and video PaaS platform achieve full-process GPU acceleration for video enhancement AI

Acknowledgements

CV-CUDA is developed jointly by NVIDIA and the ByteDance Machine Learning team.