forked from Tencent/ncnn
NCNN RVV int8 and fp16sa optimization #2
Open

Xinyu302 wants to merge 67 commits into rv2036:master from Xinyu302:dev
Conversation
- …tn_mask detection (Tencent#5273). Signed-off-by: Molly Sophia <[email protected]>
- Fix SIGBUS error when loading fp16 models on armv7; apply the same fix for bf16
- Bump actions/cache from 3 to 4 (https://github.com/actions/cache). Signed-off-by: dependabot[bot] <[email protected]>
- Signed-off-by: hugo-syn <[email protected]> (two commits)
- pnnx: handle the two-operand add/sub/rsub variants; fuse dynamic slice indexes; add pnnx sliceindexes; reset device may change non-dtype input numeric 5 to 6; print inf as float; preserve dtype for generation ops; convert torch.masked_select and test it; test negative slice
- Signed-off-by: Xilin Wu <[email protected]>
- Replace scalar stores with vsseg2e32_v_f32m1; refine transpose (co-authored by Xinyu302)
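The vsseg2e32_v_f32m1 store mentioned in that commit writes two vector registers to memory interleaved (a "segment" store of two 32-bit fields). A scalar sketch of that semantics, with an illustrative helper name rather than the actual RVV intrinsic:

```cpp
#include <cstddef>

// Scalar model of RVV vsseg2e32: store two float "fields" interleaved,
// dst[2*i] = a[i], dst[2*i + 1] = b[i]. Helper name is illustrative only;
// the real code uses the vsseg2e32_v_f32m1 intrinsic on vector registers.
void sseg2_store_f32(const float* a, const float* b, float* dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        dst[2 * i] = a[i];
        dst[2 * i + 1] = b[i];
    }
}
```

Replacing pairs of scalar stores with one segment store lets the hardware do the interleaving, which is why it helps the transpose path.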
- Signed-off-by: Molly Sophia <[email protected]>
- Promote vfpv4 for automatic fp16 storage conversion; always report NEON and vfpv4 on arm64
- Add basic ShuffleChannel, then fix its bugs (co-authored by Xinyu302)
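For reference, ShuffleChannel views the channel axis as (groups, channels_per_group) and transposes it, so data can mix across grouped convolutions. A plain scalar sketch of the operation (function name and layout are illustrative, not ncnn's API):

```cpp
#include <cstddef>
#include <vector>

// Channel shuffle: channel c = g * cpg + k moves to k * groups + g,
// i.e. view channels as [groups][cpg] and transpose to [cpg][groups].
// Data layout assumed here: channel-major, hw contiguous values per channel.
std::vector<float> shuffle_channel(const std::vector<float>& src,
                                   size_t channels, size_t groups, size_t hw)
{
    const size_t cpg = channels / groups; // channels per group
    std::vector<float> dst(src.size());
    for (size_t g = 0; g < groups; g++)
        for (size_t k = 0; k < cpg; k++)
        {
            const float* s = &src[(g * cpg + k) * hw];
            float* d = &dst[(k * groups + g) * hw];
            for (size_t i = 0; i < hw; i++)
                d[i] = s[i];
        }
    return dst;
}
```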
- conv 3x3 packn asm kernel (co-authored by fty1777)
- int8 quantization work (co-authored by Xinyu302 and fty1777):
  - add float2int8 and float2int8leakyrelu; fix a float2int8relu bug
  - add quantize_riscv and fix its bugs; quantize fp16 to int8 (#6)
  - add dequantize_riscv (#3): fix a vset bug, refine, and add an fp16sa path
  - add int8 innerproduct (#5): broken at first because fp16 quantize was missing; fixing flatten made the test pass
  - copy the arm convolutiondepthwise implementation to convolutiondepthwise_riscv.cpp
  - change int8 packn to 8 (#7) and fix test_convert packing; modify headers (#8)
  - finish convolutiondepthwise_3x3_pack8_int8
  - fix packing in quantize and innerproduct (#10)
  - switch padding to pack8 (#12)
  - add requantize (#13) and fix its bugs
  - port the arm int8 kernels to the RISC-V V extension, including a vpadalq_s16 equivalent; pass tests
  - fix net.cpp layer packing and a segfault; add fp16 dequantize_riscv.cpp
  - fix remaining int8 packn and depthwise conv bugs
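The float2int8 step above is the usual round-and-saturate mapping applied before int8 kernels: scale, round to nearest, clamp so |q| ≤ 127. A scalar sketch of that conversion (the RVV kernels vectorize the same arithmetic; the exact rounding mode in this PR's code may differ):

```cpp
#include <algorithm>
#include <cmath>

// Quantize one float to int8: scale, round to nearest, saturate to [-127, 127].
// -128 is deliberately kept out of range so the magnitude stays symmetric.
signed char float2int8(float v, float scale)
{
    int q = static_cast<int>(std::round(v * scale));
    q = std::max(-127, std::min(127, q));
    return static_cast<signed char>(q);
}
```

Dequantize is the inverse (multiply by 1/scale); requantize fuses dequantize, any activation, and a fresh quantize so intermediate data can stay in int8.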
- reorder instructions; conv 3x3 pack1 kernel (co-authored by fty1777)
- int8 Winograd convolution (co-authored by fty1777 and Xinyu302):
  - switch f16 to i16; create top_blob_tm as 4u * packn
  - write the winograd23 input/output/kernel transforms for int8
  - fix a bug in convolution_winograd_dot_packn_int8.h
  - divide the winograd23 result by 2 so the tests pass
  - optimize winograd23 for RVV int8; add winograd43 RVV int8
- conv 3x3 pack1ton kernel (co-authored by fty1777)
Dear contestant, hello.
Confirmed correct.
Thank you for your reply.
Reproduction environment

Test results

The optimization adds Winograd support for convolution, but because Winograd enlarges the model weights, some models perform worse with it; the benchncnn source was therefore modified to control, per model, whether the Winograd optimization is used.
Besides the models officially stated to run within 55 MB, this optimization also lets the squeezenet_ssd_int8 model run correctly.
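Toggling Winograd per run can be done through ncnn's Option before loading the model. A sketch of the kind of change made to benchncnn, assuming the stock ncnn::Option fields; how the actual patch wires the per-model choice is not shown in this PR excerpt:

```cpp
// Config sketch only: disable Winograd for models whose transformed
// weights would blow up the memory budget. Both fields below are stock
// ncnn::Option members; the per-model selection logic is an assumption.
ncnn::Option opt;
opt.use_winograd_convolution = false; // fall back to direct/im2col kernels
opt.use_fp16_arithmetic = true;       // fp16sa path, as targeted by this PR

ncnn::Net net;
net.opt = opt;
```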
Run command:

Model run times: