[Feature](mluOpExecFFT): FFT operator completion #1045
Merged
aokbok reviewed on Jun 12, 2024
PetrelYy reviewed on Jun 12, 2024
PetrelYy reviewed on Jun 12, 2024
aokbok reviewed on Jun 12, 2024
PetrelYy reviewed on Jun 12, 2024
PetrelYy reviewed on Jun 12, 2024
test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_0.prototxt (outdated, resolved)
PetrelYy reviewed on Jun 12, 2024
test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_10000.prototxt (outdated, resolved)
PetrelYy reviewed on Jun 12, 2024
test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_6000.prototxt (outdated, resolved)
PetrelYy reviewed on Jun 12, 2024
PetrelYy reviewed on Jun 12, 2024
niyuming reviewed on Jun 13, 2024
PetrelYy reviewed on Jul 2, 2024
PetrelYy reviewed on Jul 2, 2024
test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_10000.prototxt (outdated, resolved)
PetrelYy reviewed on Jul 2, 2024
PetrelYy reviewed on Jul 2, 2024
aokbok reviewed on Jul 4, 2024
niyuming reviewed on Jul 8, 2024
niyuming reviewed on Jul 8, 2024
kernels/fft/fft_optm_device/fft_two-level_network_c2r_device.mlu (outdated, resolved)
kernels/fft/fft_optm_device/fft_two-level_network_r2c_device.mlu (outdated, resolved)
PetrelYy reviewed on Jul 16, 2024
aokbok reviewed on Jul 19, 2024
aokbok reviewed on Jul 22, 2024
AndyQiao0828 approved these changes on Jul 22, 2024
AndyQiao0828 approved these changes on Jul 22, 2024
nike-tinghai force-pushed the master branch 2 times, most recently from 3631064 to 958170d (July 23, 2024 11:40)
PetrelYy approved these changes on Jul 24, 2024
squidruge force-pushed the master branch 2 times, most recently from 1b62678 to 103f3b0 (July 24, 2024 05:23)
aokbok approved these changes on Jul 25, 2024
PetrelYy changed the title from "[WIP] Add FFT C2C Feature" to "[Feature](mluOpExecFFT): FFT operator completion" on Jul 25, 2024
Thanks for your contribution and we appreciate it a lot. 🚀🚀
1. Motivation
FFT C2C code review
2. Modification
Add a two-level network implementation.
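The description does not say what the two-level network computes. One plausible reading, stated here as an assumption and not as a description of the kernels in this PR, is a two-stage Cooley-Tukey decomposition of a length-N transform with N = n1 * n2. The sketch below illustrates that idea in plain Python; `dft` and `two_level_fft` are hypothetical names introduced only for this illustration.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform, used as the per-level kernel."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def two_level_fft(x, n1, n2):
    """Length n1*n2 DFT built from two levels of smaller DFTs (Cooley-Tukey).

    This is an illustrative assumption about what a "two-level" scheme can
    look like, not the implementation in the .mlu kernels.
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # Level 1: for each residue r, a length-n1 DFT over x[r], x[r + n2], ...
    stage1 = [dft(x[r::n2]) for r in range(n2)]  # indexed as stage1[r][k1]

    # Twiddle factors W_n^(r * k1) applied between the two levels.
    for r in range(n2):
        for k1 in range(n1):
            stage1[r][k1] *= cmath.exp(-2j * cmath.pi * r * k1 / n)

    # Level 2: for each k1, a length-n2 DFT across the residues r.
    out = [0j] * n
    for k1 in range(n1):
        column = dft([stage1[r][k1] for r in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = column[k2]
    return out


signal = [complex(i % 5, -i) for i in range(12)]
reference = dft(signal)
result = two_level_fft(signal, 3, 4)
max_error = max(abs(a - b) for a, b in zip(result, reference))
print(max_error < 1e-9)
```

The factorization lets each level operate on short transforms, which is typically why such schemes are used for long sequences.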
3. Test Report
For details on how to run operator tests, see GTest-User-Guide-zh.
3.1 Modification Details
3.1.1 Accuracy Acceptance Standard
3.1.2 Operator Scheme checklist
3.2 Accuracy Test
3.2.1 Accuracy Test
If you have checked the following items, please tick the relevant box.
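The test log reports DIFF1 and DIFF2 for each case. As an assumption based on the usual MLU-OPS accuracy metrics (this page does not define them), DIFF1 is the sum of absolute errors divided by the sum of absolute baseline values, and DIFF2 is the square root of the summed squared error divided by the summed squared baseline. A minimal sketch with made-up data:

```python
import math


def diff1(result, baseline):
    """Relative L1 error: sum|result - baseline| / sum|baseline| (assumed definition)."""
    numerator = sum(abs(r - b) for r, b in zip(result, baseline))
    denominator = sum(abs(b) for b in baseline)
    return numerator / denominator


def diff2(result, baseline):
    """Relative L2 error: sqrt(sum|result - baseline|^2 / sum|baseline|^2) (assumed definition)."""
    numerator = sum(abs(r - b) ** 2 for r, b in zip(result, baseline))
    denominator = sum(abs(b) ** 2 for b in baseline)
    return math.sqrt(numerator / denominator)


# Illustrative values only; these are not taken from the test cases.
baseline = [1.0, -2.0, 4.0]
result = [1.0, -2.0, 4.5]
print(diff1(result, baseline), diff2(result, baseline))
```

Both functions also accept complex inputs, since `abs` returns the magnitude of a complex number.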
3.2.2 Parameter Check
Test Point-1: When a new operator is submitted, the test points are given and the test results are stated.
Acceptance Standard: Normal error.
Test Point-2: Whether illegal parameters are passed.
Acceptance Standard: Normal error.
3.3 Performance Test
See MLU-OPS™ Performance Acceptance Standard for details.
Platform: MLU370
Note: Google Test filter = fft
[==========] Running 9 test cases from 1 test suite.
[----------] Global test environment set-up.
[2024-6-3 22:51:56] [MLUOP] [Warning]:mluOpInternalGetCommitId not found, use fallback method
[2024-6-3 22:51:56] [MLUOP] [Warning]:mluOpInternalGetBranchInfo not found, use fallback method
[date ]: 2024_06_03_22_51_56
[mluop_version ]: 1.2.0
[mlu_platform ]: MLU370-X4[mtp_372.42]
[job_limit ]:
[cluster_limit ]:
[commit_id ]: commit d0dd5ea
[mluop_branch ]: * master
[driver_version ]: 5.10.10
[cnrt_version ]: 6.10.1
[ip ]: 172.17.0.5
[repeat_count ]: 1
[----------] 9 tests from fft/TestSuite
[ RUN ] fft/TestSuite.mluOp/0
[MLU Hardware Time ]: 7874 (us)
[MLU Interface Time ]: 76.173 (us)
[MLU IO Efficiency ]: 0.00347134
[MLU Compute Efficiency ]: 1.6256e-05
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 131072 (Ops)
[MLU TheoryIOs ]: 8.3968e+06 (Bytes)
[MLU ComputeForce ]: 1.024e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output1]
DIFF1: 5.697111e-06 5.698549e-06
DIFF2: 6.130045e-06 6.131343e-06
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_6000.prototxt
[ OK ] fft/TestSuite.mluOp/0 (913 ms)
[ RUN ] fft/TestSuite.mluOp/1
[MLU Hardware Time ]: 10384 (us)
[MLU Interface Time ]: 24.355 (us)
[MLU IO Efficiency ]: 0.00263225
[MLU Compute Efficiency ]: 1.23267e-05
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 131072 (Ops)
[MLU TheoryIOs ]: 8.3968e+06 (Bytes)
[MLU ComputeForce ]: 1.024e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output1]
DIFF1: 5.887155e-06 5.887104e-06
DIFF2: 6.328431e-06 6.328850e-06
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_7000.prototxt
[ OK ] fft/TestSuite.mluOp/1 (911 ms)
[ RUN ] fft/TestSuite.mluOp/2
[MLU Hardware Time ]: 11423 (us)
[MLU Interface Time ]: 17.056 (us)
[MLU IO Efficiency ]: 0.00239283
[MLU Compute Efficiency ]: 1.12055e-05
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 131072 (Ops)
[MLU TheoryIOs ]: 8.3968e+06 (Bytes)
[MLU ComputeForce ]: 1.024e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output1]
DIFF1: 5.740946e-06 5.740582e-06
DIFF2: 6.128789e-06 6.127179e-06
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_8000.prototxt
[ OK ] fft/TestSuite.mluOp/2 (1102 ms)
[ RUN ] fft/TestSuite.mluOp/3
[MLU Hardware Time ]: 13187 (us)
[MLU Interface Time ]: 16.962 (us)
[MLU IO Efficiency ]: 0.00207275
[MLU Compute Efficiency ]: 9.70653e-06
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 131072 (Ops)
[MLU TheoryIOs ]: 8.3968e+06 (Bytes)
[MLU ComputeForce ]: 1.024e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output1]
DIFF1: 5.947919e-06 5.948924e-06
DIFF2: 6.331446e-06 6.330591e-06
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_9000.prototxt
[ OK ] fft/TestSuite.mluOp/3 (1290 ms)
[ RUN ] fft/TestSuite.mluOp/4
[MLU Hardware Time ]: 14689 (us)
[MLU Interface Time ]: 23.507 (us)
[MLU IO Efficiency ]: 0.0018608
[MLU Compute Efficiency ]: 8.714e-06
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 131072 (Ops)
[MLU TheoryIOs ]: 8.3968e+06 (Bytes)
[MLU ComputeForce ]: 1.024e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output1]
DIFF1: 5.799880e-06 5.801629e-06
DIFF2: 6.186158e-06 6.189400e-06
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_10000.prototxt
[ OK ] fft/TestSuite.mluOp/4 (1346 ms)
[ RUN ] fft/TestSuite.mluOp/5
[MLU Hardware Time ]: 14911 (us)
[MLU Interface Time ]: 8.378 (us)
[MLU IO Efficiency ]: 0.0018331
[MLU Compute Efficiency ]: 8.58427e-06
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 131072 (Ops)
[MLU TheoryIOs ]: 8.3968e+06 (Bytes)
[MLU ComputeForce ]: 1.024e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output1]
DIFF1: 6.965811e-06 6.966984e-06
DIFF2: 7.470853e-06 7.471314e-06
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_11000.prototxt
[ OK ] fft/TestSuite.mluOp/5 (1458 ms)
[ RUN ] fft/TestSuite.mluOp/6
[MLU Hardware Time ]: 15669 (us)
[MLU Interface Time ]: 23.533 (us)
[MLU IO Efficiency ]: 0.00174442
[MLU Compute Efficiency ]: 8.169e-06
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 131072 (Ops)
[MLU TheoryIOs ]: 8.3968e+06 (Bytes)
[MLU ComputeForce ]: 1.024e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output1]
DIFF1: 6.694572e-06 6.691891e-06
DIFF2: 7.161582e-06 7.157658e-06
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_12000.prototxt
[ OK ] fft/TestSuite.mluOp/6 (1683 ms)
[ RUN ] fft/TestSuite.mluOp/7
[MLU Hardware Time ]: 20169 (us)
[MLU Interface Time ]: 16.289 (us)
[MLU IO Efficiency ]: 0.00135522
[MLU Compute Efficiency ]: 6.34637e-06
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 131072 (Ops)
[MLU TheoryIOs ]: 8.3968e+06 (Bytes)
[MLU ComputeForce ]: 1.024e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output1]
DIFF1: 6.343064e-06 6.340724e-06
DIFF2: 6.778493e-06 6.776561e-06
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_13000.prototxt
[ OK ] fft/TestSuite.mluOp/7 (1720 ms)
[ RUN ] fft/TestSuite.mluOp/8
[MLU Hardware Time ]: 20617 (us)
[MLU Interface Time ]: 158.898 (us)
[MLU IO Efficiency ]: 0.00132577
[MLU Compute Efficiency ]: 6.20847e-06
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 131072 (Ops)
[MLU TheoryIOs ]: 8.3968e+06 (Bytes)
[MLU ComputeForce ]: 1.024e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output1]
DIFF1: 7.087170e-06 7.091904e-06
DIFF2: 7.588693e-06 7.592199e-06
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_14000.prototxt
[ OK ] fft/TestSuite.mluOp/8 (1952 ms)
[----------] 9 tests from fft/TestSuite (12375 ms total)
[----------] Global test environment tear-down
[ SUMMARY ] Total 9 cases of 1 op(s).
ALL PASSED.
[==========] 9 test cases from 1 test suite ran. (16498 ms total)
[ PASSED ] 9 test cases.
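The efficiency figures in the log appear to follow from the other reported fields: dividing the theoretical work by the hardware time gives achieved throughput, and dividing that by the peak gives the efficiency. The sketch below recomputes the values for fft/TestSuite.mluOp/0 under that assumption, taking 1 GB/s as 1e9 bytes per second; `efficiency` is a hypothetical helper, not part of the test framework.

```python
def efficiency(theory_amount, hardware_time_us, peak_per_second):
    """Achieved throughput divided by peak throughput."""
    seconds = hardware_time_us * 1e-6
    return theory_amount / (seconds * peak_per_second)


# Values copied from the fft/TestSuite.mluOp/0 entry in the log.
hardware_time_us = 7874      # [MLU Hardware Time]
theory_ops = 131072          # [MLU TheoryOps]
theory_ios = 8.3968e6        # [MLU TheoryIOs], bytes
compute_force = 1.024e12     # [MLU ComputeForce], op/s
io_bandwidth = 307.2e9       # [MLU IoBandWidth], assuming 1 GB = 1e9 bytes

compute_eff = efficiency(theory_ops, hardware_time_us, compute_force)
io_eff = efficiency(theory_ios, hardware_time_us, io_bandwidth)
print(f"compute efficiency: {compute_eff:.4e}")
print(f"io efficiency:      {io_eff:.6f}")
```

The results agree with the logged 1.6256e-05 and 0.00347134 to the printed precision. Because TheoryOps and TheoryIOs are identical across all nine cases in this log, the reported efficiencies vary only with hardware time.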