[Feature](mluOpExecFFT): FFT operator completion #1045

Merged
merged 2 commits into from
Jul 25, 2024

squidruge
Collaborator

@squidruge squidruge commented Jun 3, 2024

Thanks for your contribution and we appreciate it a lot. 🚀🚀

1. Motivation

FFT C2C code review

2. Modification

Add a two-level network implementation.

3. Test Report

To learn how to test an operator, see GTest-User-Guide-zh.

3.1 Modification Details

3.1.1 Accuracy Acceptance Standard

  • static threshold
    • diff1
      • [ *] float32 mlu diff1 <= 1e-5
      • float32 mlu diff1 <= 3e-3
      • float16 mlu diff1 <= 3e-3
    • diff2
      • [ *] float32 mlu diff2 <= 1e-5
      • float32 mlu diff2 <= 3e-3
      • float16 mlu diff2 <= 3e-3
    • diff3
      • mlu diff3 == 0
      • mlu diff3_1 == 0
      • mlu diff3_2 == 0
  • dynamic threshold
    • diff1: mlu diff1 <= max(baseline diff1 * 10, static threshold)
    • diff2: mlu diff2 <= max(baseline diff2 * 10, static threshold)
    • diff3: mlu diff3 <= max(baseline diff3 * 10, static threshold)
      • float32, threshold = 1e-5
      • float16, threshold = 1e-3
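
As a rough illustration of the dynamic-threshold rule above, the Python sketch below computes diff1 and diff2 and applies the `max(baseline diff * 10, static threshold)` check. The diff1 and diff2 formulas are assumptions (this PR does not spell them out), and every function name here is hypothetical rather than part of the MLU-OPS test framework.

```python
import math

# Static thresholds listed under "dynamic threshold" in the checklist above.
STATIC_THRESHOLD = {"float32": 1e-5, "float16": 1e-3}


def diff1(mlu, baseline):
    """Assumed definition: sum of absolute errors / sum of absolute baseline values."""
    num = sum(abs(m - b) for m, b in zip(mlu, baseline))
    den = sum(abs(b) for b in baseline)
    return num / den  # raises ZeroDivisionError for an all-zero baseline


def diff2(mlu, baseline):
    """Assumed definition: sqrt(sum of squared errors / sum of squared baseline values)."""
    num = sum(abs(m - b) ** 2 for m, b in zip(mlu, baseline))
    den = sum(abs(b) ** 2 for b in baseline)
    return math.sqrt(num / den)


def dynamic_threshold(baseline_diff, dtype):
    """max(baseline diff * 10, static threshold), as stated in the checklist."""
    return max(baseline_diff * 10, STATIC_THRESHOLD[dtype])


def passes(mlu_diff, baseline_diff, dtype):
    """True when the MLU diff is within the dynamic threshold."""
    return mlu_diff <= dynamic_threshold(baseline_diff, dtype)
```

For example, with a hypothetical baseline diff1 of 5e-7 in float32, the threshold stays at the static 1e-5, so an MLU diff1 of 5.697111e-06 would pass. `abs()` is used so the same helpers also accept Python complex values.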

3.1.2 Operator Scheme checklist

  • Supported hardware
    • [ *] MLU370
    • MLU590
  • Job types
    • BLOCK
    • UNION1
    • UNION2
    • UNION4
    • [ *] The operator will dynamically select the most suitable task type, for example, UNION8

3.2 Accuracy Test

3.2.1 Accuracy Test

If you have checked the following items, please tick the relevant box.

  • Data type test (e.g. float32/int8)
  • Multi-dimensional tensor test
  • Layout test
  • Different sizes / integer-remainder tail segments / aligned and misaligned inputs test
  • Zero dimensional tensor test/zero element test
  • Stability test
  • Multiple platform test
  • Gen_case module test, see: Gencase-User-Guide-zh
  • NaN/Inf tests
  • Bug fix tests
  • For memory leak check details, see: GTest-User-Guide-zh
  • For code coverage check details, see: GTest-User-Guide-zh
  • For I/O calculation efficiency check details, see: MLU-OPS™-Performance-Acceptance-Standard

3.2.2 Parameter Check

Test Point-1: When a new operator is submitted, the test points are given and the test results are stated. Acceptance Standard: Normal error.

Please fill in your test results (error message) here, ...

Test Point-2: Whether illegal parameters are passed. Acceptance Standard: Normal error.

Test results...

3.3 Performance Test

See MLU-OPS™ Performance Acceptance Standard for details.

Platform: MLU370

Note: Google Test filter = fft
[==========] Running 9 test cases from 1 test suite.
[----------] Global test environment set-up.
[2024-6-3 22:51:56] [MLUOP] [Warning]:mluOpInternalGetCommitId not found, use fallback method
[2024-6-3 22:51:56] [MLUOP] [Warning]:mluOpInternalGetBranchInfo not found, use fallback method
[date ]: 2024_06_03_22_51_56
[mluop_version ]: 1.2.0
[mlu_platform ]: MLU370-X4[mtp_372.42]
[job_limit ]:
[cluster_limit ]:
[commit_id ]: commit d0dd5ea
[mluop_branch ]: * master
[driver_version ]: 5.10.10
[cnrt_version ]: 6.10.1
[ip ]: 172.17.0.5
[repeat_count ]: 1
[----------] 9 tests from fft/TestSuite
[ RUN ] fft/TestSuite.mluOp/0
[MLU Hardware Time ]: 7874 (us)
[MLU Interface Time ]: 76.173 (us)
[MLU IO Efficiency ]: 0.00347134
[MLU Compute Efficiency ]: 1.6256e-05
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 131072 (Ops)
[MLU TheoryIOs ]: 8.3968e+06 (Bytes)
[MLU ComputeForce ]: 1.024e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output1]
DIFF1: 5.697111e-06 5.698549e-06
DIFF2: 6.130045e-06 6.131343e-06
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_6000.prototxt
[ OK ] fft/TestSuite.mluOp/0 (913 ms)
[ RUN ] fft/TestSuite.mluOp/1
[MLU Hardware Time ]: 10384 (us)
[MLU Interface Time ]: 24.355 (us)
[MLU IO Efficiency ]: 0.00263225
[MLU Compute Efficiency ]: 1.23267e-05
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 131072 (Ops)
[MLU TheoryIOs ]: 8.3968e+06 (Bytes)
[MLU ComputeForce ]: 1.024e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output1]
DIFF1: 5.887155e-06 5.887104e-06
DIFF2: 6.328431e-06 6.328850e-06
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_7000.prototxt
[ OK ] fft/TestSuite.mluOp/1 (911 ms)
[ RUN ] fft/TestSuite.mluOp/2
[MLU Hardware Time ]: 11423 (us)
[MLU Interface Time ]: 17.056 (us)
[MLU IO Efficiency ]: 0.00239283
[MLU Compute Efficiency ]: 1.12055e-05
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 131072 (Ops)
[MLU TheoryIOs ]: 8.3968e+06 (Bytes)
[MLU ComputeForce ]: 1.024e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output1]
DIFF1: 5.740946e-06 5.740582e-06
DIFF2: 6.128789e-06 6.127179e-06
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_8000.prototxt
[ OK ] fft/TestSuite.mluOp/2 (1102 ms)
[ RUN ] fft/TestSuite.mluOp/3
[MLU Hardware Time ]: 13187 (us)
[MLU Interface Time ]: 16.962 (us)
[MLU IO Efficiency ]: 0.00207275
[MLU Compute Efficiency ]: 9.70653e-06
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 131072 (Ops)
[MLU TheoryIOs ]: 8.3968e+06 (Bytes)
[MLU ComputeForce ]: 1.024e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output1]
DIFF1: 5.947919e-06 5.948924e-06
DIFF2: 6.331446e-06 6.330591e-06
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_9000.prototxt
[ OK ] fft/TestSuite.mluOp/3 (1290 ms)
[ RUN ] fft/TestSuite.mluOp/4
[MLU Hardware Time ]: 14689 (us)
[MLU Interface Time ]: 23.507 (us)
[MLU IO Efficiency ]: 0.0018608
[MLU Compute Efficiency ]: 8.714e-06
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 131072 (Ops)
[MLU TheoryIOs ]: 8.3968e+06 (Bytes)
[MLU ComputeForce ]: 1.024e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output1]
DIFF1: 5.799880e-06 5.801629e-06
DIFF2: 6.186158e-06 6.189400e-06
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_10000.prototxt
[ OK ] fft/TestSuite.mluOp/4 (1346 ms)
[ RUN ] fft/TestSuite.mluOp/5
[MLU Hardware Time ]: 14911 (us)
[MLU Interface Time ]: 8.378 (us)
[MLU IO Efficiency ]: 0.0018331
[MLU Compute Efficiency ]: 8.58427e-06
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 131072 (Ops)
[MLU TheoryIOs ]: 8.3968e+06 (Bytes)
[MLU ComputeForce ]: 1.024e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output1]
DIFF1: 6.965811e-06 6.966984e-06
DIFF2: 7.470853e-06 7.471314e-06
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_11000.prototxt
[ OK ] fft/TestSuite.mluOp/5 (1458 ms)
[ RUN ] fft/TestSuite.mluOp/6
[MLU Hardware Time ]: 15669 (us)
[MLU Interface Time ]: 23.533 (us)
[MLU IO Efficiency ]: 0.00174442
[MLU Compute Efficiency ]: 8.169e-06
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 131072 (Ops)
[MLU TheoryIOs ]: 8.3968e+06 (Bytes)
[MLU ComputeForce ]: 1.024e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output1]
DIFF1: 6.694572e-06 6.691891e-06
DIFF2: 7.161582e-06 7.157658e-06
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_12000.prototxt
[ OK ] fft/TestSuite.mluOp/6 (1683 ms)
[ RUN ] fft/TestSuite.mluOp/7
[MLU Hardware Time ]: 20169 (us)
[MLU Interface Time ]: 16.289 (us)
[MLU IO Efficiency ]: 0.00135522
[MLU Compute Efficiency ]: 6.34637e-06
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 131072 (Ops)
[MLU TheoryIOs ]: 8.3968e+06 (Bytes)
[MLU ComputeForce ]: 1.024e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output1]
DIFF1: 6.343064e-06 6.340724e-06
DIFF2: 6.778493e-06 6.776561e-06
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_13000.prototxt
[ OK ] fft/TestSuite.mluOp/7 (1720 ms)
[ RUN ] fft/TestSuite.mluOp/8
[MLU Hardware Time ]: 20617 (us)
[MLU Interface Time ]: 158.898 (us)
[MLU IO Efficiency ]: 0.00132577
[MLU Compute Efficiency ]: 6.20847e-06
[MLU Workspace Size ]: -1 (Bytes)
[MLU Kernel Name(s) ]: {}
[MLU TheoryOps ]: 131072 (Ops)
[MLU TheoryIOs ]: 8.3968e+06 (Bytes)
[MLU ComputeForce ]: 1.024e+12 (op/s)
[MLU IoBandWidth ]: 307.2 (GB/s)
[GPU Hardware Time ]: -1 (us)
[GPU IO Efficiency ]: -1
[GPU Compute Efficiency ]: -1
[GPU Workspace Size ]: -1 (Bytes)
[Diffs]:
[output1]
DIFF1: 7.087170e-06 7.091904e-06
DIFF2: 7.588693e-06 7.592199e-06
[^ OK ] ../../test/mlu_op_gtest/pb_gtest/src/zoo/fft/test_case/fft_2048_14000.prototxt
[ OK ] fft/TestSuite.mluOp/8 (1952 ms)
[----------] 9 tests from fft/TestSuite (12375 ms total)

[----------] Global test environment tear-down
[ SUMMARY ] Total 9 cases of 1 op(s).
ALL PASSED.
[==========] 9 test cases from 1 test suite ran. (16498 ms total)
[ PASSED ] 9 test cases.
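
The efficiency figures in the log above appear to be consistent with dividing the theoretical work by what the hardware could have done during the measured hardware time. The Python sketch below reproduces the numbers for the first case under that assumption. The relationship is inferred from the logged values, not taken from the gtest source, and the function names are hypothetical.

```python
def io_efficiency(theory_ios_bytes, hw_time_us, bandwidth_gb_s):
    """TheoryIOs / (hardware time * IO bandwidth), with 1 GB/s taken as 1e9 bytes/s."""
    seconds = hw_time_us * 1e-6
    return theory_ios_bytes / (seconds * bandwidth_gb_s * 1e9)


def compute_efficiency(theory_ops, hw_time_us, compute_force_ops_s):
    """TheoryOps / (hardware time * peak compute force)."""
    seconds = hw_time_us * 1e-6
    return theory_ops / (seconds * compute_force_ops_s)


# Values copied from fft/TestSuite.mluOp/0 in the log above.
io_eff = io_efficiency(8.3968e6, 7874, 307.2)
compute_eff = compute_efficiency(131072, 7874, 1.024e12)
print(f"IO efficiency:      {io_eff:.6g}")
print(f"Compute efficiency: {compute_eff:.6g}")
```

Because TheoryOps and TheoryIOs are logged with the same values for all nine cases while hardware time grows from 7874 us to 20617 us, both efficiencies fall in step with the hardware time under this reading.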

@PetrelYy PetrelYy added the Feature Contribute a new feature label Jun 12, 2024
@PetrelYy PetrelYy added this to the v1.3.0 milestone Jun 12, 2024
kernels/fft/fft.h (outdated, resolved)
docs/design_docs/fft/fft2d.md (outdated, resolved)
kernels/fft/c2c_fft/c2c_fft.h (outdated, resolved)
kernels/fft/c2c_fft/c2c_fft.h (resolved)
kernels/fft/fft_optm_device/fft_butterfly_ops.h (outdated, resolved)
kernels/fft/fft_optm_device/fft_c2c_stockham_nram.h (outdated, resolved)
kernels/fft/fft_optm_device/fft_c2c_stockham_gdram.h (outdated, resolved)
kernels/fft/common/fft_basic_ops.cpp (resolved)
kernels/fft/c2c_fft/c2c_fft_host.cpp (resolved)
run_fft.sh (outdated, resolved)
fft_c2c_1d_stride_gen.py (outdated, resolved)
kernels/fft/c2c_fft/c2c_fft.h (outdated, resolved)
kernels/fft/c2c_fft/c2c_fft_host.cpp (outdated, resolved)
kernels/fft/c2c_fft/c2c_fft_host.cpp (outdated, resolved)
kernels/fft/c2c_fft/c2c_fft_host.cpp (resolved)
kernels/fft/c2c_fft/c2c_fft_host.cpp (resolved)
kernels/fft/fft_optm_device/fft_c2r_stockham_nram.h (outdated, resolved)
kernels/fft/fft_optm_device/fft_matmul.mlu (resolved)
mlu_op.h (outdated, resolved)
mlu_op.h (outdated, resolved)
mlu_op.h (outdated, resolved)
mlu_op.h (outdated, resolved)
@AndyQiao0828 AndyQiao0828 self-requested a review July 22, 2024 08:10
@AndyQiao0828 AndyQiao0828 self-requested a review July 22, 2024 08:14
@nike-tinghai nike-tinghai force-pushed the master branch 2 times, most recently from 3631064 to 958170d July 23, 2024 11:40
@squidruge squidruge force-pushed the master branch 2 times, most recently from 1b62678 to 103f3b0 July 24, 2024 05:23
@PetrelYy PetrelYy changed the title from "[WIP] Add FFT C2C Feature" to "[Feature](mluOpExecFFT): FFT operator completion" Jul 25, 2024
@PetrelYy PetrelYy merged commit e34307d into Cambricon:master Jul 25, 2024
1 check passed

6 participants