Rerun random tests with chance of false negative once. #506

Merged: 3 commits into main from fixrandtest, Dec 21, 2024

Conversation

christiangnrd (Contributor) commented Dec 20, 2024

As part of our random tests, we verify that at least one of the values in an array is no longer 0. We also test the generation of length-1 arrays of (U)Int8, which gives each run of those tests a 1/256 chance of generating a 0 and erroneously failing our test suite. See the latest such failure.

This PR makes tests with such a high chance of spurious failure run once more before declaring a failure, so we don't have to deal with a false failure every few days.

My solution feels overengineered so I'm open to suggestions for a simpler implementation.
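The retry-once idea described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the implementation from this PR; `passes_with_one_retry` and `fill_has_nonzero` are hypothetical names, and the random draw is a stand-in for the real array-generation test.

```python
import random

def passes_with_one_retry(check):
    """Return True if `check()` passes on the first try or on a single retry."""
    # `or` short-circuits, so the second call only happens after a failure.
    return check() or check()

def fill_has_nonzero(rng, length=1):
    """Hypothetical stand-in for the random test: draw `length` 8-bit values
    and report whether at least one of them is nonzero."""
    return any(rng.randrange(256) != 0 for _ in range(length))

rng = random.Random(0)  # seeded only to make this sketch reproducible
ok = passes_with_one_retry(lambda: fill_has_nonzero(rng))
```

Assuming the two attempts are independent, a single retry turns a spurious failure rate of p into p^2, so 1/256 becomes 1/65536.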

christiangnrd changed the title from "Rerun tests with chance of false negative once." to "Rerun random tests with chance of false negative once." on Dec 20, 2024
github-actions bot (Contributor) left a comment

Metal Benchmarks

| Benchmark suite | Current: 8bd46b6 | Previous: ea1d6ad | Ratio |
| --- | --- | --- | --- |
| private array/construct | 26892.85714285714 ns | 27217.214285714286 ns | 0.99 |
| private array/broadcast | 461979.5 ns | 464500 ns | 0.99 |
| private array/random/randn/Float32 | 793771 ns | 826000 ns | 0.96 |
| private array/random/randn!/Float32 | 672375 ns | 658334 ns | 1.02 |
| private array/random/rand!/Int64 | 575084 ns | 572854 ns | 1.00 |
| private array/random/rand!/Float32 | 602979 ns | 598542 ns | 1.01 |
| private array/random/rand/Int64 | 765166 ns | 771187.5 ns | 0.99 |
| private array/random/rand/Float32 | 678188 ns | 585687.5 ns | 1.16 |
| private array/copyto!/gpu_to_gpu | 638167 ns | 662042 ns | 0.96 |
| private array/copyto!/cpu_to_gpu | 815729 ns | 683709 ns | 1.19 |
| private array/copyto!/gpu_to_cpu | 638250 ns | 804500 ns | 0.79 |
| private array/accumulate/1d | 1329542 ns | 1317083 ns | 1.01 |
| private array/accumulate/2d | 1388958.5 ns | 1373541 ns | 1.01 |
| private array/iteration/findall/int | 2064167 ns | 2028709 ns | 1.02 |
| private array/iteration/findall/bool | 1822104 ns | 1812688 ns | 1.01 |
| private array/iteration/findfirst/int | 1705458.5 ns | 1707834 ns | 1.00 |
| private array/iteration/findfirst/bool | 1667542 ns | 1660229 ns | 1.00 |
| private array/iteration/scalar | 3881958.5 ns | 3568021 ns | 1.09 |
| private array/iteration/logical | 3158396 ns | 3163792 ns | 1.00 |
| private array/iteration/findmin/1d | 1759833 ns | 1739750 ns | 1.01 |
| private array/iteration/findmin/2d | 1354000 ns | 1349604 ns | 1.00 |
| private array/reductions/reduce/1d | 1029291.5 ns | 1035312.5 ns | 0.99 |
| private array/reductions/reduce/2d | 664562.5 ns | 654666 ns | 1.02 |
| private array/reductions/mapreduce/1d | 1036125 ns | 1034083 ns | 1.00 |
| private array/reductions/mapreduce/2d | 661437.5 ns | 661125 ns | 1.00 |
| private array/permutedims/4d | 2547729 ns | 2484604 ns | 1.03 |
| private array/permutedims/2d | 1011042 ns | 1024083 ns | 0.99 |
| private array/permutedims/3d | 1583583 ns | 1571500 ns | 1.01 |
| private array/copy | 592833 ns | 577000 ns | 1.03 |
| latency/precompile | 5793242270.5 ns | 5769911666.5 ns | 1.00 |
| latency/ttfp | 6659858875.5 ns | 6647448292 ns | 1.00 |
| latency/import | 1179919708.5 ns | 1167766604 ns | 1.01 |
| integration/metaldevrt | 716292 ns | 719229 ns | 1.00 |
| integration/byval/slices=1 | 1631166 ns | 1521583.5 ns | 1.07 |
| integration/byval/slices=3 | 10557270.5 ns | 9443084 ns | 1.12 |
| integration/byval/reference | 1551000 ns | 1487604 ns | 1.04 |
| integration/byval/slices=2 | 2701125 ns | 2653771 ns | 1.02 |
| kernel/indexing | 483375 ns | 531041 ns | 0.91 |
| kernel/indexing_checked | 478000 ns | 472333 ns | 1.01 |
| kernel/launch | 8250 ns | 10201.5 ns | 0.81 |
| metal/synchronization/stream | 14750 ns | 13917 ns | 1.06 |
| metal/synchronization/context | 15000 ns | 14625 ns | 1.03 |
| shared array/construct | 26222.25 ns | 26330.357142857145 ns | 1.00 |
| shared array/broadcast | 473125 ns | 476083 ns | 0.99 |
| shared array/random/randn/Float32 | 849750 ns | 768750 ns | 1.11 |
| shared array/random/randn!/Float32 | 662459 ns | 657000 ns | 1.01 |
| shared array/random/rand!/Int64 | 570208 ns | 554959 ns | 1.03 |
| shared array/random/rand!/Float32 | 610209 ns | 599625 ns | 1.02 |
| shared array/random/rand/Int64 | 750750 ns | 735208 ns | 1.02 |
| shared array/random/rand/Float32 | 589708 ns | 626959 ns | 0.94 |
| shared array/copyto!/gpu_to_gpu | 88458 ns | 87500 ns | 1.01 |
| shared array/copyto!/cpu_to_gpu | 91333.5 ns | 87125 ns | 1.05 |
| shared array/copyto!/gpu_to_cpu | 78125 ns | 82292 ns | 0.95 |
| shared array/accumulate/1d | 1356958 ns | 1329375 ns | 1.02 |
| shared array/accumulate/2d | 1401750 ns | 1383292 ns | 1.01 |
| shared array/iteration/findall/int | 1805541.5 ns | 1790458 ns | 1.01 |
| shared array/iteration/findall/bool | 1593541.5 ns | 1556709 ns | 1.02 |
| shared array/iteration/findfirst/int | 1399834 ns | 1376958 ns | 1.02 |
| shared array/iteration/findfirst/bool | 1360125 ns | 1355917 ns | 1.00 |
| shared array/iteration/scalar | 156459 ns | 152208 ns | 1.03 |
| shared array/iteration/logical | 2973875 ns | 2949792 ns | 1.01 |
| shared array/iteration/findmin/1d | 1479958 ns | 1454250 ns | 1.02 |
| shared array/iteration/findmin/2d | 1364667 ns | 1354125 ns | 1.01 |
| shared array/reductions/reduce/1d | 723646 ns | 728312.5 ns | 0.99 |
| shared array/reductions/reduce/2d | 670791 ns | 662854.5 ns | 1.01 |
| shared array/reductions/mapreduce/1d | 734708.5 ns | 726833.5 ns | 1.01 |
| shared array/reductions/mapreduce/2d | 672937.5 ns | 657667 ns | 1.02 |
| shared array/permutedims/4d | 2575354 ns | 2555792 ns | 1.01 |
| shared array/permutedims/2d | 1035250 ns | 1009333 ns | 1.03 |
| shared array/permutedims/3d | 1608000 ns | 1585666 ns | 1.01 |
| shared array/copy | 244000 ns | 248500 ns | 0.98 |

This comment was automatically generated by workflow using github-action-benchmark.

maleadt (Member) commented Dec 20, 2024

What about removing the len=1 case and bumping the others from 3 to something higher? There doesn't seem to be much added value since both are testing the 1D case.

christiangnrd (Contributor, Author) commented Dec 20, 2024

The tricky part of implementing this was related to total array size, which is why I wanted very small array sizes. I increased the relevant lengths to 2 so we still hit the sizeof(A) <= 4 test cases. Now the highest odds of an invalid test failure are 1/(256^2) instead of 1/256.
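The odds quoted above follow from a short calculation, sketched here in Python for illustration. It assumes each element is an independent, uniformly distributed 8-bit value; `spurious_failure_odds` is a hypothetical helper, not part of the test suite.

```python
from fractions import Fraction

def spurious_failure_odds(length, bits=8):
    """Chance that `length` independent, uniform `bits`-bit values are all 0."""
    return Fraction(1, 2**bits) ** length

for n in (1, 2):
    # n = 1 gives 1/256, n = 2 gives 1/65536
    print(n, spurious_failure_odds(n))
```

Under that assumption, a 2-element (U)Int8 array still occupies only 2 bytes, so it stays within the sizeof(A) <= 4 cases while being 256 times less likely to come back all zeros.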

maleadt merged commit e762b01 into main on Dec 21, 2024
2 checks passed
maleadt deleted the fixrandtest branch on December 21, 2024 07:32