Use divide and conquer in to_radix_digits #316
base: master
src/biguint/convert.rs
@@ -701,34 +701,48 @@ pub(super) fn to_radix_digits_le(u: &BigUint, radix: u32) -> Vec<u8> {
    // The threshold for this was chosen by anecdotal performance measurements to
    // approximate where this starts to make a noticeable difference.
    if digits.data.len() >= 64 {
Did you re-evaluate this threshold at all? Notably, it's different from the one you used in `to_radix_digits_le_divide_and_conquer`. Maybe that does make sense since the inner part doesn't have to pay for creating `big_bases`, but I'm not sure.
These are new results relevant for the threshold:
simple:
test 1009 bit ... bench: 4,169.26 ns/iter (+/- 470.97)
test 2009 bit ... bench: 14,735.97 ns/iter (+/- 1,819.63)
test 3009 bit ... bench: 32,522.20 ns/iter (+/- 2,949.82)
test 4009 bit ... bench: 56,441.64 ns/iter (+/- 6,354.65)
divide and conquer:
test 1009 bit ... bench: 5,955.14 ns/iter (+/- 859.07)
test 2009 bit ... bench: 12,731.82 ns/iter (+/- 1,780.59)
test 3009 bit ... bench: 18,701.03 ns/iter (+/- 2,284.40)
test 4009 bit ... bench: 27,605.41 ns/iter (+/- 5,229.87)
So probably 2000/64 ≈ 32 digits makes sense as the new threshold?
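For reference, the arithmetic behind that estimate can be sketched as below. It assumes 64-bit digits, which is what the division by 64 implies; `limbs_for_bits` is a hypothetical helper for this note and not part of the crate.

```rust
/// Number of `digit_bits`-wide digits needed to store a value of `value_bits` bits.
/// Hypothetical helper for illustration only.
fn limbs_for_bits(value_bits: u64, digit_bits: u64) -> u64 {
    // Ceiling division without relying on newer standard library helpers.
    (value_bits + digit_bits - 1) / digit_bits
}
```

With 64-bit digits, the four benchmark sizes (1009, 2009, 3009 and 4009 bits) map to 16, 32, 48 and 63 digits, so the crossover between the two result sets falls somewhere between 16 and 32 digits.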
> since the inner part doesn't have to pay for creating `big_bases`, but I'm not sure

If I understand correctly, the main difference for small numbers is that the recursive algorithm loses constant propagation for the radix 10. If that weren't the case, I'd expect a threshold near 8.
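To make the two code paths under discussion concrete, here is a minimal sketch of the same idea on `u128` instead of `BigUint`. It is not the code from this PR: every name in it (`SIMPLE_LEVELS`, `push_digits_simple`, `push_digits_dc`, `to_radix_digits_le`, `to_radix_digits_le_simple`) is made up for illustration, and the cutoff is arbitrary rather than tuned. Note that the radix is a runtime value throughout, so the simple loop here cannot benefit from a compile-time constant either.

```rust
/// Recursion levels at or below which the simple loop is used.
/// Hypothetical cutoff for this sketch, not a tuned value.
const SIMPLE_LEVELS: usize = 2;

/// Simple path: repeated division by `radix`, least significant digit first.
fn push_digits_simple(mut n: u128, radix: u32, out: &mut Vec<u8>) {
    let radix = u128::from(radix);
    while n != 0 {
        out.push((n % radix) as u8);
        n /= radix;
    }
}

/// Recursive path. `bases[i]` holds `radix^(2^i)` and the caller guarantees
/// `n < radix^(2^bases.len())`. With `pad`, exactly `2^bases.len()` digits are
/// emitted (zero-filled); otherwise high zero digits are omitted.
fn push_digits_dc(n: u128, bases: &[u128], radix: u32, pad: bool, out: &mut Vec<u8>) {
    if bases.len() <= SIMPLE_LEVELS {
        let start = out.len();
        push_digits_simple(n, radix, out);
        if pad {
            out.resize(start + (1usize << bases.len()), 0);
        }
        return;
    }
    let (&base, smaller) = bases.split_last().expect("bases is non-empty here");
    let (hi, lo) = (n / base, n % base);
    if hi == 0 && !pad {
        push_digits_dc(lo, smaller, radix, false, out);
    } else {
        // The low half is zero-padded so the high half lands at the right offset.
        push_digits_dc(lo, smaller, radix, true, out);
        push_digits_dc(hi, smaller, radix, pad, out);
    }
}

/// Little-endian digits of `n` via divide and conquer.
pub fn to_radix_digits_le(n: u128, radix: u32) -> Vec<u8> {
    assert!((2..=256).contains(&radix), "radix must fit a u8 digit");
    if n == 0 {
        return vec![0];
    }
    // Precompute radix^(2^i) until squaring would exceed `n` or overflow.
    let mut bases = vec![u128::from(radix)];
    while let Some(sq) = bases.last().and_then(|b| b.checked_mul(*b)) {
        if sq > n {
            break;
        }
        bases.push(sq);
    }
    let mut out = Vec::new();
    push_digits_dc(n, &bases, radix, false, &mut out);
    out
}

/// Reference implementation using only the simple loop.
pub fn to_radix_digits_le_simple(n: u128, radix: u32) -> Vec<u8> {
    let mut out = Vec::new();
    push_digits_simple(n, radix, &mut out);
    if out.is_empty() {
        out.push(0);
    }
    out
}
```

The table of squared bases is built once at the top level, which is the setup cost the inner recursion does not pay again. On a real big integer each split is a multi-digit division, which is where the choice of threshold starts to matter.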
What exactly did you change for your new results?
In benchmarks? I changed them to this:
#[bench]
fn to_str_radix_10(b: &mut Bencher) {
    to_str_radix_bench(b, 10, 1009);
}

#[bench]
fn to_str_radix_10_2(b: &mut Bencher) {
    to_str_radix_bench(b, 10, 2009);
}

#[bench]
fn to_str_radix_10_3(b: &mut Bencher) {
    to_str_radix_bench(b, 10, 3009);
}

#[bench]
fn to_str_radix_10_4(b: &mut Bencher) {
    to_str_radix_bench(b, 10, 4009);
}
And I changed `if digits.data.len() >= 64 {` to `if digits.data.len() >= 1 {` and `if digits.data.len() >= 1000 {`.
I would like to implement Burnikel-Ziegler fast division to make to_radix even faster. Would you rather have it in this PR, or merge this one alone?
This implements the algorithm mentioned in #315
Benchmark:
Currently both grow as O(n^2); to make things algorithmically faster, we need faster multiplication and division algorithms.
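As a rough sketch of why the splitting alone does not change the asymptotic growth, assume an n-digit input is split in the middle by one schoolbook division costing about c·n^2. The symbols T, c and M are introduced here for illustration and do not come from the PR.

```math
T(n) = 2\,T\!\left(\tfrac{n}{2}\right) + c\,n^2
\quad\Longrightarrow\quad
T(n) = c\,n^2\left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right) \le 2c\,n^2 = O(n^2)
```

Level k of the recursion has 2^k subproblems of size n/2^k, so it costs c·n^2/2^k in total and the top-level division dominates. If division instead costs O(M(n)), where M(n) is the cost of a subquadratic multiplication, the same recurrence gives T(n) = O(M(n) log n).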