Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removed lookback_goodness_argmax #258

Merged
merged 6 commits into from
Dec 18, 2024
Merged

Removed lookback_goodness_argmax #258

merged 6 commits into from
Dec 18, 2024

Conversation

Skielex
Copy link
Collaborator

@Skielex Skielex commented Dec 17, 2024

Simplifies code and benchmarks indicate this change will improve performance.

cargo run --release -- bench -i data/binary/ --input-format binary -d interl --iters 100

Laptop (Intel® Core™ i7-1370P Processor):
main:
╭─────────────────────┬───────┬─────────────┬───────────────┬─────────────────╮      
│ dataset             │ codec │ compress_dt │ decompress_dt │ compressed_size │      
├─────────────────────┼───────┼─────────────┼───────────────┼─────────────────┤      
│ i32_interl0         │ pco   │  58.78775ms │     4.01065ms │          890794 │      
│ i64_interl0         │ pco   │  62.42405ms │      4.7225ms │          891168 │      
│ i32_interl1         │ pco   │   59.0238ms │      3.9483ms │          541198 │      
│ i64_interl1         │ pco   │  64.43565ms │     4.83275ms │          541270 │      
│ i32_interl_scrambl1 │ pco   │  64.15285ms │      4.1004ms │         1106568 │      
│ i64_interl_scrambl1 │ pco   │  69.67635ms │      5.1856ms │         1106654 │      
│ <sum>               │ pco   │ 378.50045ms │     26.8002ms │         5077652 │      
│ <sum>               │ <sum> │ 378.50045ms │     26.8002ms │         5077652 │      
╰─────────────────────┴───────┴─────────────┴───────────────┴─────────────────╯
no-argmax:
╭─────────────────────┬───────┬─────────────┬───────────────┬─────────────────╮      
│ dataset             │ codec │ compress_dt │ decompress_dt │ compressed_size │      
├─────────────────────┼───────┼─────────────┼───────────────┼─────────────────┤      
│ i32_interl0         │ pco   │   58.1858ms │     3.96095ms │          890794 │      
│ i64_interl0         │ pco   │   62.4123ms │     4.72745ms │          891168 │      
│ i32_interl1         │ pco   │   56.7742ms │     3.86515ms │          541198 │      
│ i64_interl1         │ pco   │    62.386ms │      4.7476ms │          541270 │      
│ i32_interl_scrambl1 │ pco   │  63.87445ms │      4.0084ms │         1106568 │      
│ i64_interl_scrambl1 │ pco   │    67.999ms │      5.0584ms │         1106654 │      
│ <sum>               │ pco   │ 371.63175ms │    26.36795ms │         5077652 │      
│ <sum>               │ <sum> │ 371.63175ms │    26.36795ms │         5077652 │      
╰─────────────────────┴───────┴─────────────┴───────────────┴─────────────────╯ 
vectorized-lookback-goodness:
╭─────────────────────┬───────┬─────────────┬───────────────┬─────────────────╮      
│ dataset             │ codec │ compress_dt │ decompress_dt │ compressed_size │
├─────────────────────┼───────┼─────────────┼───────────────┼─────────────────┤
│ i32_interl0         │ pco   │  60.84105ms │     3.95345ms │          890794 │
│ i64_interl0         │ pco   │  67.02285ms │     4.90655ms │          891168 │      
│ i32_interl1         │ pco   │   66.9435ms │       4.099ms │          541198 │      
│ i64_interl1         │ pco   │   70.1553ms │     5.12095ms │          541270 │      
│ i32_interl_scrambl1 │ pco   │  80.34415ms │      4.2488ms │         1106568 │      
│ i64_interl_scrambl1 │ pco   │  69.42205ms │     5.10795ms │         1106654 │      
│ <sum>               │ pco   │  414.7289ms │     27.4367ms │         5077652 │      
│ <sum>               │ <sum> │  414.7289ms │     27.4367ms │         5077652 │      
╰─────────────────────┴───────┴─────────────┴───────────────┴─────────────────╯    

Intel(R) Xeon(R) Gold 6226 CPU @ 2.70GHz:
main:
╭─────────────────────┬───────┬──────────────┬───────────────┬─────────────────╮
│ dataset             │ codec │  compress_dt │ decompress_dt │ compressed_size │
├─────────────────────┼───────┼──────────────┼───────────────┼─────────────────┤
│ i32_interl0         │ pco   │  92.537413ms │    6.549404ms │          890794 │
│ i64_interl0         │ pco   │  88.148776ms │    6.630989ms │          891168 │
│ i32_interl1         │ pco   │  82.484466ms │    6.015548ms │          541198 │
│ i64_interl1         │ pco   │  88.400502ms │    6.628358ms │          541270 │
│ i32_interl_scrambl1 │ pco   │  89.182903ms │    6.616342ms │         1106568 │
│ i64_interl_scrambl1 │ pco   │  95.131184ms │    7.270969ms │         1106654 │
│ <sum>               │ pco   │ 535.885244ms │    39.71161ms │         5077652 │
│ <sum>               │ <sum> │ 535.885244ms │    39.71161ms │         5077652 │
╰─────────────────────┴───────┴──────────────┴───────────────┴─────────────────╯
no-argmax:
╭─────────────────────┬───────┬──────────────┬───────────────┬─────────────────╮
│ dataset             │ codec │  compress_dt │ decompress_dt │ compressed_size │
├─────────────────────┼───────┼──────────────┼───────────────┼─────────────────┤
│ i32_interl0         │ pco   │  96.517644ms │     7.18903ms │          890794 │
│ i64_interl0         │ pco   │  82.453743ms │    7.097635ms │          891168 │
│ i32_interl1         │ pco   │  87.260784ms │    6.602727ms │          541198 │
│ i64_interl1         │ pco   │  81.930468ms │    7.055602ms │          541270 │
│ i32_interl_scrambl1 │ pco   │  94.036032ms │    7.155916ms │         1106568 │
│ i64_interl_scrambl1 │ pco   │  87.105378ms │    7.852009ms │         1106654 │
│ <sum>               │ pco   │ 529.304049ms │   42.952919ms │         5077652 │
│ <sum>               │ <sum> │ 529.304049ms │   42.952919ms │         5077652 │
╰─────────────────────┴───────┴──────────────┴───────────────┴─────────────────╯
vectorized-lookback-goodness:
╭─────────────────────┬───────┬──────────────┬───────────────┬─────────────────╮
│ dataset             │ codec │  compress_dt │ decompress_dt │ compressed_size │
├─────────────────────┼───────┼──────────────┼───────────────┼─────────────────┤
│ i32_interl0         │ pco   │  94.065806ms │    6.551866ms │          890794 │
│ i64_interl0         │ pco   │  90.745982ms │    6.652709ms │          891168 │
│ i32_interl1         │ pco   │  89.614793ms │     5.97645ms │          541198 │
│ i64_interl1         │ pco   │   94.45275ms │    6.601613ms │          541270 │
│ i32_interl_scrambl1 │ pco   │ 122.174215ms │    6.606691ms │         1106568 │
│ i64_interl_scrambl1 │ pco   │ 124.820737ms │    7.250678ms │         1106654 │
│ <sum>               │ pco   │ 615.874283ms │   39.640007ms │         5077652 │
│ <sum>               │ <sum> │ 615.874283ms │   39.640007ms │         5077652 │
╰─────────────────────┴───────┴──────────────┴───────────────┴─────────────────╯

pco/src/delta.rs Outdated
@@ -125,37 +125,34 @@ fn lookback_hash_lookup(
}
}

#[inline(never)]
fn lookback_compute_goodness<L: Latent>(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should change the name of this function now

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

pco/src/delta.rs Outdated
) {
) -> usize {
let mut best_goodness = 0;
let mut best_lookback: usize = 0;
for lookback_idx in 0..PROPOSED_LOOKBACKS {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we just index into the proposed lookbacks with this? I believe if you run cargo clippy it will give you a lint

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@mwlon mwlon merged commit 59de262 into mwlon:main Dec 18, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants