
Releases: mwlon/pcodec

v0.1.1

02 Dec 20:33
4ca9841
  • Improved standalone decompression speed ~5% by storing a size hint for the count of numbers in the whole file.
  • Thanks to the above, reduced the default chunk size at no performance cost, improving compression ratio.
  • Improved compression speed ~15% with optimized writer logic.
  • Substantially increased compression and decompression speed in special cases when steps can be skipped.

v0.1.0

29 Nov 02:49
2b823d2
  • Breaking changes
    • format: replaced GCD mode with int mult mode. This simplifies the format (int mult mode is very similar to float mult mode) and is more robust in the ways we care about. However, GCD-encoded data from v0.0.0 will no longer be decompressible. This could have been a backward-compatible change, but since v0.0.0 has relatively few downloads and GCD data is rare, I decided it was better to break compatibility than to keep dead code around forever. Int mult achieves an 11% better compression ratio than GCD did on the total_cents bench dataset.
    • API: Removed GCD-related metadata such as Bin::gcd and replaced configurations with int mult equivalents.
    • API: Renamed Progress.finished_page to Progress.finished since it sometimes refers to different units.
  • Improved decompression speed with SIMD offset reads.
  • Added standalone::simple_decompress_into.
  • Fixed a rare bug in compression that made it lossy on nearly-linear sequences of floats with floating-point errors.
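To illustrate the general idea behind int mult mode, here is a minimal Rust sketch. It assumes each integer is split by a chosen base into a quotient and a remainder; the functions split_by_mult, join_by_mult and gcd are hypothetical and not part of pco's API, and how pco actually picks the base and encodes the two parts is not shown. The example also shows one way a pure common-divisor (GCD) approach can fall short: a single value that is not a multiple drops the GCD of the whole sequence to 1, while the split stays lossless and leaves mostly-zero remainders.

```rust
/// Hypothetical helper: split each value into (quotient, remainder) by `base`.
fn split_by_mult(nums: &[u64], base: u64) -> (Vec<u64>, Vec<u64>) {
    assert!(base > 0, "base must be nonzero");
    let quotients: Vec<u64> = nums.iter().map(|&x| x / base).collect();
    let remainders: Vec<u64> = nums.iter().map(|&x| x % base).collect();
    (quotients, remainders)
}

/// Inverse of split_by_mult: x = quotient * base + remainder.
fn join_by_mult(quotients: &[u64], remainders: &[u64], base: u64) -> Vec<u64> {
    quotients
        .iter()
        .zip(remainders)
        .map(|(&q, &r)| q * base + r)
        .collect()
}

/// Euclid's algorithm, used here only to show what a pure common-divisor approach would find.
fn gcd(a: u64, b: u64) -> u64 {
    if b == 0 { a } else { gcd(b, a % b) }
}

fn main() {
    // Made-up amounts in cents: mostly whole dollars, plus one outlier.
    let cents: Vec<u64> = vec![1500, 2300, 999, 4700];

    // A single non-multiple drags the common divisor of the whole sequence down to 1.
    assert_eq!(cents.iter().copied().fold(0, gcd), 1);

    // Splitting by base 100 still yields small quotients and mostly-zero remainders.
    let (q, r) = split_by_mult(&cents, 100);
    assert_eq!(q, vec![15, 23, 9, 47]);
    assert_eq!(r, vec![0, 0, 99, 0]);

    // The split is lossless: every value is reconstructed exactly.
    assert_eq!(join_by_mult(&q, &r, 100), cents);
}
```

Because the remainders are kept, reconstruction is exact even for values that are not multiples of the base; the benefit comes from the remainders being highly compressible when most values are multiples.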

v0.0.0

15 Nov 13:01
  • Improved decompression performance ~70% on aarch64, ~10% on x86_64.
  • Added support for consuming any BetterBufRead implementation during decompression, rather than only &[u8].
  • Changed the API for wrapped::PageDecompressor and standalone::ChunkDecompressor to own src, since these parts of the file need to be read in order and contiguously.
  • Updated docs, including real-world benchmarks on air quality, taxi, and r/place datasets.

v0.0.0-alpha.3

04 Nov 18:54

With lower-level unit testing, found and fixed 3 serious bugs:

  • encoding more than one page per chunk failed; it tried to encode the whole chunk every time
  • decoding one batch at a time failed because the code path asserted the reader would be byte aligned
  • decoding with most limits through the CLI failed because it created a bad count of numbers for pco

v0.0.0-alpha.2

29 Oct 03:39
  • Revamped the API into separate structs for File, Chunk, and Page compressors/decompressors.
  • Fixed a known bug that caused panics on 32-bit architectures.
  • Made decompression almost-zero-copy, increasing performance slightly.
  • Made standalone actually just a minimal wrapped format with no access to private functionality.

v0.0.0-alpha.1

04 Sep 15:43

Changed the format to contain tiny batches (256 numbers each) with contiguous 4-way interleaved tANS codes and contiguous offsets. This increased the buffer space needed, but allowed decent CPU utilization during tANS decoding and excellent SIMD utilization during offset decoding, approximately a 30% decompression speedup overall.
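The batch layout described above can be sketched in Rust as follows. This is an illustration under stated assumptions, not pco's code: it assumes symbols within a batch are assigned to the 4 interleaved tANS states round-robin by index (element i goes to state i % 4), and the names BATCH_SIZE, N_INTERLEAVED, interleave_lanes and merge_lanes are hypothetical. The actual tANS encoding and the contiguous offset packing are omitted.

```rust
// Hypothetical constants mirroring the release note: 256 numbers per batch,
// 4 interleaved tANS states.
const BATCH_SIZE: usize = 256;
const N_INTERLEAVED: usize = 4;

/// Assign each element of a batch to one of N_INTERLEAVED lanes, round-robin by index
/// (an assumed assignment rule, for illustration only).
fn interleave_lanes<T: Copy>(batch: &[T]) -> [Vec<T>; N_INTERLEAVED] {
    let mut lanes: [Vec<T>; N_INTERLEAVED] = Default::default();
    for (i, &x) in batch.iter().enumerate() {
        lanes[i % N_INTERLEAVED].push(x);
    }
    lanes
}

/// Undo the interleaving: element i comes from lane i % N_INTERLEAVED
/// at position i / N_INTERLEAVED.
fn merge_lanes<T: Copy>(lanes: &[Vec<T>; N_INTERLEAVED], len: usize) -> Vec<T> {
    (0..len)
        .map(|i| lanes[i % N_INTERLEAVED][i / N_INTERLEAVED])
        .collect()
}

fn main() {
    let nums: Vec<u32> = (0..1000).collect();
    // 1000 numbers split into batches of at most 256: 256 + 256 + 256 + 232.
    let batches: Vec<&[u32]> = nums.chunks(BATCH_SIZE).collect();
    assert_eq!(batches.len(), 4);
    assert_eq!(batches[3].len(), 232);
    for &batch in &batches {
        let lanes = interleave_lanes(batch);
        assert_eq!(merge_lanes(&lanes, batch.len()), batch.to_vec());
    }
}
```

Because consecutive elements belong to different states, a decoder can advance the 4 states without each step waiting on the previous one, which is the usual motivation for interleaving ANS streams.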

v0.0.0-alpha.0

07 Jul 23:47

Unveiling the alpha of pco (the library) and pco_cli (the binary) for pcodec, a new format and codec for compressing numerical sequences. Its API is very similar to that of q_compress, but its compression ratios and decompression speeds are much better.