Alternative compression for vector space models #45

Open
iokuznetsov opened this issue Oct 13, 2015 · 1 comment

Comments

@iokuznetsov

The current implementation of VectorBinding supports compression and decompression via GZIP and BZIP2. It might be useful to support faster compression methods as well, e.g. LZO, LZ4 or Snappy, since I/O and decompression are among the bottlenecks in similarity-intensive applications. Some benchmark results can be found here: http://catchchallenger.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO (GZIP, BZIP2, LZMA, XZ, LZ4 and LZO compared). At least LZO and Snappy provide Input/OutputStream implementations, and their Java implementations are on Maven Central, so they should be relatively easy to integrate. However, some knowledge of those libraries is required to get the best trade-off between compression ratio and speed.
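
Since the proposed codecs are exposed as Input/OutputStream wrappers, one possible way to make them interchangeable is sketched below. This is only an illustration under assumptions: the `StreamCodec` enum and its method names are hypothetical and not part of the existing VectorBinding code, and the JDK's own GZIP and DEFLATE streams stand in for LZ4, Snappy or LZO, whose stream classes from third-party libraries would be added as further constants.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

/**
 * Hypothetical codec registry (not the actual VectorBinding API).
 * Each constant only knows how to wrap a raw stream, so adding a new
 * algorithm means adding one constant with two one-line methods.
 */
enum StreamCodec {
    NONE {
        @Override InputStream decoder(InputStream in) { return in; }
        @Override OutputStream encoder(OutputStream out) { return out; }
    },
    GZIP {
        @Override InputStream decoder(InputStream in) throws IOException {
            return new GZIPInputStream(in);
        }
        @Override OutputStream encoder(OutputStream out) throws IOException {
            return new GZIPOutputStream(out);
        }
    },
    // Stand-in for a faster codec; an LZ4 or Snappy stream class would go here.
    DEFLATE {
        @Override InputStream decoder(InputStream in) { return new InflaterInputStream(in); }
        @Override OutputStream encoder(OutputStream out) { return new DeflaterOutputStream(out); }
    };

    abstract InputStream decoder(InputStream in) throws IOException;
    abstract OutputStream encoder(OutputStream out) throws IOException;

    /** Compresses a byte array with this codec. */
    byte[] compress(byte[] raw) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = encoder(sink)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The encoder is closed at this point, so all trailing bytes are flushed.
        return sink.toByteArray();
    }

    /** Decompresses a byte array that was produced by the same codec. */
    byte[] decompress(byte[] packed) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = decoder(new ByteArrayInputStream(packed))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }
}
```

With a layout like this, the codec could be chosen from a configuration string, e.g. `StreamCodec.valueOf("GZIP")`, and the calling code would not change when another algorithm is added.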

@reckart
Member

reckart commented Oct 13, 2015

I think Apache Commons Compress supports most (all?) of the methods you are suggesting, and using it here should make it fairly trivial to switch between different algorithms.
