Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: add encoding with compression, additional tags, float support and bigtiff #80

Closed

Conversation

Farkal
Copy link
Member

@Farkal Farkal commented Jul 16, 2020

This is a work in progress and i need some advice about the implementation.

First the lzw lib is unmaintained but there is a fork in image-rs but there is not this pull request merged nwin/lzw#1. To enable the tiff encoding with lzw we need this.

Then I think there is a better way to encode the value to bytes but i didn't find it so i create new TiffWriter :/

        CompressionMethod::LZW => {
                let mut buf = TiffWriter::new(Cursor::new(Vec::new()));
                value.write(&mut buf)?;
                let mut compressed = vec![];
                {
                    let mut lzw_encoder = EncoderTIFF::new(MsbWriter::new(&mut compressed), 8)?;
                    lzw_encoder.encode_bytes(&buf.writer.into_inner()[..])?;
                }
                self.encoder.write_data(&compressed[..])?
        },

And the big issue is with the additional_tags i can't find a way to allow to pass an array of object implementing the trait TiffValue with Vec<(Tag, Box<dyn TiffValue>)>, the compiler complain:

the trait `encoder::TiffValue` cannot be made into an object
consider moving `write` to another trait
consider moving `BYTE_LEN` to another trait
consider moving `FIELD_TYPE` to another trait rustc(E0038)
mod.rs(38, 8): ...because method `write` has generic type parameters
mod.rs(32, 11): ...because it contains this associated `const`
mod.rs(33, 11): ...because it contains this associated `const`
mod.rs(31, 11): this trait cannot be made into an object...

So currently you can only use one type of TiffValue and need to specify it when you write the image or create the encoder like the colortype 😢

@HeroicKatora
Copy link
Member

We have a fork here: https://github.com/image-rs/lzw :)

@Farkal
Copy link
Member Author

Farkal commented Jul 16, 2020

Yes but the merge request for tiff encoding is not in this fork. You want me to create a pull request ? Also where this fork will be published on cargo ? We will need a new name no ?
Do you have any advice for the array of TiffValue trait ? Should we have a trait for encoding and another for decoding ? :/

@HeroicKatora
Copy link
Member

HeroicKatora commented Jul 16, 2020

No, please don't move the PR outright. The encoder of lzw is anywhere between slow (literally ×100 to gzip) and outright bad. It will need an overhaul and where we should think of doing tiff-style from the start. The rewrite was currently focussed on the decoder since we have significant gains there as well (roughly ×2-3 throughput in master) but the interface is not yet finalized. The crate will be published under the name weezl.

@Farkal
Copy link
Member Author

Farkal commented Jul 16, 2020

Ok i don't touch the PR 😄
For the TIffValue i think we should change the trait to enable the behavior needed, like using function to define the byte_len and field_type. But I have no idea for the writer generic parameter 😞

Use struct for Colortype
Add support for f32 and f64
Allow to write multiples tags
@Farkal
Copy link
Member Author

Farkal commented Jul 22, 2020

Lot of changes to be able to encode any tags. I rewrite all the TiffValues in an enum and use a struct for the ColorTypes (why use a trait to store vars ?).
I added the support for floating values (f32 and f64). This support is not well tested and incomplete.
If you have any advice 😉
I will implement the support for big tiff here because there is too much changes and the merging will be hard if i create it in another branch.

@Farkal Farkal changed the title WIP: add encoding with compression and additional tags WIP: add encoding with compression, additional tags, float support and bigtiff Jul 22, 2020
@HeroicKatora
Copy link
Member

HeroicKatora commented Jul 22, 2020

I will implement the support for big tiff here because there is too much changes and the merging will be hard if i create it in another branch.

I would rather evaluate if the approach is correct first instead of risking that big tiff will require large changes 🙂 You've observed that Box<dyn TiffValue> doesn't work but probably there are alternatives that require less changes to the trait and the encoder interface. For example, what if additional_tags is an FnOnce callback with which the caller can write the additional tags? It would also be good if new parameters weren't added to existing methods but wherever possible moving the implementation to a new method with the parameters, changing the old methods to provide the defaults for these parameters. This avoids unecessary breakage.

@Farkal
Copy link
Member Author

Farkal commented Jul 22, 2020

😢 Yeah you are right a simple function would have been easier 😞
But i still think that Colortype musn't be a Trait.
For the new method i will create a new one with the additional parameters 😉

For the bigtiff in fact it's pretty simple, if you implement the bigtiff you have implemented the tiff, it's juste the size of the ifd that change from u32 to u64 (and some header parameters) my scheme here #47 (comment) help to see witch part are different.
And as this page say :

When a file is written it is initially unknown whether it will have to be a BigTIFF (if the total size is less than 2^32, there is no reason to make it a BigTIFF file). The library begins by writing a standard TIFF file with 32-bit offsets. It leaves 8 extra bytes of space after the file header in case the file will need to become a BigTIFF file, but otherwise the processing is identical to before. Internally the library will use 64-bit offsets but externally the file will contain 32-bit offsets.

So we only need to use u64 for the offsets and we have the compatibility to BigTiff. I think i will be able to do it in separate branch. My main issue is with the value of the tag decoded that are now all on 8 bytes. So i read put everything in a u64 and it's only when i call the read method that i choose to read_long or read_long8 if the decoder have detected big tiff or not. So i have to rewrite all the impl Value of the decoder/ifd.rs with into_u64 into_i64 into_u64_vec... Does this seems ok for you ?

@Farkal
Copy link
Member Author

Farkal commented Jul 24, 2020

I am splitting this PR in multiple smaller PR but i need to wait for the PR #84 to be merged before adding the support of lzw compression (it will simply add parameter to the extended function introduced by the PR).
For the float support do you think going from the PR #70 would be great idea ? (seems there is no more activity) or should i start a new one ?

@aschampion
Copy link
Contributor

I haven't abandoned #70; I just have very little time for OSS projects until September. Any implementation of floating point is going to look at least in part identical to #70, so you're welcome to use it as a starting point for a new PR if you want to wrap up the last necessary part, the predictor.

@Farkal
Copy link
Member Author

Farkal commented Jul 24, 2020

Ok i will do my best to implement the missing part ;)

@HeroicKatora
Copy link
Member

@Farkal Now that float support and BitTiff have landed as part of other PRs, do you want to rebase the remaining parts? The lzw compression support would need to be rewritten, though, but should be much faster than with the lzw crate.

@Farkal
Copy link
Member Author

Farkal commented Sep 7, 2020

So you think i should rewrite the compression without using the lzw crate ? You didn't think that we should improve the crate and use it here ?

@HeroicKatora
Copy link
Member

The lzw crate made some fundamental decisions that make it hard to rewrite and it is unmaintained. I don't think it's worth the investment of time given that weezl already has a very similar feature set and is much faster already in particular for encoding (literal ×20 or more).

@Farkal
Copy link
Member Author

Farkal commented Sep 7, 2020

So I think we should just use the weezl crate no ?

@HeroicKatora
Copy link
Member

Yes.

@Farkal
Copy link
Member Author

Farkal commented Apr 12, 2021

As the other PR have been merged this one doesn't need to stay open 😉

@Farkal Farkal closed this Apr 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants