Addition of Resize operator #510

Closed
Luni-4 opened this issue Jul 20, 2023 · 6 comments
@Luni-4
Collaborator

Luni-4 commented Jul 20, 2023

I have not found any duplicate issue

Feature description

Addition of a Resize operator in burn

Feature motivation

I have tried some pre-trained networks and this operator is present in nearly all of them. So it would be helpful to provide a straightforward way to use it, in order to simplify its adoption by other developers.

(Optional) Suggest a Solution

From a discussion with @nathanielsimard, it might be built as a combination of the slice_assign and reshape operators, but it is still not clear whether that is the right approach. Another option would be to implement it from scratch.

Here is an explanation with examples.

@antimora
Collaborator

antimora commented Jul 23, 2023

After learning more about the Resize operator, I concluded that this is not a tensor operator, such as PyTorch's resize, but rather an image resize operation as described in the ONNX Resize op.

So the solution is more specific to ONNX compatibility rather than adding a missing op in Burn. The main question is how we support this. Since this op is officially in the ONNX spec and there seem to be models exported to ONNX with this op, we need to support it somehow. There are two possible solutions, and I went back and forth, preferring one over the other. I have settled on solution 2.

Solution 1: Allow user implementation by exposing node hooks.

Here is how it might work:

  1. User registers the "Resize" op by providing the full module path. The registration could look like this in build.rs:
use burn_import::onnx::ModelGen;

fn main() {
    // Generate the model code from the ONNX file.
    ModelGen::new()
        .input("src/model/mnist.onnx")
        .out_dir("model/")
        .with_custom("Resize", "mymodule.Resizer")
        .with_custom("STFT", "mymodule.STFTer") // more registration
        .run_from_script();
}

This will generate the following type of code:

// Generated from ONNX "src/model/mnist.onnx" by burn-import
use burn::nn::conv::Conv2d;
use burn::nn::conv::Conv2dConfig;
use burn::record::Recorder;
use burn::{
    module::Module,
    tensor::{backend::Backend, Tensor},
};

use mymodule::Resizer;

#[derive(Module, Debug)]
pub struct Model<B: Backend> {
    conv2d1: Conv2d<B>,
    resizer: Resizer<B>,
}


impl<B: Backend> Model<B> {
    pub fn new_with(record: ModelRecord<B>) -> Self {
        let conv2d1 = Conv2dConfig::new([1, 8], [3, 3])
            .with_stride([1, 1])
            .with_dilation([1, 1])
            .with_groups(1)
            .with_bias(true)
            .init_with(record.conv2d1);

        let resizer = Resizer::from_conf(FULL_ONNX_NODE_INFO); // full ONNX attributes passed 

        Self {
            conv2d1,
            resizer,
        }
    }

    pub fn new() -> Self {
        let conv2d1 = Conv2dConfig::new([1, 8], [3, 3])
            .with_stride([1, 1])
            .with_dilation([1, 1])
            .with_groups(1)
            .with_bias(true)
            .init();
        let resizer = Resizer::from_conf(FULL_ONNX_NODE_INFO); // full ONNX attributes passed 
        Self {
            conv2d1,
            resizer,
        }
    }

    #[allow(clippy::let_and_return)]
    pub fn forward(&self, input1: Tensor<B, 4>) -> Tensor<B, 2> {
        let conv2d1_out1 = self.conv2d1.forward(input1);
        let resizer_out1 = self.resizer.forward(conv2d1_out1);
        resizer_out1
    }
}

Disadvantages of this solution are:

  1. Every user has to reimplement the same custom function.
  2. Passing node attribute information to the Resizer constructor via from_conf is somewhat messy.

Solution 2: Introduce a burn-image crate analogous to PyTorch's torchvision package. This crate will hold a collection of image operations; we could, for example, add image augmentation tools there.

burn-image will contain a Resizer module with a ResizerConfig. Creation and usage will be analogous to other NN modules, except it won't be trainable.

burn-import will implement Resize but will reference structs from burn-image, e.g. use burn::image::Resizer;

The burn crate will re-export the image modules behind an image feature flag.
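
To make the shape of Solution 2 concrete, here is a rough sketch of the public API a Resizer in the proposed burn-image crate could expose. The names ResizerConfig, output_size, and init are assumptions for illustration only; the real module would also derive burn's Config and Module traits (omitted here), and forward is left unimplemented:

use burn::tensor::{backend::Backend, Tensor};

/// Configuration for the non-trainable image-resize module (hypothetical).
pub struct ResizerConfig {
    /// Target output size as [height, width].
    pub output_size: [usize; 2],
    // An interpolation-mode field (nearest, bilinear, bicubic) would also live here.
}

impl ResizerConfig {
    pub fn init(&self) -> Resizer {
        Resizer {
            output_size: self.output_size,
        }
    }
}

/// Image-resize module; it holds no learnable parameters.
pub struct Resizer {
    output_size: [usize; 2],
}

impl Resizer {
    /// Resizes an (N, C, H, W) tensor to (N, C, output_size[0], output_size[1]).
    pub fn forward<B: Backend>(&self, input: Tensor<B, 4>) -> Tensor<B, 4> {
        let _ = (&input, self.output_size);
        // The actual interpolation (nearest/bilinear/bicubic) would be implemented here.
        todo!("interpolate `input` to the configured output size")
    }
}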

@antimora
Collaborator

I have found PyTorch's implementation: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html

But I recommend we use an image library written in pure Rust instead of trying to write the algorithm with tensors. The downside is that it will run on the CPU, which is generally fine.

Here is one such library: https://github.com/Cykooz/fast_image_resize
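
For illustration only, a minimal CPU-side sketch using the image crate (rather than fast_image_resize) could look like this; the file names and target size are placeholders:

fn main() {
    // Load an image from disk (decoded on the CPU).
    let img = image::open("input.png").expect("failed to open image");

    // CatmullRom is a bicubic-style filter; Lanczos3 or Triangle (bilinear) also work.
    let resized = image::imageops::resize(&img, 224, 224, image::imageops::FilterType::CatmullRom);

    resized.save("resized.png").expect("failed to save image");
}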

@antimora
Collaborator

Here is a bicubic algorithm using PyTorch's convolution that ChatGPT suggested (it still needs to be checked for accuracy):

import torch
import torch.nn.functional as F

def precompute_bicubic_kernel(dtype, device, a=-0.5):
    # 4-tap Keys cubic kernel (a = -0.5) sampled at offsets -1.5, -0.5, 0.5, 1.5;
    # these weights sum to 1, so no extra normalization is needed.
    x = torch.tensor([-1.5, -0.5, 0.5, 1.5], dtype=dtype, device=device).abs()
    near = ((a + 2) * x**3 - (a + 3) * x**2 + 1) * (x <= 1)
    far = (a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a) * ((x > 1) & (x < 2))
    weights = (near + far).unsqueeze(-1)
    # Separable kernel: the outer product of the 1D weights gives the 4x4 2D kernel.
    kernel = weights @ weights.t()
    return kernel

def downsample_bicubic(image, scale_factor):
    # image: (N, C, H, W); scale_factor: integer downscale factor
    dtype, device = image.dtype, image.device
    channels = image.size(1)

    # Precompute the bicubic kernel; one copy per channel for a depthwise convolution,
    # so the weight has shape (C, 1, 4, 4) to match groups=channels below.
    kernel = precompute_bicubic_kernel(dtype, device).view(1, 1, 4, 4).repeat(channels, 1, 1, 1)

    # Apply reflection padding so the 4x4 kernel stays inside the image.
    padded_image = F.pad(image, (1, 2, 1, 2), mode='reflect')

    # Convolve the image with the bicubic kernel (depthwise, one filter per channel).
    convolved = F.conv2d(padded_image, kernel, stride=1, padding=0, groups=channels)

    # Now take every nth pixel, where n is the scale factor.
    downscaled = convolved[..., ::scale_factor, ::scale_factor]
    return downscaled

Please note, the weights should remain non-trainable. If we adopt this algorithm, we should use the functional conv2d with static/constant tensors, not nn::Conv2d.

@antimora
Collaborator

antimora commented Jul 25, 2023

@Luni-4 I realized we have a feature request for an interpolate function (#455), which can be used by Resize.

So, here is my updated plan. Let's add interpolate under nn in the core, since it appears to be generic, is more related to down/up-sampling, and is used for more than just images.

We could start with bicubic, which is needed for your ONNX file.
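
As a standalone illustration of the math a bicubic interpolate would build on (not burn's implementation), here is the Keys cubic convolution kernel with a = -0.5, the same weights the PyTorch snippet above computes:

fn cubic_weight(x: f64, a: f64) -> f64 {
    // Keys cubic convolution kernel; a = -0.5 matches common bicubic implementations.
    let x = x.abs();
    if x < 1.0 {
        (a + 2.0) * x.powi(3) - (a + 3.0) * x.powi(2) + 1.0
    } else if x < 2.0 {
        a * x.powi(3) - 5.0 * a * x.powi(2) + 8.0 * a * x - 4.0 * a
    } else {
        0.0
    }
}

fn main() {
    // Weights of the four neighboring samples at a fractional offset of 0.5.
    let a = -0.5;
    let weights: Vec<f64> = [-1.5_f64, -0.5, 0.5, 1.5]
        .iter()
        .map(|&offset| cubic_weight(offset, a))
        .collect();
    println!("{weights:?}"); // [-0.0625, 0.5625, 0.5625, -0.0625], sums to 1.0
}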

@Luni-4
Collaborator Author

Luni-4 commented Jul 25, 2023

@antimora

Perfect, I completely agree with your plan!

@Luni-4
Collaborator Author

Luni-4 commented Oct 9, 2024

Implemented in #1863 and #2081

Luni-4 closed this as completed Oct 9, 2024