Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No need for a generator in the EncoderDecoder class #105

Open
mkserge opened this issue Dec 31, 2022 · 3 comments
Open

No need for a generator in the EncoderDecoder class #105

mkserge opened this issue Dec 31, 2022 · 3 comments

Comments

@mkserge
Copy link

mkserge commented Dec 31, 2022

Hi,

Great notebook! Just wanted to mention that there is no need to pass the generator in the constructor of the EncoderDecoder class. It makes it a bit confusing as looking at the model description in make_model method one implies that the generator is part of the model, yet the loss_compute applies the generator again.

Only after digging into EncoderDecoder definition you realize that the generator is not actually used in the model, so the loss computation is actually correct.

@zh-jp
Copy link

zh-jp commented Jan 17, 2024

Maybe the forward function in EncoderDecoder should be

    def forward(self, src, tgt, src_mask, tgt_mask):
        memory = self.encode(src, src_mask)
        res_dec = self.decode(memory, src_mask, tgt, tgt_mask)
        return self.generator(res_dec)

@kuraga
Copy link

kuraga commented Sep 14, 2024

But...

out = test_model.decode(
    memory, src_mask, ys, subsequent_mask(ys.size(1)).type_as(src.data)
)
prob = test_model.generator(out[:, -1])

out[:, -1]...

@PangLuo
Copy link

PangLuo commented Dec 8, 2024

Good point mkserge. For this implementation of the paper, there is indeed no need to pass the generator in the constructor of the EncoderDecoder class. However, the paper says "In our model, we share the same weight matrix between the two embedding layers and the pre-softmax linear transformation, similar to (cite)". So the input embedding layer of the decoder is actually the transpose of the linear layer of the generator. I guess this implementation has skipped this part. If the implementation really matches the paper, it makes sense to pass the generator in the constructor of the EncoderDecoder class.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants