Great notebook! Just wanted to mention that there is no need to pass the generator to the constructor of the EncoderDecoder class. It is a bit confusing: looking at the model description in the make_model method, one assumes the generator is part of the model, yet loss_compute applies the generator again.
Only after digging into the EncoderDecoder definition do you realize that the generator is never actually called inside the model, so the loss computation is in fact correct.
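For reference, here is a minimal sketch of the structure under discussion, assuming the class layout produced by the notebook's make_model (the encoder, decoder, and embedding arguments are placeholders):

```python
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Sketch: the generator is stored on the module but never called in
    forward(), which returns raw decoder states, not log-probabilities."""

    def __init__(self, encoder, decoder, src_embed, tgt_embed, generator):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.src_embed = src_embed
        self.tgt_embed = tgt_embed
        self.generator = generator  # held here, but unused in forward()

    def forward(self, src, tgt, src_mask, tgt_mask):
        memory = self.encoder(self.src_embed(src), src_mask)
        return self.decoder(self.tgt_embed(tgt), memory, src_mask, tgt_mask)

# The loss computation is the only place the generator runs, so the
# projection + log-softmax is applied exactly once:
#   out = model(src, tgt, src_mask, tgt_mask)
#   log_probs = model.generator(out)
```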
Good point, mkserge. For this implementation of the paper there is indeed no need to pass the generator to the constructor of the EncoderDecoder class. However, the paper says: "In our model, we share the same weight matrix between the two embedding layers and the pre-softmax linear transformation, similar to (cite)". So the input embedding layer of the decoder should actually be the transpose of the generator's linear layer. It seems this implementation skipped that part. If the implementation fully matched the paper, it would make sense to pass the generator to the constructor of the EncoderDecoder class.
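For what it's worth, here is a minimal sketch of that weight sharing in PyTorch; d_model, vocab, tgt_embed, and proj are illustrative names, not the notebook's. Note that nn.Linear stores its weight as (out_features, in_features), so the tie is a direct assignment, and the transpose the paper describes happens implicitly inside the linear layer:

```python
import torch.nn as nn

d_model, vocab = 512, 10000  # illustrative sizes

# Decoder input embedding: weight has shape (vocab, d_model).
tgt_embed = nn.Embedding(vocab, d_model)

# Pre-softmax projection of the generator: nn.Linear(d_model, vocab)
# also stores its weight as (vocab, d_model), since Linear computes x @ W.T.
proj = nn.Linear(d_model, vocab, bias=False)

# Tie the two: both layers now share one parameter tensor, so the
# embedding is effectively the transpose of the pre-softmax transform.
proj.weight = tgt_embed.weight
```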