Great notebook! Just wanted to mention that there is no need to pass the generator to the constructor of the EncoderDecoder class. It is a bit confusing: looking at the model description in the make_model method, one assumes the generator is part of the model, yet loss_compute applies the generator again.
Only after digging into the EncoderDecoder definition do you realize that the generator is never actually called inside the model, so the loss computation is in fact correct.
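For reference, here is a minimal sketch of the structure under discussion, assuming the class layout produced by the notebook's make_model (the encoder, decoder, and embedding arguments are placeholders):

```python
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Sketch: the generator is stored on the module but never called in
    forward(), which returns raw decoder states, not log-probabilities."""

    def __init__(self, encoder, decoder, src_embed, tgt_embed, generator):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.src_embed = src_embed
        self.tgt_embed = tgt_embed
        self.generator = generator  # held here, but unused in forward()

    def forward(self, src, tgt, src_mask, tgt_mask):
        memory = self.encoder(self.src_embed(src), src_mask)
        return self.decoder(self.tgt_embed(tgt), memory, src_mask, tgt_mask)

# The loss computation is the only place the generator runs, so the
# projection + log-softmax is applied exactly once:
#   out = model(src, tgt, src_mask, tgt_mask)
#   log_probs = model.generator(out)
```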
Good point, mkserge. For this implementation of the paper there is indeed no need to pass the generator to the constructor of the EncoderDecoder class. However, the paper says: "In our model, we share the same weight matrix between the two embedding layers and the pre-softmax linear transformation, similar to (cite)". So the input embedding layer of the decoder should actually be the transpose of the generator's linear layer. It seems this implementation skipped that part. If the implementation fully matched the paper, it would make sense to pass the generator to the constructor of the EncoderDecoder class.
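For what it's worth, here is a minimal sketch of that weight sharing in PyTorch; d_model, vocab, tgt_embed, and proj are illustrative names, not the notebook's. Note that nn.Linear stores its weight as (out_features, in_features), so the tie is a direct assignment, and the transpose the paper describes happens implicitly inside the linear layer:

```python
import torch.nn as nn

d_model, vocab = 512, 10000  # illustrative sizes

# Decoder input embedding: weight has shape (vocab, d_model).
tgt_embed = nn.Embedding(vocab, d_model)

# Pre-softmax projection of the generator: nn.Linear(d_model, vocab)
# also stores its weight as (vocab, d_model), since Linear computes x @ W.T.
proj = nn.Linear(d_model, vocab, bias=False)

# Tie the two: both layers now share one parameter tensor, so the
# embedding is effectively the transpose of the pre-softmax transform.
proj.weight = tgt_embed.weight
```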