Implement CopyNext model #5025

https://api.semanticscholar.org/CorpusID:225103260

This is similar to CopyNet from Gu et al., 2016, but adds an inductive bias for copying contiguous tokens. CopyNext introduces a special symbol in the target vocabulary - the "CopyNext Symbol" (CN) - which corresponds to the operation of copying another token. So the CN token always follows either a copied token that is the first in its span or another CN token.

Edit: the implementation is not just a simple extension of CopyNet. As @JohnGiorgi pointed out below, CopyNext uses a separate linear layer to calculate the "CN" score, and there are some other differences with this model as well. Most notably, they make a big simplification by treating all tokens in the source sequence as unique, and not worrying about the case where a source token may be part of the target vocabulary. This makes sense for the task that CopyNext was applied to (nested NER), but not for general seq2seq tasks.
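To make the CN operation concrete, here is a small, runnable illustration; all names and the action encoding below are invented for this sketch, not taken from the paper:

```python
# Hypothetical illustration of CopyNext decoding actions (names invented).
source_tokens = ["Barack", "Obama", "was", "born", "in", "Hawaii"]

# Copying the contiguous span "Barack Obama" is realized as two actions:
#   1. an ordinary copy pointing at source position 0 ("Barack"),
#   2. the CN symbol, meaning "copy the token right after the last copied one".
predicted_actions = [("copy", 0), ("CN",)]

decoded = []
last_copied = None
for action in predicted_actions:
    if action[0] == "copy":
        last_copied = action[1]
    else:  # CN advances the copy pointer by one position in the source
        last_copied += 1
    decoded.append(source_tokens[last_copied])

print(decoded)  # ['Barack', 'Obama']
```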
Comments
I'd be very interested in using this model for my own project. I think I could probably take this on. Do you have a rough idea of where the modifications would go and what they would look like? In the meantime I'll dig into the existing code and CopyNet/CopyNext.
That would be great @JohnGiorgi! So, I think you want to start by adding a parameter to the model's constructor for the CN token. Now I could be wrong, but I think the only place in the model implementation that would need to be updated is the `_get_predicted_tokens()` method.
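For reference, a minimal sketch of what that constructor change might look like; this is an assumption about the shape of the change, not the actual API, and the default symbol `@COPYNEXT@` is invented:

```python
from allennlp.data import Vocabulary
from allennlp.models import Model


class CopyNextSeq2Seq(Model):
    """Hypothetical CopyNet variant with a CopyNext (CN) operation."""

    def __init__(
        self,
        vocab: Vocabulary,
        target_namespace: str = "target_tokens",
        copy_next_token: str = "@COPYNEXT@",  # assumed name for the CN symbol
    ) -> None:
        super().__init__(vocab)
        self._target_namespace = target_namespace
        # Register the CN symbol in the target vocabulary and remember its
        # index so the decoding loop can recognize it later.
        self._copy_next_index = self.vocab.add_token_to_namespace(
            copy_next_token, self._target_namespace
        )
```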
Okay, I think I get the basic idea.

Introduce a `copy_next_token` parameter, register the token in the target vocabulary, and store its index:

```python
self._copy_next_index = self.vocab.add_token_to_namespace(copy_next_token, self._target_namespace)
```

Modify the loop over predicted indices:

```python
for index in indices:
    if index == self._copy_next_index:
        # The model predicted the CopyNext token (CN), so take the token to be
        # the next adjacent token in source_tokens.
        # TODO: Get the position of the previously copied token in source_tokens.
        prev_copy_source_index = ...
        # Take the token immediately following the previously copied one in the
        # source; call its position "copy_next_source_index" for now.
        copy_next_source_index = prev_copy_source_index + 1
        token = metadata["source_tokens"][copy_next_source_index]
    elif index >= self._target_vocab_size:
        adjusted_index = index - self._target_vocab_size
        token = metadata["source_tokens"][adjusted_index]
    else:
        token = self.vocab.get_token_from_index(index, self._target_namespace)
    tokens.append(token)
```

A couple of questions, though: how do I get the position of the previously copied token at this point, and what should happen if the model predicts CN when the previous step wasn't a copy?
One solution I can think of for the latter is masking out the CN score whenever the previous prediction wasn't a copy operation.
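For the first question, one possibility is to track the source position of the most recent copy while iterating. A minimal, self-contained sketch, where the function and argument names are invented for illustration and are not the actual AllenNLP API:

```python
from typing import Dict, List


def indices_to_tokens(
    indices: List[int],
    source_tokens: List[str],
    index_to_token: Dict[int, str],
    copy_next_index: int,
    target_vocab_size: int,
) -> List[str]:
    """Sketch of CopyNet-style post-processing extended with a CN token."""
    tokens = []
    prev_copy_source_index = None  # source position of the most recent copy
    for index in indices:
        if index == copy_next_index and prev_copy_source_index is not None:
            # CN: copy the token immediately after the previously copied one,
            # and advance the pointer so consecutive CNs copy a whole span.
            prev_copy_source_index += 1
            token = source_tokens[prev_copy_source_index]
        elif index >= target_vocab_size:
            # An ordinary copy; remember which source position it came from.
            prev_copy_source_index = index - target_vocab_size
            token = source_tokens[prev_copy_source_index]
        else:
            # A token generated from the target vocabulary; a CN predicted
            # without a preceding copy also falls through to this branch.
            prev_copy_source_index = None
            token = index_to_token[index]
        tokens.append(token)
    return tokens
```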
Yeup!
Exactly. For your questions:
Got it, thanks for the guidance. I think the next steps are clear; I will work on this and open a PR.
Actually, I am thinking this might be a good bit more complicated than just updating `_get_predicted_tokens()`.

In CopyNext, the model assigns a score to the copy next operation by passing the decoder's current hidden state through a linear layer. If we wanted to match this, we would need to:

Add a new linear layer at model initialization:

```python
self._output_copy_next_layer = Linear(self.decoder_output_dim, 1)
```

Add a method to compute the copy next score:

```python
def _get_copy_next_scores(self, state: Dict[str, torch.Tensor]) -> torch.Tensor:
    return self._output_copy_next_layer(state["decoder_hidden"])
```

AFAICT the copy next score is based solely on a linear transformation of the decoder hidden state; that is what I gather from the figure and the accompanying text of the paper.

I think this significantly complicates things downstream, particularly anywhere the generation and copy scores get combined and normalized. Does this sound right?
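If that's right, then downstream the CN score would presumably have to be concatenated with the generation and copy scores before normalization. A rough sketch of the idea, where the shapes and variable names are my assumptions rather than the actual CopyNet internals:

```python
import torch
import torch.nn.functional as F

batch_size, target_vocab_size, source_length = 4, 1000, 12

# Stand-ins for the three score tensors at a single decoding step.
generation_scores = torch.randn(batch_size, target_vocab_size)
copy_scores = torch.randn(batch_size, source_length)
copy_next_score = torch.randn(batch_size, 1)  # from the new linear layer

# All three operations compete in a single softmax, so every place that
# currently normalizes over (generation + copy) scores would need updating.
all_scores = torch.cat((generation_scores, copy_next_score, copy_scores), dim=-1)
log_probs = F.log_softmax(all_scores, dim=-1)
print(log_probs.shape)  # torch.Size([4, 1013])
```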
Ah, yes, you're right. Their implementation actually seems a lot simpler than CopyNet's, and I think this is because they don't worry about the case where tokens in the source sequence may be in the target vocabulary.

So I think we have a couple of options:

1. Extend the existing CopyNet model with the CN operation, roughly as sketched above.
2. Implement CopyNext from scratch as a separate model.

What do you think?
What if I go for option 1 right now? I think I could do this inside a week, and I could test it on my own task (joint NER and RE), which I currently use CopyNet for, to see if there's a performance boost. I would still be interested in trying out CopyNext, so I could attempt to implement it from scratch as well; that might take me a little longer, though. The authors list a GitHub repo in the paper (https://github.com/abhinonymous/copynext), but it gives me a 404. I will reach out and see if there's a public implementation somewhere.
Sounds good @JohnGiorgi!
@JohnGiorgi / @epwalsh I'm interested in CopyNext as well; I may be able to assist with option 2 once I do a bit more research into what's involved.