Skip to content
This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Implement CopyNext model #5025

Open
epwalsh opened this issue Feb 26, 2021 · 10 comments
Open

Implement CopyNext model #5025

epwalsh opened this issue Feb 26, 2021 · 10 comments
Labels
Contributions welcome hard Difficult tasks Models Issues related to the allennlp-models repo

Comments

@epwalsh
Copy link
Member

epwalsh commented Feb 26, 2021

https://api.semanticscholar.org/CorpusID:225103260

This is similar to CopyNet from Gu et al., 2016, but adds an inductive bias for copying contiguous tokens.

The implementation is actually just a simple extension of CopyNet: CopyNext introduces a special symbol in the target vocabulary - the "CopyNext Symbol" (CN) - which corresponds to the operation of copying another token. So the CN token always follows a copied token that is the first in its span or another CN token. -- As @JohnGiorgi pointed out below, CopyNext uses a separate linear layer to calculate the "CN" score. And there are some other differences with this model as well. Most notably, they make a big simplification by treating all tokens in the source sequence as unique, and not worrying about the case where a source token may be part of the target vocabulary. This makes sense for the task that CopyNext was applied to (nested NER), but not general seq2seq tasks.

@epwalsh epwalsh added Contributions welcome Models Issues related to the allennlp-models repo hard Difficult tasks GSoC labels Feb 26, 2021
@dirkgr dirkgr removed the GSoC label Mar 9, 2021
@JohnGiorgi
Copy link
Contributor

I'd be very interested in using this model for my own project. I think I could probably take this on. Do you have a rough idea on where the modifications would go and what they would look like?

In the meantime I'll dig into the existing code and CopyNet/CopyNext.

@epwalsh
Copy link
Member Author

epwalsh commented Mar 15, 2021

That would be great @JohnGiorgi!

So, I think you want to start by adding a parameter to the CopyNetDatasetReader and CopyNet model classes for the "CN" token: copy_next_token: Optional[str] = None (or something like that). By default this is None, meaning the model is just regular CopyNet.

Now I could be wrong, but I think the only place in the model implementation that would need to be updated is the _get_predicted_tokens method: https://github.com/allenai/allennlp-models/blob/main/allennlp_models/generation/models/copynet_seq2seq.py#L853, so maybe this isn't as hard as I thought. I don't think the loss calculation needs to change in any way. But you may want to double check the "CopyNext" paper just to make sure. Let me know if you have any other questions!

@JohnGiorgi
Copy link
Contributor

JohnGiorgi commented Mar 15, 2021

Okay I think I get the basic idea,

Introduce a copy_next_token: Optional[str] = None as an additional argument to CopyNet. I believe this token would need to be added to self._target_namespace, because its something the model generates (right?). So in __init__, call

self._copy_next_index = self.vocab.add_token_to_namespace(copy_next_token, self._target_namespace)

Modify _get_predicted_tokens so that when self._copy_next_index is predicted, we take the predicted token to be the immediate next token in the source as was previously copied. The relevant part of _get_predicted_tokens would look something like:

for index in indices:
    # The model predicted the CopyNext token (CN).
    # Take the token to be the next adjacent token in source_tokens
    if index == self._copy_next_index:
        # TODO: Get the position of the previously copied token in source_tokens
        prev_copy_source_index = ...?
        # TODO: Take token to be the immediate next token in the source of the previously copied one
        # call it "copy_next_source_index" for now.
        copy_next_source_index = prev_copy_source_index + 1
        token = metadata["source_tokens"][copy_next_source_index]
    elif index >= self._target_vocab_size:
        adjusted_index = index - self._target_vocab_size
        token = metadata["source_tokens"][adjusted_index]
    else:
        token = self.vocab.get_token_from_index(index, self._target_namespace)
    tokens.append(token)

A couple of questions...

  1. _get_predicted_tokens currently only accumulates the tokens (on this line). I think for this to work, we need to also accumulate the index in source_tokens that each copy operation chose. I am stuck on how to do this.
  2. Three edges cases come to mind where the model predicts copy_next_token, but the copy next operation doesn't make sense:
    1. when the model's very first prediction is copy_next_token.
    2. when the model predicts copy_next_token right after generating something from the target vocabulary.
    3. when the model copies the very last token in source_tokens, and then generates copy_next_token.

One solution I can think of is masking out self._copy_next_index in the decoder's softmax in each of these situations, so it cannot erroneously choose the copy next token. Does this sound reasonable? AFAICT that is what CopyNext is doing (see Figure 3 of their paper).

@epwalsh
Copy link
Member Author

epwalsh commented Mar 15, 2021

I believe this token would need to be added to self._target_namespace (right?)

Yeup!

Modify _get_predicted_tokens so that when self._copy_next_index is predicted, we take the predicted token to be the immediate next token in the source as was previously copied

Exactly.

For your questions:

  1. For a copied token, adjusted_index is the index of the token with respect to the source sequence: https://github.com/allenai/allennlp-models/blob/3624fdc4b249d3b515269ee99e1b1e15bf876829/allennlp_models/generation/models/copynet_seq2seq.py#L877. So then you could just increment that when the next token is a copy_next_token (would have to keep track of whether or not you are in one of your edge cases).
  2. In practice I don't think any of these would be much of an issue. During training the model will quickly learn that the copy_next_token only follows the copy_token or another copy_next_token. Masking it out certainly makes sense, but that could be tricky to implement efficiently.

@JohnGiorgi
Copy link
Contributor

Got it, thanks for the guidance. I think next steps are clear, I will work on this and open a PR to allennlp-models soon!

@JohnGiorgi
Copy link
Contributor

Actually, I am thinking this might be a good bit more complicated than just updating _get_predicted_tokens.

In CopyNext, the model assigns a score to the copy next operation by passing the decoder's current hidden state through a linear layer. If we wanted to match this, we would need to:

Add a new linear layer at model initialization:

self._output_copy_next_layer  = Linear(self.decoder_output_dim, 1)

Add a method to compute the copy next score:

def _get_copy_next_scores(self, state: Dict[str, torch.Tensor]) -> torch.Tensor:
    return self._output_copy_next_layer(state["decoder_hidden"])

AFAICT the copy next score is based solely on a linear transformation of the decoder hidden state. This is what I gather from the figure:

image

and the text:

image

I think this significantly complicates things, downstream, particularly anywhere all_scores is being calculated. Also in detecting when the copy next operation was the most likely (over generating and copying) and filling in the copy next token for these timesteps.

Does this sound right?

@epwalsh
Copy link
Member Author

epwalsh commented Mar 15, 2021

Ah, yes you're right. Their implementation actually seems a lot simpler than CopyNet. And I think this is because they don't worry about the case where tokens in the source sequence may be in the target vocabulary ($\mathcal{L}$), which makes sense for the task they applied it to.

So I think we have a couple options:

  1. Continue with the original plan, doing a simple modification to CopyNet. This doesn't give us the CopyNext model, but it does probably improve CopyNet, and is may also be a better model than CopyNext for tasks where some source tokens may be in the target vocab.
  2. Implement the true CopyNext model from scratch.
  3. Do both 1 and 2 🤷‍♂️

What do you think?

@JohnGiorgi
Copy link
Contributor

What if I go for 1. right now? I think I could do this inside a week. I could test it on my own task, joint NER and RE, that I currently use CopyNet for. I can see if there's a performance boost.

I think I would still be interested in trying out CopyNext, so I could attempt to implement it from scratch as well. This might take me a little longer though. The authors list a GitHub in the paper (https://github.com/abhinonymous/copynext), but it gives me a 404. I will reach out and see if there's a public implementation somewhere.

@epwalsh
Copy link
Member Author

epwalsh commented Mar 15, 2021

Sounds good @JohnGiorgi!

@david-waterworth
Copy link

@JohnGiorgi / @epwalsh I'm interested in CopyNext as well, I may be able to assist with 2 once I do a bit more research into what's involved.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Contributions welcome hard Difficult tasks Models Issues related to the allennlp-models repo
Projects
None yet
Development

No branches or pull requests

4 participants