
More Chat Loss Masking Strategies #2214

Open · EugenHotaj opened this issue Dec 30, 2024 · 4 comments

@EugenHotaj (Contributor)

Are there plans to add more loss masking strategies for chat data?

For example, a very common loss masking strategy for multi-turn conversations is to mask everything except the last assistant response. However, `train_on_input=False` currently computes the loss on all assistant turns, not just the last one. Is it possible to add this feature to torchtune?
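To make the difference concrete, here's a hypothetical two-round conversation annotated with where the loss is computed under each strategy (the conversation itself is made up for illustration):

```python
# Hypothetical multi-turn conversation illustrating the two strategies.
conversation = [
    {"role": "user", "content": "What's 2 + 2?"},
    {"role": "assistant", "content": "4."},   # train_on_input=False: loss computed here
    {"role": "user", "content": "And 3 + 3?"},
    {"role": "assistant", "content": "6."},   # loss computed here under both strategies
]
# train_on_input=False     -> loss on every assistant turn
# "last turn only" masking -> loss on the final assistant turn only
```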

@RdoubleA (Contributor)

If you are using a custom dataset with a custom message transform, you can manually mask the messages you need in the transform by setting the `masked` field on the `Message` dataclass. If you are using one of the dataset builders, you're right that this is not currently possible. Those builders are designed to be easily configurable from YAML, so something more flexible like a list of boolean loss masks is a bit tougher to expose. But if this is a common approach and other folks would like it for the built-in dataset builders, we could consider something like changing `train_on_input` to a string masking-strategy parameter, or something similar.
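For anyone landing here, a minimal sketch of such a transform, assuming the `masked` field on `torchtune.data.Message` described above; the class name and the `"conversation"` column are made up for illustration:

```python
from typing import Any, Mapping

from torchtune.data import Message


class LastTurnMessages:
    """Hypothetical message transform that masks everything except the
    last assistant response, so only that turn contributes to the loss."""

    def __call__(self, sample: Mapping[str, Any]) -> Mapping[str, Any]:
        # Assumes the raw sample stores the conversation under a
        # "conversation" key as a list of {"role": ..., "content": ...} dicts.
        raw = sample["conversation"]
        last_assistant = max(
            i for i, turn in enumerate(raw) if turn["role"] == "assistant"
        )
        messages = [
            Message(
                role=turn["role"],
                content=turn["content"],
                # Mask (exclude from loss) every message except the
                # final assistant response.
                masked=(i != last_assistant),
            )
            for i, turn in enumerate(raw)
        ]
        return {"messages": messages}
```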

@RdoubleA (Contributor) commented Jan 1, 2025

I just saw a similar request in #2207, so this might be worth enabling.

@EugenHotaj (Contributor, Author)

Nice to "see" you again, Rafi! Thanks for the quick response.

> But if this is a common approach and other folks would like it for the built-in dataset builders, we could consider something like changing `train_on_input` to a string masking-strategy parameter, or something similar.

> I just saw a similar request in #2207, so this might be worth enabling.

Masking all but the last turn is a very common (maybe the most common?) masking strategy, so it could be a nice feature to provide users out of the box.

> If you are using a custom dataset with a custom message transform, you can manually mask the messages you need in the transform by setting the `masked` field on the `Message` dataclass.

Any pointers / examples for how to do this?

@RdoubleA (Contributor) commented Jan 2, 2025

Glad to see you on the torchtune repo Eugen :)

Yes, see this page for an example.

If your conversation is stored in a column, you can query that column in the custom message transform and manually create `Message` objects for the whole conversation, leaving only the last one unmasked. Then you'll need to write a custom dataset builder that you can specify in your config.
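A hedged sketch of such a builder, assuming torchtune's `SFTDataset` and the hypothetical `LastTurnMessages` transform from the earlier comment (the builder name, module path, and defaults are made up):

```python
from typing import Any

from torchtune.datasets import SFTDataset

# The transform sketched in the earlier comment (hypothetical module path).
from my_module.transforms import LastTurnMessages


def last_turn_dataset(
    model_transform,  # typically the model tokenizer instantiated from the config
    *,
    source: str = "json",
    **load_dataset_kwargs: Any,
) -> SFTDataset:
    """Hypothetical builder wiring LastTurnMessages into SFTDataset."""
    return SFTDataset(
        source=source,
        message_transform=LastTurnMessages(),
        model_transform=model_transform,
        **load_dataset_kwargs,
    )
```

You could then point the `dataset` field of your YAML config at this builder via its `_component_` path.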

Let me know if there's any confusion on this.
