How to evaluate ppl? #77

Jiawei-Yang · 2024-03-13T23:59:33Z

Hi, thanks for your amazing work!

I noticed that there's an evaluation script for perplexity. How can I replicate the results in Tables 1 and 2?
Are there any instructions for this?

seeyourcell · 2024-05-25T06:42:28Z

Same question.

Zhuohao-Li · 2024-09-17T00:19:25Z

I think running the following should work:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python examples/eval_long_ppl.py --model_name_or_path lmsys/vicuna-13b-v1.3

However, I still ran into a problem when running it. It raises "RuntimeError: stack expects a non-empty TensorList" at the line ppl = torch.exp(torch.stack(nlls).mean()) in eval_long_ppl.py. That error means the nlls list was empty when the line ran, so there still seems to be a bug in the implementation.
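For reference, the formula on that line is perplexity = exp(mean of the per-token negative log-likelihoods). Below is a minimal sketch of the same computation in plain Python (no torch), with a guard so an empty list produces a readable error instead of the stack failure. The function name perplexity and the error message are my own, not part of eval_long_ppl.py, and this does not explain why nlls ends up empty in the script.

```python
import math


def perplexity(nlls):
    """Return exp(mean(nlls)) for per-token negative log-likelihoods.

    `nlls` is a hypothetical plain list of floats; the real script
    collects torch tensors instead.
    """
    if not nlls:
        # Same situation that makes torch.stack fail on an empty list.
        raise ValueError("nlls is empty: the evaluation loop collected no tokens")
    return math.exp(sum(nlls) / len(nlls))


if __name__ == "__main__":
    # Three tokens, each with NLL 2.0, so the result equals math.exp(2.0).
    print(perplexity([2.0, 2.0, 2.0]))
```

In the torch version, the equivalent guard would be checking that nlls is non-empty before calling torch.stack(nlls).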

Hope that helps. I'm also looking forward to the eval code.
