Liu-yuliang/LLM_ability_eval

1. Commonsense QA -- 1 dataset

a. PIQA

Validation set, 1838 samples; testing takes 3 min 35 s

2. Code -- 1 dataset

a. HumanEval

pass@1; execution of the generated code is not implemented
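For reference, pass@1 is the k = 1 case of the pass@k metric commonly reported for HumanEval. The sketch below shows the usual unbiased estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples generated per problem and c is the number that pass the unit tests. It does not run any generated code: the function name `pass_at_k` and the example counts are illustrative assumptions, and the per-problem pass counts are assumed to come from some separate execution harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: samples generated for the problem
    c: samples that passed the unit tests (assumed to be known already)
    k: budget of attempts being scored
    """
    if n - c < k:
        # Fewer than k failing samples: every size-k subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical per-problem results as (n, c) pairs; not real measurements.
results = [(1, 1), (1, 0), (1, 1), (1, 0)]
score = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
print(f"pass@1 = {score:.2f}")
```

With a single sample per problem (n = 1), pass@1 reduces to the fraction of problems whose one completion passes.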

3. Math -- 1 dataset

a. GSM8K

8-shot; examples are randomly selected from the test set. Done.

Testing takes 1 hour on one 80 GB A800
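A minimal sketch of how an 8-shot prompt with randomly selected examples could be assembled. The function name `build_few_shot_prompt`, the `question`/`answer` field names, the "Question:/Answer:" template, and the choice to exclude the target question from its own shots are all assumptions for illustration, not necessarily what this repository does.

```python
import random


def build_few_shot_prompt(examples, target_index, k=8, seed=0):
    """Build a k-shot prompt for examples[target_index].

    examples: list of dicts with "question" and "answer" keys (assumed schema)
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    # Assumption: never show the target question as one of its own shots.
    candidates = [i for i in range(len(examples)) if i != target_index]
    shot_ids = rng.sample(candidates, k)

    parts = [
        f"Question: {examples[i]['question']}\nAnswer: {examples[i]['answer']}"
        for i in shot_ids
    ]
    # The target question ends with an open "Answer:" for the model to complete.
    parts.append(f"Question: {examples[target_index]['question']}\nAnswer:")
    return "\n\n".join(parts)


# Toy stand-in data; real GSM8K items would be loaded from the dataset.
toy = [{"question": f"q{i}", "answer": f"ans{i}"} for i in range(10)]
prompt = build_few_shot_prompt(toy, target_index=3)
print(prompt)
```

Drawing the shots from the same split that is being scored means a model may see test answers in context, so excluding at least the target item is the minimum safeguard.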

4. MMLU -- 1 dataset

5-shot; done. 14042 samples.

Testing takes 40 min on one 80 GB A800
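A minimal sketch of a 5-shot multiple-choice prompt in the layout widely used for MMLU (question, four lettered options, then "Answer:"). The helper names, the tuple layout `(question, choices, answer_index)`, and taking the first five development items as shots are assumptions for illustration; the repository's actual template may differ.

```python
CHOICE_LABELS = "ABCD"


def format_mmlu_item(question, choices, answer_index=None):
    """Format one item; leave the answer open when answer_index is None."""
    lines = [question]
    lines += [f"{label}. {choice}" for label, choice in zip(CHOICE_LABELS, choices)]
    if answer_index is None:
        lines.append("Answer:")
    else:
        lines.append(f"Answer: {CHOICE_LABELS[answer_index]}")
    return "\n".join(lines)


def build_mmlu_prompt(dev_items, test_item, k=5):
    """Prepend k solved dev items to one unsolved test item."""
    shots = [format_mmlu_item(q, c, a) for q, c, a in dev_items[:k]]
    shots.append(format_mmlu_item(test_item[0], test_item[1]))
    return "\n\n".join(shots)


# Toy stand-in data; real items would come from the MMLU dev and test splits.
dev = [(f"Q{i}", ["a", "b", "c", "d"], i % 4) for i in range(5)]
mmlu_prompt = build_mmlu_prompt(dev, ("QT", ["w", "x", "y", "z"]))
print(mmlu_prompt)
```

The model's prediction can then be read from whichever of the four letters it continues with after the final "Answer:".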

5. BookSUM -- 1 dataset

Not considered a basic ability
