[Refactor] load dataset #254

Merged on Jun 6, 2024.

Commits (31 commits, all authored by huyiwen on Jun 6, 2024):
7fbe207  [Major] refactor imports
5f2303d  [CI] update tests
56910ea  [Feat] add hfd and hf-mirror
0938fb8  [doc] more guide on loading datsets
184b607  [fix] update hfd
c92ce82  [fix] dataset formatting
c46aea2  [fix] load with hfd
d020f69  [fix] resolve huggingface-cli import error
64f27ed  [CI] test dataset formatting
7e13d51  [CI] skip OOM
83e5fca  [fix] fix failed tests
a12e4dc  support gpt-4o
23fdcd5  update customize dataset
5b6178a  [doc] customize model
e34fffe  [CI] annotate failures
6bf270b  [Feat] load evaluation_data
e84df5a  [Feat] hfd_cache_path
b3d2711  [CI] split pytest
f1d4d10  [CI] fix splits
224a9bb  [CI] skip cuda
dd8b675  [doc] add CONTRIBUTING.md
70eb5d3  [fix] evaluation_data is not None
0d11d34  [CI] download nltk
eb4d8ed  [CI] fix temp folder
7e049d1  [CI] fix cache path
a381264  [CI] skip DatasetGenerationError
589585e  [CI] re-run failures
c2b3ca0  [ci] fix winograd
085df89  [CI] fix pytest-results-action
1906b8d  [CI] fix
1314c4d  [ci] fix xlsum
2 changes: 1 addition & 1 deletion .github/workflows/isort-check.yml
@@ -7,7 +7,7 @@ on:
       - 'utilization/**'
 
 jobs:
-  build:
+  formatting-check:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v3
80 changes: 64 additions & 16 deletions .github/workflows/pytest-check.yml
@@ -9,30 +9,78 @@ on:
       - '.github/workflows/**'
 
 jobs:
-  build:
-    name: Run tests
+  Pytest:
+    name: subtest
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: ["3.8.18"]
+        group: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
 
     steps:
-      - uses: szenius/set-timezone@v1.2
+      - uses: szenius/set-timezone@v2.0
         with:
-          timezoneLinux: "Europe/Berlin"
+          timezoneLinux: "Asia/Shanghai"
       - uses: actions/checkout@v3
-      - name: Set up Python ${{ matrix.python-version }}
+      - name: Set up Python 3.8.18
         uses: actions/setup-python@v4
         with:
-          python-version: ${{ matrix.python-version }}
-      - name: Install uv
-        run: pip install uv pip -U
+          python-version: 3.8.18
       - name: Install dependencies
-        run: uv pip install -r tests/requirements-tests.txt --system
-      - name: Install isolation dependencies
-        run: uv pip install vllm --no-build-isolation --system
-      - uses: pavelzw/pytest-action@v2
+        run: |
+          pip install uv pip -U
+          uv pip install -r tests/requirements-tests.txt --system
+          uv pip install vllm --no-build-isolation --system
+      - name: Run tests
+        run: pytest --cov --junit-xml=test-results.xml --splits 10 --group ${{ matrix.group }} --reruns 3 --only-rerun PermissionError
+        env:
+          GITHUB_ACTION: 1
+      - name: Surface failing tests
+        if: always()
+        uses: pmeier/pytest-results-action@multi-testsuites
         with:
-          emoji: false
-          verbose: true
-          job-summary: true
+          # A list of JUnit XML files, directories containing the former, and wildcard
+          # patterns to process.
+          # See @actions/glob for supported patterns.
+          path: test-results.xml
+
+          # (Optional) Add a summary of the results at the top of the report
+          summary: true
+
+          # (Optional) Select which results should be included in the report.
+          # Follows the same syntax as `pytest -r`
+          display-options: fEX
+
+          # (Optional) Fail the workflow if no JUnit XML was found.
+          fail-on-empty: true
+
+          # (Optional) Title of the test results section in the workflow summary
+          title: Test results
+      - name: Upload coverage
+        uses: actions/upload-artifact@v2
+        with:
+          name: coverage${{ matrix.group }}
+          path: .coverage
+
+  Coverage:
+    needs: Pytest
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python 3.8.18
+        uses: actions/setup-python@v4
+        with:
+          python-version: 3.8.18
+      - name: Install uv
+        run: |
+          pip install uv pip -U
+          uv pip install -r tests/requirements-tests.txt --system
+          uv pip install vllm --no-build-isolation --system
+      - name: Download all artifacts
+        # Downloads coverage1, coverage2, etc.
+        uses: actions/download-artifact@v2
+      - name: Run coverage
+        run: |
+          coverage combine coverage*/.coverage*
+          coverage report --fail-under=90
+          coverage xml
+      - uses: codecov/codecov-action@v1
82 changes: 82 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,82 @@
# Contributing

Thanks for your interest in contributing to LLMBox! We welcome and appreciate contributions.
To report bugs, create a [GitHub issue](https://github.com/RUCAIBox/LLMBox/issues).

## Contribution Guide
### 1. Fork the Official Repository

Fork the [LLMBox repository](https://github.com/RUCAIBox/LLMBox) into your own account.
Then clone your forked repository into your local environment.

```shell
git clone git@github.com:<YOUR-USERNAME>/LLMBox.git
```

### 2. Configure Git

Set the official repository as your [upstream](https://www.atlassian.com/git/tutorials/git-forks-and-upstreams) to synchronize with the latest updates in the official repository.
Add the original repository as `upstream`:

```shell
cd LLMBox
git remote add upstream git@github.com:RUCAIBox/LLMBox.git
```

Verify that the remotes are set:
```shell
git remote -v
```
You should see both `origin` and `upstream` in the output.

### 3. Synchronize with Official Repository
Synchronize with the latest commits in the official repository before you start coding:

```shell
git fetch upstream
git checkout main
git merge upstream/main
git push origin main
```

### 4. Create a New Branch And Open a Pull Request
After you finish your implementation, create a new branch, commit your changes, and push the branch to your forked repository (a minimal sketch is shown below). Then open a pull request from your fork: the source branch is your new branch, and the target branch is the `main` branch of `RUCAIBox/LLMBox`. The PR should then appear in [LLMBox PRs](https://github.com/RUCAIBox/LLMBox/pulls).
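
The shell commands below are a minimal sketch of this step; the branch name `my-feature` and the commit message are placeholders.

```shell
git checkout -b my-feature                    # create a branch for your change
git add .
git commit -m "feat: describe your change"
git push origin my-feature                    # push the branch to your fork
```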

The LLMBox team will then review your code.

## PR Rules

### 1. Pull Request title

As described [here](https://github.com/commitizen/conventional-commit-types/blob/master/index.json), a valid PR title should begin with one of the following prefixes:

- `feat`: A new feature
- `fix`: A bug fix
- `doc`: Documentation only changes
- `refactor`: A code change that neither fixes a bug nor adds a feature
- `style`: A refactoring that improves code style
- `test`: Adding missing tests or correcting existing tests
- `ci`: Changes to CI configuration files and scripts (example scopes: `.github`, `ci` (Buildkite))
- `revert`: Reverts a previous commit

For example, a PR title could be:
- `refactor: modify package path`
- `feat(training): xxxx`, where `(training)` means that this PR mainly focuses on the training component.

You may also check out previous PRs in the [PR list](https://github.com/RUCAIBox/LLMBox/pulls).

### 2. Pull Request description

- If your PR is small (such as a typo fix), you can keep the description brief.
- If it is large and changes a lot of code, please describe the changes in more detail.


## How to begin
Please refer to the README in each module:
- [training](./training)
- [utilization](./utilization)
- [docs](./docs)

## Tests
Please navigate to the `tests` folder to see the existing test suites.
At the moment, we have three kinds of checks: `pytest`, `isort`, and `yapf`; a sketch of running them locally is shown below.
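
The commands below are assumed local equivalents of the CI checks, not the exact invocations; the authoritative configuration lives in the workflow files under `.github/workflows/`.

```shell
# Assumed commands -- adjust flags to match the CI workflows.
pip install uv pip -U
uv pip install -r tests/requirements-tests.txt --system

pytest tests/                # unit tests
isort --check-only .         # import-order check
yapf --diff --recursive .    # code-style (formatting) check
```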
7 changes: 3 additions & 4 deletions README.md
@@ -57,7 +57,7 @@ bash bash/run_7b_ds3.sh
 To utilize your model, or evaluate an existing model, you can run the following command:
 
 ```python
-python inference.py -m gpt-3.5-turbo -d copa # --num_shot 0 --model_type instruction
+python inference.py -m gpt-3.5-turbo -d copa # --num_shot 0 --model_type chat
 ```
 
 This is default to run the OpenAI GPT 3.5 turbo model on the CoPA dataset in a zero-shot manner.
@@ -118,12 +118,11 @@ We provide a broad support on Huggingface models (e.g. `LLaMA-3`, `Mistral`, or
 Currently a total of 56+ commonly used datasets are supported, including: `HellaSwag`, `MMLU`, `GSM8K`, `GPQA`, `AGIEval`, `CEval`, and `CMMLU`. For a full list of supported models and datasets, view the [utilization](https://github.com/RUCAIBox/LLMBox/tree/main/utilization) documentation.
 
 ```bash
-python inference.py \
+CUDA_VISIBLE_DEVICES=0 python inference.py \
     -m llama-2-7b-hf \
     -d mmlu agieval:[English] \
-    --model_type instruction \
+    --model_type chat \
     --num_shot 5 \
-    --cuda 0 \
     --ranking_type ppl_no_option
 ```
 
47 changes: 47 additions & 0 deletions docs/examples/customize_dataset.py
@@ -0,0 +1,47 @@
import os
import sys

sys.path.append(".")
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from utilization import DatasetArguments, ModelArguments, get_evaluator, register_dataset
from utilization.dataset import GenerationDataset


@register_dataset(name="my_data")
class MyData(GenerationDataset):

    instruction = "Reply to my message: {input}\nReply:"
    metrics = []

    def format_instance(self, instance: dict) -> dict:
        return instance

    @property
    def references(self):
        return [i["target"] for i in self.evaluation_data]


evaluator = get_evaluator(
    model_args=ModelArguments(model_name_or_path="gpt-4o"),
    dataset_args=DatasetArguments(
        dataset_names=["my_data"],
        num_shots=1,
        max_example_tokens=2560,
    ),
    evaluation_data=[
        {
            "input": "Hello",
            "target": "Hi"
        },
        {
            "input": "How are you?",
            "target": "I'm fine, thank you!"
        },
    ],
    example_data=[{
        "input": "What's the weather like today?",
        "target": "It's sunny today."
    }]
)
evaluator.evaluate()
10 changes: 6 additions & 4 deletions docs/examples/customize_huggingface_model.py
@@ -1,12 +1,14 @@
+import sys
+
 import torch
 from transformers import LlamaForCausalLM
 
-from utilization import Evaluator
-from utilization.model.huggingface_model import get_model_max_length, load_tokenizer
-from utilization.utils import DatasetArguments, ModelArguments
+sys.path.append(".")
+from utilization import DatasetArguments, ModelArguments, get_evaluator
 
 
 def load_hf_model(model_args: ModelArguments):
+    from utilization.model.huggingface_model import get_model_max_length, load_tokenizer
 
     # load your own model
     model = LlamaForCausalLM.from_pretrained(
@@ -24,7 +26,7 @@ def load_hf_model(model_args: ModelArguments):
     return model, tokenizer
 
 
-evaluator = Evaluator(
+evaluator = get_evaluator(
     model_args=ModelArguments(
         model_name_or_path="../your-model-path",
         model_type="chat",
docs/utilization/how-to-customize-dataset.md

@@ -2,6 +2,8 @@
 
 If you find some datasets are not supported in the current version, feel free to implement your own dataset and submit a PR.
 
+See a full list of supported datasets at [here](https://github.com/RUCAIBox/LLMBox/tree/main/docs/utilization/supported-datasets.md).
+
 ## Choose the Right Dataset
 
 We provide two types of datasets: [`GenerationDataset`](https://github.com/RUCAIBox/LLMBox/tree/main/utilization/dataset/generation_dataset.py) and [`MultipleChoiceDataset`](https://github.com/RUCAIBox/LLMBox/tree/main/utilization/dataset/multiple_choice_dataset.py).
@@ -35,7 +37,7 @@ These are the attributes you can define in a new dataset:
 
 - `example_set` (`Optional[str]`): The example split of dataset. Example data will be automatically loaded if this is not None.
 
-- `load_args` (`Union[Tuple[str], Tuple[str, str], Tuple[()]]`, **required\***): Arguments for loading the dataset with huggingface `load_dataset`. See [load from source data](https://github.com/RUCAIBox/LLMBox/tree/main/docs/utilization/customize-dataset.md#load-from-source-data) for details.
+- `load_args` (`Union[Tuple[str], Tuple[str, str], Tuple[()]]`, **required\***): Arguments for loading the dataset with huggingface `load_dataset`. See [load from source data](https://github.com/RUCAIBox/LLMBox/tree/main/docs/utilization/how-to-customize-dataset.md#load-from-source-data) for details.
 
 - `extra_model_args` (`Dict[str, Any]`): Extra arguments for the model like `temperature`, `stop` etc. See `set_generation_args`, `set_prob_args`, and `set_ppl_args` for details.
 
@@ -45,7 +47,7 @@ Then implement the following methods or properties:
 - `references` (**required**): Return the reference answers for evaluation.
 - `init_arguments`: Initialize the arguments for the dataset. This is called before the raw dataset is loaded.
 
-See [here](https://github.com/RUCAIBox/LLMBox/tree/main/docs/utilization/customize-dataset.md#advanced-topics) for advanced topics.
+See [here](https://github.com/RUCAIBox/LLMBox/tree/main/docs/utilization/how-to-customize-dataset.md#advanced-topics) for advanced topics.
 
 
 ## Load from Source Data
28 changes: 28 additions & 0 deletions docs/utilization/how-to-customize-model.md
@@ -0,0 +1,28 @@
# How to Customize Model

## Customizing HuggingFace Models

If you are building on your own model, such as a fine-tuned model, you can easily evaluate it from a Python script. Detailed steps and example code are provided in the [customize HuggingFace model guide](https://github.com/RUCAIBox/LLMBox/tree/main/docs/examples/customize_huggingface_model.py).

## Adding a New Model Provider

If you're integrating a new model provider, begin by extending the [`Model`](https://github.com/RUCAIBox/LLMBox/tree/main/utilization/model/model.py) class. Implement essential methods such as `generation`, `get_ppl` (get perplexity), and `get_prob` (get probability) to support different functionalities. For instance, here's how you might implement the `generation` method for a new model:

```python
from typing import Any, List


class NewModel(Model):

    model_backend = "new_provider"

    def call_model(self, batched_inputs: List[str]) -> List[Any]:
        return ...  # call to model, e.g., self.model.generate(...)

    def to_text(self, result: Any) -> str:
        return ...  # convert result to text, e.g., result['text']

    def generation(self, batched_inputs: List[str]) -> List[str]:
        results = self.call_model(batched_inputs)
        results = [self.to_text(result) for result in results]
        return results
```

Then, register your model in the [`load`](https://github.com/RUCAIBox/LLMBox/tree/main/utilization/model/load.py) file.
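
The snippet below is a hypothetical sketch of that registration; the actual structure of `utilization/model/load.py` may differ, so the registry name, import paths, and `load_model` signature here are assumptions. Follow the existing entries in that file.

```python
# Hypothetical sketch -- mirror the existing pattern in utilization/model/load.py.
from .model import Model          # assumed import path of the base class
from .new_model import NewModel   # the new provider class from the example above

# Assumed registry that maps a `model_backend` name to its implementing class.
BACKEND_TO_MODEL = {
    "new_provider": NewModel,
    # ... existing providers ...
}


def load_model(model_args) -> Model:
    # Assumed dispatch: instantiate the class registered for the requested backend.
    model_cls = BACKEND_TO_MODEL[model_args.model_backend]
    return model_cls(model_args)
```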