- `llamafactory` version: 0.9.2.dev0
- Platform: Linux-5.4.0-124-generic-x86_64-with-glibc2.31
- Python version: 3.10.15
- PyTorch version: 2.5.1+cu124 (GPU)
- Transformers version: 4.46.1
- Datasets version: 3.1.0
- Accelerate version: 1.0.1
- PEFT version: 0.12.0
- TRL version: 0.9.6
- GPU type: NVIDIA GeForce RTX 3090
- DeepSpeed version: 0.15.4
- vLLM version: 0.6.4.post1
Config file:
dataset: evol_instruct_zh_gpt4,identity,belle_1k
max_samples: 9000,1000,1000
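The parameter documentation says that `max_samples` sets the maximum number of samples for each dataset, as a comma-separated list:

max_samples
Specifies the maximum number of samples for each dataset, separated by commas.

However, the config above fails with this error: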
[rank2]:     max_samples = min(data_args.max_samples, len(dataset))
[rank2]: TypeError: '<' not supported between instances of 'int' and 'str'
I traced it to this spot in the code:
def _load_single_dataset(...):
    ...
    if data_args.max_samples is not None:  # truncate dataset
        max_samples = min(data_args.max_samples, len(dataset))
        dataset = dataset.select(range(max_samples))
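The traceback suggests that `data_args.max_samples` reaches `min()` as the raw string `"9000,1000,1000"` rather than as an integer. The sketch below reproduces that comparison in isolation and shows one way a comma-separated value could be split per dataset. `reproduce_error` and `parse_per_dataset_limits` are hypothetical helpers written for illustration, not LLaMA-Factory code, and the dataset length of 5000 is an arbitrary stand-in.

```python
from typing import Dict, List


def reproduce_error() -> str:
    """Call min() on a str and an int, as the traceback suggests happens."""
    try:
        # 5000 stands in for len(dataset); the str is the raw config value.
        min("9000,1000,1000", 5000)
    except TypeError as exc:
        return str(exc)
    return ""


def parse_per_dataset_limits(raw: str, dataset_names: List[str]) -> Dict[str, int]:
    """Hypothetical helper: map each dataset name to its own integer limit."""
    limits = [int(part.strip()) for part in raw.split(",")]
    if len(limits) != len(dataset_names):
        raise ValueError("expected one limit per dataset")
    return dict(zip(dataset_names, limits))


if __name__ == "__main__":
    print(reproduce_error())
    names = ["evol_instruct_zh_gpt4", "identity", "belle_1k"]
    print(parse_per_dataset_limits("9000,1000,1000", names))
```

With a single integer such as `max_samples: 1000`, the `min()` call compares two integers and the truncation proceeds as written.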
Use `num_samples` instead of `max_samples`: https://github.com/hiyouga/LLaMA-Factory/blob/main/data/README_zh.md
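As a rough sketch of that suggestion, assuming `num_samples` is an optional per-dataset key in the entries of `data/dataset_info.json` as the linked README describes, the per-dataset limits would move out of the training config and into each dataset's entry. The empty dictionaries below are placeholders for the existing entries, and `with_num_samples` is a hypothetical helper, not part of LLaMA-Factory.

```python
import json
from typing import Any, Dict


def with_num_samples(
    dataset_info: Dict[str, Dict[str, Any]], limits: Dict[str, int]
) -> Dict[str, Dict[str, Any]]:
    """Return a copy of dataset_info with a num_samples key set per dataset."""
    updated = {name: dict(entry) for name, entry in dataset_info.items()}
    for name, limit in limits.items():
        if name not in updated:
            raise KeyError(f"unknown dataset: {name}")
        updated[name]["num_samples"] = limit
    return updated


if __name__ == "__main__":
    # Placeholder entries; the real ones carry keys such as the data source.
    info = {"evol_instruct_zh_gpt4": {}, "identity": {}, "belle_1k": {}}
    limits = {"evol_instruct_zh_gpt4": 9000, "identity": 1000, "belle_1k": 1000}
    print(json.dumps(with_num_samples(info, limits), indent=2))
```

The training config would then list only `dataset: evol_instruct_zh_gpt4,identity,belle_1k`, with `max_samples` either omitted or set to a single integer.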