OpenAI API error: API Error: Status Code 400, {"object":"error","message":"Only allowed now, your model Qwen2-7B","code":40301}
API Error: Status Code 400, {"object":"error","message":"Only allowed now, your model Qwen2-7B","code":40301}
Deployed with Docker.
```shell
  --network host \
  -v /data/data/Qwen2.5-3B-Instruct:/workspace/qwen/Qwen2.5-3B-Instruct \
  -v /data/data/config_qwen_v20_7b.json:/workspace/config_qwen_v20_7b.json \
  dockerpull.com/dashinfer/fschat_ubuntu_x86:v1.2.1 \
  -m /workspace/qwen/Qwen2.5-3B-Instruct \
  /workspace/config_qwen_v20_7b.json
```
```json
{
  "model_name": "Qwen2-7B",
  "model_type": "Qwen_v20",
  "model_path": "~/dashinfer_models/",
  "data_type": "float16",
  "device_type": "CPU",
  "device_ids": [0],
  "multinode_mode": false,
  "engine_config": {
    "engine_max_length": 2048,
    "engine_max_batch": 8,
    "do_profiling": false,
    "num_threads": 0,
    "matmul_precision": "highest"
  },
  "generation_config": {
    "temperature": 0.7,
    "early_stopping": true,
    "top_k": 20,
    "top_p": 0.8,
    "repetition_penalty": 1.05,
    "presence_penalty": 0.0,
    "min_length": 0,
    "max_length": 2048,
    "no_repeat_ngram_size": 0,
    "eos_token_id": 151643,
    "seed": 1234,
    "stop_words_ids": [[151643], [151644], [151645]]
  },
  "convert_config": {
    "do_dynamic_quantize_convert": false
  },
  "quantization_config": {
    "activation_type": "float16",
    "weight_type": "uint8",
    "SubChannel": true,
    "GroupSize": 512
  }
}
```
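As a side note, a mismatch between the `model_name` in this config and the mounted model directory can be caught before the container even starts. A minimal sketch (the helper `model_name_matches` is illustrative, not part of DashInfer; it only compares the config field against the directory name the worker serves, which is what clients must request):

```python
import json
from pathlib import Path

def model_name_matches(config_path: str, model_dir: str) -> bool:
    """Return True if "model_name" in the DashInfer config matches the
    basename of the model directory passed to the worker (the name the
    server registers, and therefore the only name clients may request)."""
    with open(config_path) as f:
        config = json.load(f)
    return config["model_name"] == Path(model_dir).name
```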
Startup log:
```
using config file: /workspace/config_qwen_v20_7b.json
2024-10-15 09:58:00 | INFO | controller | args: Namespace(dispatch_method='shortest_queue', host='localhost', port=21001, ssl=False)
2024-10-15 09:58:00 | ERROR | stderr | INFO: Started server process [16]
2024-10-15 09:58:00 | ERROR | stderr | INFO: Waiting for application startup.
2024-10-15 09:58:00 | ERROR | stderr | INFO: Application startup complete.
2024-10-15 09:58:00 | ERROR | stderr | INFO: Uvicorn running on http://localhost:21001 (Press CTRL+C to quit)
2024-10-15 09:58:00 | INFO | openai_api_server | args: Namespace(allow_credentials=False, allowed_headers=['*'], allowed_methods=['*'], allowed_origins=['*'], api_keys=None, controller_address='http://localhost:21001', host='localhost', port=8000, ssl=False)
2024-10-15 09:58:00 | ERROR | stderr | INFO: Started server process [17]
2024-10-15 09:58:00 | ERROR | stderr | INFO: Waiting for application startup.
2024-10-15 09:58:00 | ERROR | stderr | INFO: Application startup complete.
2024-10-15 09:58:00 | ERROR | stderr | INFO: Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
2024-10-15 09:58:04 | INFO | model_worker | Loading the model ['Qwen2.5-3B-Instruct'] on worker 01dbdd5b, worker type: dash-infer worker...
2024-10-15 09:58:04 | INFO | stdout | ### convert_config: {'do_dynamic_quantize_convert': False}
2024-10-15 09:58:04 | INFO | stdout | ### engine_config: {'engine_max_length': 2048, 'engine_max_batch': 8, 'do_profiling': False, 'num_threads': 0, 'matmul_precision': 'highest'}
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20241015 09:58:04.585773 18 thread_pool.h:46] ThreadPool created with: 1
I20241015 09:58:04.586000 18 as_engine.cpp:233] AllSpark Init with Version: 1.2.1/(GitSha1:5ceddf95)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
E20241015 09:58:04.916028 18 as_engine.cpp:931] workers is empty
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:01<00:01, 1.01s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.31it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.25it/s]
2024-10-15 09:58:06 | ERROR | stderr |
2024-10-15 09:58:08 | INFO | stdout | trans model from huggingface model: /workspace/qwen/Qwen2.5-3B-Instruct
2024-10-15 09:58:08 | INFO | stdout | Dashinfer model will save to /root/dashinfer_models/
2024-10-15 09:58:08 | INFO | stdout | ### model_config: {'vocab_size': 151936, 'max_position_embeddings': 32768, 'hidden_size': 2048, 'intermediate_size': 11008, 'num_hidden_layers': 36, 'num_attention_heads': 16, 'use_sliding_window': False, 'sliding_window': 32768, 'max_window_layers': 70, 'num_key_value_heads': 2, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-06, 'use_cache': True, 'rope_theta': 1000000.0, 'attention_dropout': 0.0, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': torch.bfloat16, 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['Qwen2ForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 151643, 'pad_token_id': None, 'eos_token_id': 151645, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': '/workspace/qwen/Qwen2.5-3B-Instruct', '_commit_hash': None, '_attn_implementation_internal': 'sdpa', 'transformers_version': '4.43.1', 'model_type': 'qwen2', 'use_dynamic_ntk': False, 'use_logn_attn': False, 'rotary_emb_base': 1000000.0, 'size_per_head': 128}
2024-10-15 09:58:08 | INFO | stdout | save dimodel to /root/dashinfer_models/Qwen2-7B_cpu_single_float16.dimodel
2024-10-15 09:58:08 | INFO | stdout | save ditensors to /root/dashinfer_models/Qwen2-7B_cpu_single_float16.ditensors
2024-10-15 09:58:16 | INFO | stdout | parse weight time: 8.026057004928589
2024-10-15 09:58:16 | INFO | stdout | current allspark version major[ 1 ] minor[ 2 ] patch[ 1 ] commit = 5ceddf95
2024-10-15 09:58:16 | INFO | stdout | calculate md5 of dimodel = b51d97a3e0a163de5f6123f7ad0fd77e
2024-10-15 09:58:16 | INFO | stdout | torch build meta: model_name : Qwen2-7B_cpu_single_float16
2024-10-15 09:58:16 | INFO | stdout | torch build meta: model_type : Qwen_v20
2024-10-15 09:58:16 | INFO | stdout | torch build meta: save_dir : /root/dashinfer_models/
2024-10-15 09:58:16 | INFO | stdout | torch build meta: multinode_mode : False
2024-10-15 09:58:16 | INFO | stdout | torch build meta: data_type : float16
2024-10-15 09:58:16 | INFO | stdout | torch build meta: do_dynamic_quantize_convert : False
2024-10-15 09:58:16 | INFO | stdout | torch build meta: use_dynamic_ntk : False
2024-10-15 09:58:16 | INFO | stdout | torch build meta: use_logn_attn : False
2024-10-15 09:58:16 | INFO | stdout | torch build meta: model_sequence_length : 2048
2024-10-15 09:58:16 | INFO | stdout | torch build meta: seqlen_extrapolation : 1.0
2024-10-15 09:58:16 | INFO | stdout | torch build meta: rotary_base : 1000000.0
2024-10-15 09:58:16 | INFO | stdout | serialize_model_from_torch: save model = true, time : 8.104979991912842
2024-10-15 09:58:16 | INFO | stdout | convert model from HF finished, build time is 8.105656862258911 seconds
I20241015 09:58:16.746803 18 as_engine.cpp:366] Detect avx512f supported, switch Prefill mode to flash
I20241015 09:58:16.746842 18 as_engine.cpp:384] Build model use following config:
AsModelConfig :
  model_name: Qwen2-7B_cpu_single_float16
  model_path: /root/dashinfer_models/Qwen2-7B_cpu_single_float16.dimodel
  weights_path: /root/dashinfer_models/Qwen2-7B_cpu_single_float16.ditensors
  compute_unit: CPU:0
  num_threads: 12
  matmul_precision: highest
  prefill_mode: AsPrefillFlashV2
  cache_mode: AsCacheDefault
  engine_max_length = 2048
  engine_max_batch = 8
I20241015 09:58:16.746910 18 as_engine.cpp:388] Load model from : /root/dashinfer_models/Qwen2-7B_cpu_single_float16.dimodel
I20241015 09:58:16.747004 18 as_engine.cpp:300] SetDeviceIds: DeviceIDs.size() 1
I20241015 09:58:16.747017 18 as_engine.cpp:307] Start create 1 Device: CPU workers.
I20241015 09:58:16.747486 215 cpu_context.cpp:114] CPUContext::InitMCCL() rank: 0 nRanks: 1
I20241015 09:58:16.827616 18 as_param_check.hpp:342] AsParamGuard check level = CHECKER_NORMAL. Engine version = 1.2 . Weight version = 1.2 .
I20241015 09:58:16.829321 18 as_engine.cpp:511] Start BuildModel
I20241015 09:58:16.829511 216 as_engine.cpp:521] Start Build model for rank: 0
I20241015 09:58:16.829562 216 weight_manager.cpp:131] Start Loading weight for model RankInfo[0/1]
I20241015 09:58:16.829576 216 weight_manager.cpp:52] Start open model file /root/dashinfer_models/Qwen2-7B_cpu_single_float16.ditensors
I20241015 09:58:16.829613 216 weight_manager.cpp:59] Open model file success.
I20241015 09:58:16.832871 216 weight_manager.cpp:107] Weight file header parse success...291 weight tensors are going to load.
I20241015 09:58:21.656690 216 weight_manager.cpp:257] finish weight load for model RankInfo[0/1] time spend: 4.827 seconds.
I20241015 09:58:21.659478 216 as_engine.cpp:525] Finish Build model for rank: 0
2024-10-15 09:58:21 | INFO | stdout | build model over, build time is 5.124737977981567
I20241015 09:58:21.661041 18 as_engine.cpp:672] StartModel: warming up...
I20241015 09:58:21.661065 217 as_engine.cpp:1612] | AllsparkStat | Req: Running: 0 Pending: 0 Prompt: 0 T/s Gen: 0 T/s
2024-10-15 10:02:14 | INFO | stdout | INFO: 127.0.0.1:36730 - "POST /list_models HTTP/1.1" 200 OK
2024-10-15 10:02:14 | INFO | stdout | INFO: 127.0.0.1:37592 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
2024-10-15 10:02:15 | INFO | stdout | INFO: 127.0.0.1:36736 - "POST /list_models HTTP/1.1" 200 OK
2024-10-15 10:02:15 | INFO | stdout | INFO: 127.0.0.1:37592 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
2024-10-15 10:02:16 | INFO | stdout | INFO: 127.0.0.1:36744 - "POST /list_models HTTP/1.1" 200 OK
2024-10-15 10:02:16 | INFO | stdout | INFO: 127.0.0.1:37592 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
2024-10-15 10:02:16 | INFO | stdout | INFO: 127.0.0.1:36746 - "POST /list_models HTTP/1.1" 200 OK
2024-10-15 10:02:16 | INFO | stdout | INFO: 127.0.0.1:37592 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
2024-10-15 10:02:24 | INFO | stdout | INFO: 127.0.0.1:60596 - "POST /list_models HTTP/1.1" 200 OK
2024-10-15 10:02:24 | INFO | stdout | INFO: 127.0.0.1:46342 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
2024-10-15 10:02:25 | INFO | stdout | INFO: 127.0.0.1:60608 - "POST /list_models HTTP/1.1" 200 OK
2024-10-15 10:02:25 | INFO | stdout | INFO: 127.0.0.1:46342 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
2024-10-15 10:02:26 | INFO | stdout | INFO: 127.0.0.1:60624 - "POST /list_models HTTP/1.1" 200 OK
2024-10-15 10:02:26 | INFO | stdout | INFO: 127.0.0.1:46342 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
2024-10-15 10:02:26 | INFO | stdout | INFO: 127.0.0.1:60632 - "POST /list_models HTTP/1.1" 200 OK
2024-10-15 10:02:26 | INFO | stdout | INFO: 127.0.0.1:46342 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
```
Your model is named Qwen2.5-3B-Instruct, so change `"model_name": "Qwen2-7B"` in the config to `"Qwen2.5-3B-Instruct"`.
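In other words, the fix is to edit `config_qwen_v20_7b.json` so the name matches the mounted model directory; the one field that changes (the rest of the config stays as posted):

```json
"model_name": "Qwen2.5-3B-Instruct",
```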