
Loading the Qwen2.5-3B model with a Qwen2-7B config reports "Only allowed now, your model Qwen2-7B" #42

Open
tianyouyangying opened this issue Oct 15, 2024 · 1 comment


@tianyouyangying

OpenAI API error:
API Error: Status Code 400, {"object":"error","message":"Only allowed now, your model Qwen2-7B","code":40301}
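Error code 40301 suggests the worker only accepts requests for the model name registered in its config. A minimal sketch of that kind of name check (hypothetical function, not DashInfer's actual implementation):

```python
def check_model_name(requested: str, registered: str):
    """Reject requests whose `model` field differs from the worker's registered name."""
    if requested != registered:
        # Mirrors the error shape seen in the 400 response above.
        return {"object": "error",
                "message": f"Only allowed now, your model {registered}",
                "code": 40301}
    return None

# The request asked for the 3B model, but the config registered "Qwen2-7B".
err = check_model_name("Qwen2.5-3B-Instruct", "Qwen2-7B")
print(err["code"])  # 40301
```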

Deployed with Docker. The docker run command (leading portion omitted):

    --network host \
    -v /data/data/Qwen2.5-3B-Instruct:/workspace/qwen/Qwen2.5-3B-Instruct  \
    -v /data/data/config_qwen_v20_7b.json:/workspace/config_qwen_v20_7b.json \
    dockerpull.com/dashinfer/fschat_ubuntu_x86:v1.2.1 \
    -m /workspace/qwen/Qwen2.5-3B-Instruct \
    /workspace/config_qwen_v20_7b.json

Config file /workspace/config_qwen_v20_7b.json:

    {
    "model_name": "Qwen2-7B",
    "model_type": "Qwen_v20",
    "model_path": "~/dashinfer_models/",
    "data_type": "float16",
    "device_type": "CPU",
    "device_ids": [
        0
    ],
    "multinode_mode": false,
    "engine_config": {
        "engine_max_length": 2048,
        "engine_max_batch": 8,
        "do_profiling": false,
        "num_threads": 0,
        "matmul_precision": "highest"
    },
    "generation_config": {
        "temperature": 0.7,
        "early_stopping": true,
        "top_k": 20,
        "top_p": 0.8,
        "repetition_penalty": 1.05,
        "presence_penalty": 0.0,
        "min_length": 0,
        "max_length": 2048,
        "no_repeat_ngram_size": 0,
        "eos_token_id": 151643,
        "seed": 1234,
        "stop_words_ids": [
            [
                151643
            ],
            [
                151644
            ],
            [
                151645
            ]
        ]
    },
    "convert_config": {
        "do_dynamic_quantize_convert": false
    },
    "quantization_config": {
        "activation_type": "float16",
        "weight_type": "uint8",
        "SubChannel": true,
        "GroupSize": 512
    }
    }

Startup log:

using config file: /workspace/config_qwen_v20_7b.json
2024-10-15 09:58:00 | INFO | controller | args: Namespace(dispatch_method='shortest_queue', host='localhost', port=21001, ssl=False)
2024-10-15 09:58:00 | ERROR | stderr | INFO:     Started server process [16]
2024-10-15 09:58:00 | ERROR | stderr | INFO:     Waiting for application startup.
2024-10-15 09:58:00 | ERROR | stderr | INFO:     Application startup complete.
2024-10-15 09:58:00 | ERROR | stderr | INFO:     Uvicorn running on http://localhost:21001 (Press CTRL+C to quit)
2024-10-15 09:58:00 | INFO | openai_api_server | args: Namespace(allow_credentials=False, allowed_headers=['*'], allowed_methods=['*'], allowed_origins=['*'], api_keys=None, controller_address='http://localhost:21001', host='localhost', port=8000, ssl=False)
2024-10-15 09:58:00 | ERROR | stderr | INFO:     Started server process [17]
2024-10-15 09:58:00 | ERROR | stderr | INFO:     Waiting for application startup.
2024-10-15 09:58:00 | ERROR | stderr | INFO:     Application startup complete.
2024-10-15 09:58:00 | ERROR | stderr | INFO:     Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
2024-10-15 09:58:04 | INFO | model_worker | Loading the model ['Qwen2.5-3B-Instruct'] on worker 01dbdd5b, worker type: dash-infer worker...
2024-10-15 09:58:04 | INFO | stdout | ### convert_config: {'do_dynamic_quantize_convert': False}
2024-10-15 09:58:04 | INFO | stdout | ### engine_config: {'engine_max_length': 2048, 'engine_max_batch': 8, 'do_profiling': False, 'num_threads': 0, 'matmul_precision': 'highest'}
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20241015 09:58:04.585773    18 thread_pool.h:46] ThreadPool created with: 1
I20241015 09:58:04.586000    18 as_engine.cpp:233] AllSpark Init with Version: 1.2.1/(GitSha1:5ceddf95)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
E20241015 09:58:04.916028    18 as_engine.cpp:931] workers is empty
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:01<00:01,  1.01s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.31it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.25it/s]
2024-10-15 09:58:06 | ERROR | stderr | 
2024-10-15 09:58:08 | INFO | stdout | trans model from huggingface model: /workspace/qwen/Qwen2.5-3B-Instruct
2024-10-15 09:58:08 | INFO | stdout | Dashinfer model will save to  /root/dashinfer_models/
2024-10-15 09:58:08 | INFO | stdout | ### model_config: {'vocab_size': 151936, 'max_position_embeddings': 32768, 'hidden_size': 2048, 'intermediate_size': 11008, 'num_hidden_layers': 36, 'num_attention_heads': 16, 'use_sliding_window': False, 'sliding_window': 32768, 'max_window_layers': 70, 'num_key_value_heads': 2, 'hidden_act': 'silu', 'initializer_range': 0.02, 'rms_norm_eps': 1e-06, 'use_cache': True, 'rope_theta': 1000000.0, 'attention_dropout': 0.0, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': torch.bfloat16, 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['Qwen2ForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 151643, 'pad_token_id': None, 'eos_token_id': 151645, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': '/workspace/qwen/Qwen2.5-3B-Instruct', '_commit_hash': None, '_attn_implementation_internal': 'sdpa', 'transformers_version': 
'4.43.1', 'model_type': 'qwen2', 'use_dynamic_ntk': False, 'use_logn_attn': False, 'rotary_emb_base': 1000000.0, 'size_per_head': 128}
2024-10-15 09:58:08 | INFO | stdout | save dimodel to  /root/dashinfer_models/Qwen2-7B_cpu_single_float16.dimodel
2024-10-15 09:58:08 | INFO | stdout | save ditensors to  /root/dashinfer_models/Qwen2-7B_cpu_single_float16.ditensors
2024-10-15 09:58:16 | INFO | stdout | parse weight time:  8.026057004928589
2024-10-15 09:58:16 | INFO | stdout | current allspark version major[ 1 ] minor[ 2 ] patch[ 1 ] commit =  5ceddf95
2024-10-15 09:58:16 | INFO | stdout | calculate md5 of dimodel =  b51d97a3e0a163de5f6123f7ad0fd77e
2024-10-15 09:58:16 | INFO | stdout | torch build meta: 	 model_name 	:  Qwen2-7B_cpu_single_float16
2024-10-15 09:58:16 | INFO | stdout | torch build meta: 	 model_type 	:  Qwen_v20
2024-10-15 09:58:16 | INFO | stdout | torch build meta: 	 save_dir 	:  /root/dashinfer_models/
2024-10-15 09:58:16 | INFO | stdout | torch build meta: 	 multinode_mode 	:  False
2024-10-15 09:58:16 | INFO | stdout | torch build meta: 	 data_type 	:  float16
2024-10-15 09:58:16 | INFO | stdout | torch build meta: 	 do_dynamic_quantize_convert 	:  False
2024-10-15 09:58:16 | INFO | stdout | torch build meta: 	 use_dynamic_ntk 	:  False
2024-10-15 09:58:16 | INFO | stdout | torch build meta: 	 use_logn_attn 	:  False
2024-10-15 09:58:16 | INFO | stdout | torch build meta: 	 model_sequence_length 	:  2048
2024-10-15 09:58:16 | INFO | stdout | torch build meta: 	 seqlen_extrapolation 	:  1.0
2024-10-15 09:58:16 | INFO | stdout | torch build meta: 	 rotary_base 	:  1000000.0
2024-10-15 09:58:16 | INFO | stdout | serialize_model_from_torch: save model = true, time :  8.104979991912842
2024-10-15 09:58:16 | INFO | stdout | convert model from HF finished, build time is 8.105656862258911 seconds
I20241015 09:58:16.746803    18 as_engine.cpp:366] Detect avx512f supported, switch Prefill mode to flash
I20241015 09:58:16.746842    18 as_engine.cpp:384] Build model use following config:
AsModelConfig :
	model_name: Qwen2-7B_cpu_single_float16
	model_path: /root/dashinfer_models/Qwen2-7B_cpu_single_float16.dimodel
	weights_path: /root/dashinfer_models/Qwen2-7B_cpu_single_float16.ditensors
	compute_unit: CPU:0
	num_threads: 12
	matmul_precision: highest
	prefill_mode: AsPrefillFlashV2
	cache_mode: AsCacheDefault
	engine_max_length = 2048
	engine_max_batch = 8

I20241015 09:58:16.746910    18 as_engine.cpp:388] Load model from : /root/dashinfer_models/Qwen2-7B_cpu_single_float16.dimodel
I20241015 09:58:16.747004    18 as_engine.cpp:300] SetDeviceIds: DeviceIDs.size() 1
I20241015 09:58:16.747017    18 as_engine.cpp:307] Start create 1 Device: CPU workers.
I20241015 09:58:16.747486   215 cpu_context.cpp:114] CPUContext::InitMCCL() rank: 0 nRanks: 1
I20241015 09:58:16.827616    18 as_param_check.hpp:342] AsParamGuard check level = CHECKER_NORMAL. Engine version = 1.2 . Weight version = 1.2 . 
I20241015 09:58:16.829321    18 as_engine.cpp:511] Start BuildModel
I20241015 09:58:16.829511   216 as_engine.cpp:521] Start Build model for rank: 0
I20241015 09:58:16.829562   216 weight_manager.cpp:131] Start Loading weight for model RankInfo[0/1]
I20241015 09:58:16.829576   216 weight_manager.cpp:52] Start open model file /root/dashinfer_models/Qwen2-7B_cpu_single_float16.ditensors
I20241015 09:58:16.829613   216 weight_manager.cpp:59] Open model file success. 
I20241015 09:58:16.832871   216 weight_manager.cpp:107] Weight file header parse success...291 weight tensors are going to load. 
I20241015 09:58:21.656690   216 weight_manager.cpp:257] finish weight load for model RankInfo[0/1] time  spend: 4.827 seconds.
I20241015 09:58:21.659478   216 as_engine.cpp:525] Finish Build model for rank: 0
2024-10-15 09:58:21 | INFO | stdout | build model over, build time is 5.124737977981567
I20241015 09:58:21.661041    18 as_engine.cpp:672] StartModel: warming up...
I20241015 09:58:21.661065   217 as_engine.cpp:1612] | AllsparkStat | Req: Running: 0 Pending: 0 	 Prompt: 0 T/s  Gen: 0 T/s 
2024-10-15 10:02:14 | INFO | stdout | INFO:     127.0.0.1:36730 - "POST /list_models HTTP/1.1" 200 OK
2024-10-15 10:02:14 | INFO | stdout | INFO:     127.0.0.1:37592 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
2024-10-15 10:02:15 | INFO | stdout | INFO:     127.0.0.1:36736 - "POST /list_models HTTP/1.1" 200 OK
2024-10-15 10:02:15 | INFO | stdout | INFO:     127.0.0.1:37592 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
2024-10-15 10:02:16 | INFO | stdout | INFO:     127.0.0.1:36744 - "POST /list_models HTTP/1.1" 200 OK
2024-10-15 10:02:16 | INFO | stdout | INFO:     127.0.0.1:37592 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
2024-10-15 10:02:16 | INFO | stdout | INFO:     127.0.0.1:36746 - "POST /list_models HTTP/1.1" 200 OK
2024-10-15 10:02:16 | INFO | stdout | INFO:     127.0.0.1:37592 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
2024-10-15 10:02:24 | INFO | stdout | INFO:     127.0.0.1:60596 - "POST /list_models HTTP/1.1" 200 OK
2024-10-15 10:02:24 | INFO | stdout | INFO:     127.0.0.1:46342 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
2024-10-15 10:02:25 | INFO | stdout | INFO:     127.0.0.1:60608 - "POST /list_models HTTP/1.1" 200 OK
2024-10-15 10:02:25 | INFO | stdout | INFO:     127.0.0.1:46342 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
2024-10-15 10:02:26 | INFO | stdout | INFO:     127.0.0.1:60624 - "POST /list_models HTTP/1.1" 200 OK
2024-10-15 10:02:26 | INFO | stdout | INFO:     127.0.0.1:46342 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
2024-10-15 10:02:26 | INFO | stdout | INFO:     127.0.0.1:60632 - "POST /list_models HTTP/1.1" 200 OK
2024-10-15 10:02:26 | INFO | stdout | INFO:     127.0.0.1:46342 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request

@laiwenzh
Collaborator

Your model name is Qwen2.5-3B-Instruct, so in `"model_name": "Qwen2-7B"` the model_name should be changed to Qwen2.5-3B-Instruct.
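For example, the relevant fragment of config_qwen_v20_7b.json after the fix (a sketch; the rest of the file stays unchanged):

```json
{
    "model_name": "Qwen2.5-3B-Instruct",
    "model_type": "Qwen_v20"
}
```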
