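Model FAQ
Applies to: chat models (LLM), embedding models (Embedding), tool calling (Tools/Function Call), and issues related to indexing and vector building.
Troubleshooting principle: first send a minimal request directly to the model service to confirm the model itself works, then check the gateway/proxy, and finally check business-side parameters and timeouts.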
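The order above can be written down as a small decision helper. This is only a sketch: the function name `next_layer_to_check` and its inputs are hypothetical and simply stand for the outcome of checks you run yourself.

```python
from typing import Optional


def next_layer_to_check(direct_ok: bool, gateway_ok: Optional[bool] = None) -> str:
    """Return which layer to inspect next, following the order above.

    direct_ok:  did a minimal request sent directly to the model service succeed?
    gateway_ok: did the same request succeed through the gateway/proxy?
                None means this has not been tested yet.
    """
    if not direct_ok:
        # The model service itself fails, so nothing downstream is worth checking yet.
        return "model service"
    if gateway_ok is None:
        return "test via gateway/proxy"
    if not gateway_ok:
        return "gateway/proxy"
    # Both lower layers work, so the problem is most likely in the caller.
    return "business-side parameters and timeouts"
```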
1. Minimal request verification (required)
1.1 Chat model: Chat Completions
Note: set BASE_URL to the Base URL configured in platform management.
Request example:
curl <BASE_URL>/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "model": "<MODEL_NAME>",
    "messages": [
      { "role": "user", "content": "Hello" }
    ],
    "stream": false
  }'
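If curl is not available on the machine you are testing from, the same minimal request can be sent from Python using only the standard library. This is a sketch, not a recommended client: the helper names `build_chat_request` and `send` are made up for illustration, and the 30-second timeout is an arbitrary assumption.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str) -> urllib.request.Request:
    """Build (but do not send) the same minimal chat request as the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and parse the JSON body.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses and
    urllib.error.URLError for connection problems.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `send(build_chat_request(base_url, api_key, model))` with your real Base URL, API key, and model name should return a dict whose key fields match a successful response.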
Example of a successful response (key fields):
{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1730000000,
  "model": "<MODEL_NAME>",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hi! How can I help you?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 8, "completion_tokens": 9, "total_tokens": 17 }
}
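To check a response against the key fields above without reading the JSON by eye, a small validator can help. The sketch below is an assumption-laden illustration: `check_chat_response` is a hypothetical helper, and it treats empty `content` as a problem, which is reasonable for the plain "Hello" request here but may not hold for tool-calling responses.

```python
def check_chat_response(resp: dict) -> list:
    """Return a list of problems found; an empty list means the key fields look fine."""
    problems = []
    if resp.get("object") != "chat.completion":
        problems.append(f"unexpected object: {resp.get('object')!r}")

    choices = resp.get("choices") or []
    if not choices:
        problems.append("choices is missing or empty")
        return problems  # nothing further to inspect

    first = choices[0]
    message = first.get("message") or {}
    if message.get("role") != "assistant":
        problems.append(f"unexpected role: {message.get('role')!r}")
    if not message.get("content"):
        # Assumption: a plain "Hello" request should yield non-empty text.
        problems.append("message.content is empty")
    if first.get("finish_reason") is None:
        problems.append("finish_reason is missing")
    return problems
```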
1.2 Embedding model: Embeddings
Request example:
curl <BASE_URL>/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "model": "<EMB_MODEL_NAME>",
    "input": "test"
  }'
Example of a successful response (key fields):
{
  "object": "list",
  "model": "<EMB_MODEL_NAME>",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0123, -0.0456, 0.0789]
    }
  ],
  "usage": { "prompt_tokens": 2, "total_tokens": 2 }
}
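Note:
In a real response, embedding is a long array of floats whose length (the dimension) depends on the model; the three values shown above are for illustration only.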
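Because the dimension depends on the model, it is worth reading it from the response rather than assuming it. The following sketch uses a hypothetical helper, `check_embedding_response`, and assumes the service returns `embedding` as a JSON array of numbers as in the example above; `expected_dim` is an optional value you supply yourself.

```python
import math
from typing import Optional


def check_embedding_response(resp: dict, expected_dim: Optional[int] = None) -> int:
    """Validate the key fields and return the embedding dimension.

    Raises ValueError with a short reason if something looks wrong.
    """
    data = resp.get("data") or []
    if not data:
        raise ValueError("data is missing or empty")

    vector = data[0].get("embedding")
    if not isinstance(vector, list) or not vector:
        raise ValueError("embedding is missing or not a non-empty list")
    if not all(isinstance(x, (int, float)) and math.isfinite(x) for x in vector):
        raise ValueError("embedding contains non-numeric or non-finite values")

    if expected_dim is not None and len(vector) != expected_dim:
        raise ValueError(f"dimension {len(vector)} != expected {expected_dim}")
    return len(vector)
```

The returned dimension can then be compared with the dimension your vector index was built with, if you know it.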