access vllm

2024-04-11 22:00:07 +08:00
parent 02b6f26b05
commit 2406022c2a
3 changed files with 76 additions and 2 deletions
--- a/docs/use_vllm.md
+++ b/docs/use_vllm.md
@@ -0,0 +1,46 @@
+# Using vLLM
+
+
+## 1. Start vLLM with a model of your choice
+
+```
+python -m vllm.entrypoints.openai.api_server --model /home/hmp/llm/cache/Qwen1___5-32B-Chat --tensor-parallel-size 2 --dtype=half
+```
+
+This example serves a local model stored at `/home/hmp/llm/cache/Qwen1___5-32B-Chat`; replace the path with your own model as needed.
+
+## 2. Test vLLM
+
+```
+curl http://localhost:8000/v1/chat/completions \
+-H "Content-Type: application/json" \
+-d '{
+  "model": "/home/hmp/llm/cache/Qwen1___5-32B-Chat",
+  "messages": [
+  {"role": "system", "content": "You are a helpful assistant."},
+  {"role": "user", "content": "How do I implement a decentralized controller?"}
+  ]
+}'
+```
+
+## 3. Configure this project
+
+```
+API_KEY = "sk-123456789xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx123456789"
+LLM_MODEL = "vllm-/home/hmp/llm/cache/Qwen1___5-32B-Chat(max_token=4096)"
+API_URL_REDIRECT = {"https://api.openai.com/v1/chat/completions": "http://localhost:8000/v1/chat/completions"}
+```
+
+```
+"vllm-/home/hmp/llm/cache/Qwen1___5-32B-Chat(max_token=4096)"
+where
+  "vllm-"                                     is the prefix (required)
+  "/home/hmp/llm/cache/Qwen1___5-32B-Chat"    is the model name (required)
+  "(max_token=4096)"                          is the configuration (optional)
+```
+
+## 4. Launch!
+
+```
+python main.py
+```
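
The `LLM_MODEL` naming convention from step 3 (a required `vllm-` prefix, a required model name, and an optional `(max_token=...)` suffix) can be illustrated with a small parser. This is only a sketch under stated assumptions: `parse_vllm_model`, `VllmModelSpec`, and `build_chat_payload` are hypothetical names introduced here, not functions from this project or from vLLM, and the project's own parsing may differ.

```python
import re
from typing import NamedTuple, Optional


class VllmModelSpec(NamedTuple):
    """Hypothetical container for the pieces of an LLM_MODEL string."""
    model: str                 # model name or local path passed to vLLM
    max_token: Optional[int]   # value of the optional "(max_token=...)" suffix


# "vllm-" prefix, then the model name, then an optional "(max_token=N)" suffix.
_LLM_MODEL_PATTERN = re.compile(
    r"^vllm-(?P<model>.+?)(?:\(max_token=(?P<max_token>\d+)\))?$"
)


def parse_vllm_model(llm_model: str) -> VllmModelSpec:
    """Split an LLM_MODEL string into model name and optional max_token."""
    match = _LLM_MODEL_PATTERN.match(llm_model)
    if match is None:
        raise ValueError(f"not a valid 'vllm-' model string: {llm_model!r}")
    max_token = match.group("max_token")
    return VllmModelSpec(
        model=match.group("model"),
        max_token=int(max_token) if max_token is not None else None,
    )


def build_chat_payload(spec: VllmModelSpec, user_text: str) -> dict:
    """Build a request body shaped like the curl test in step 2."""
    return {
        "model": spec.model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_text},
        ],
    }


spec = parse_vllm_model(
    "vllm-/home/hmp/llm/cache/Qwen1___5-32B-Chat(max_token=4096)"
)
payload = build_chat_payload(spec, "Hello")
```

Note that the `model` field sent to the server is the name without the `vllm-` prefix or the configuration suffix, matching the value used in the curl test.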