# Part 2: vLLM-Omni + Wan2.1-T2V-1.3B Deployment Guide

张开发
2026/4/10 4:04:31 · 15 min read
**Goal:** migrate the existing SD1.5 + ControlNet-Canny image generation service to the vLLM-Omni + Wan2.1-T2V-1.3B model, deployed and served on Huawei Ascend 910B4 NPUs.
**Date:** 2026-04-02
**Environment:** Ascend 910B4 (single / multi-card), CANN 8.5.1

## 1. Background and goals

| Item | Previous solution (MindIE-SD) | New solution (vLLM-Omni + Wan2.1) |
| --- | --- | --- |
| Model architecture | SD1.5 UNet + ControlNet | Wan2.1 DiT (1.3B, T2V) |
| Generation type | Image-to-image (ControlNet-guided) | Text-to-video (T2V) |
| Concurrency mechanism | Ray worker pool (process-level) | Continuous batching + paged attention |
| Control method | ControlNet-Canny | None yet (adaptation pending) |
| Serving framework | Hand-rolled FastAPI wrapper | OpenAI-compatible API |

Core requirement: use vLLM-Omni's continuous batching and paged attention to improve multi-request concurrency.

## 2. Hardware and container environment

### 2.1 Base image

vLLM-Omni does not ship a prebuilt, complete image; it has to be built from source on top of `quay.io/ascend/vllm-ascend:v0.17.0rc1`.

```bash
# Pull the base image
docker pull quay.io/ascend/vllm-ascend:v0.17.0rc1
```

### 2.2 Start the container

```bash
docker run -itd \
  --name vllm-omni-wan21 \
  --privileged \
  --cap-add SYS_PTRACE \
  --device /dev/davinci0 \
  --device /dev/davinci1 \
  --device /dev/davinci2 \
  --device /dev/davinci3 \
  --device /dev/davinci_manager \
  --device /dev/devmm_svm \
  --device /dev/hisi_hdc \
  -v /usr/local/dcmi:/usr/local/dcmi \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
  -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
  -v /etc/ascend_install.info:/etc/ascend_install.info \
  -v /data:/data \
  --shm-size 1g \
  --memory 500g \
  --cpus 64 \
  --net host \
  quay.io/ascend/vllm-ascend:v0.17.0rc1 \
  bash
```

### 2.3 Enter the container

```bash
docker exec -it vllm-omni-wan21 bash
```

## 3. Dependency installation (from source)

Note: the target machine cannot reach GitHub/Quay, so the source packages have to be downloaded on an internet-connected machine and transferred in.

### 3.1 Prepare the source packages (on the internet-connected machine)

```bash
# Download the sources on a machine with internet access
git clone --depth 1 https://github.com/vllm-project/vllm.git /tmp/vllm
git clone --depth 1 https://github.com/vllm-project/vllm-ascend.git /tmp/vllm-ascend
git clone --depth 1 https://github.com/vllm-project/vllm-omni.git /tmp/vllm-omni

# Pack and transfer
cd /tmp
tar -czvf vllm-packages.tar.gz vllm vllm-ascend vllm-omni
scp vllm-packages.tar.gz root@192.168.83.100:/data/
```

### 3.2 Get the source into the container

```bash
# If the container can reach GitHub (VPN available)
cd /data/cjh/omini
git clone https://github.com/vllm-project/vllm-ascend.git
git clone https://github.com/vllm-project/vllm-omni.git

# Otherwise, extract the zip packages (offline case)
cd /data/cjh/omini
unzip vllm-ascend-main.zip
unzip vllm-omni-main.zip
```

### 3.3 Install vllm-ascend

Issue 1: the catlass submodule is missing. The build script needs the catlass headers, but with no network access `git submodule update` cannot run. Fix: create symlinks to the copy shipped with the CANN installation.

```bash
cd /data/cjh/omini/vllm-ascend-main/csrc/third_party

# Remove the broken link, if present
rm -rf catlass

# Link to the catlass directory that carries include/
ln -sf /usr/local/Ascend/cann-8.5.1/opp/built-in/op_impl/ai_core/tbe/impl/ops_legacy/ascendc/common/catlass \
  catlass

# The .h files live directly under catlass/, so add an include link
cd catlass
ln -sf . include

# Verify
ls include/
# catlass.hpp and related headers should be listed
```

Install:

```bash
cd /data/cjh/omini/vllm-ascend-main
pip install -e . --no-build-isolation \
  --extra-index-url https://mirrors.huaweicloud.com/ascend/repos/pypi \
  --extra-index-url https://download.pytorch.org/whl/cpu/
```

### 3.4 Install vllm 0.18.0

```bash
cd /vllm-workspace/vllm
git checkout v0.18.0
pip install -e . --no-build-isolation \
  --extra-index-url https://mirrors.huaweicloud.com/ascend/repos/pypi \
  --extra-index-url https://download.pytorch.org/whl/cpu/
```

### 3.5 Install vllm-omni

Issue 2: setuptools_scm version detection fails. During `pip install`, the setuptools_scm fallback version (`devnpu`) is not a valid PEP 440 version. Fix: set the version override environment variables (or patch pyproject.toml).

```bash
cd /data/cjh/omini/vllm-omni-main

# Option A: set the environment variables
export VLLM_OMNI_VERSION_OVERRIDE=0.18.0
export SETUPTOOLS_SCM_PRETEND_VERSION=0.18.0
export SETUPTOOLS_SCM_PRETEND_VERSION_FOR_VLLM_OMNI=0.18.0

pip install -e . --no-build-isolation \
  --extra-index-url https://mirrors.huaweicloud.com/ascend/repos/pypi \
  --extra-index-url https://download.pytorch.org/whl/cpu/
```

### 3.6 Install the remaining dependencies

```bash
# aenum is missing
pip install aenum

# Install the requirements
pip install -r /data/cjh/omini/vllm-omni-main/requirements/common.txt
pip install -r /data/cjh/omini/vllm-omni-main/requirements/npu.txt

# orjson (optional, speeds up the API layer)
pip install orjson
```

### 3.7 Verify the installation

```bash
python -c "import torch; print('NPU:', torch.npu.is_available())"
# Expected output: NPU: True

python -c "from vllm_omni.diffusion.models.wan2_2 import Wan22Pipeline; print('OK')"
```
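As a convenience (not part of the original steps), the checks above can be bundled into one script that records the exact package versions this deployment was validated against. A minimal sketch, assuming `vllm` and `vllm_omni` expose the usual `__version__` attribute and that `torch_npu` is installed as described earlier:

```python
# env_report.py -- capture the versions and NPU visibility of this environment (sketch).
import importlib

import torch


def report(name: str) -> None:
    # __version__ is assumed to exist on these packages; print "unknown" otherwise.
    try:
        mod = importlib.import_module(name)
        print(f"{name:12s} {getattr(mod, '__version__', 'unknown')}")
    except ImportError as exc:
        print(f"{name:12s} NOT INSTALLED ({exc})")


for pkg in ("torch", "torch_npu", "vllm", "vllm_omni"):
    report(pkg)

# NPU visibility check, mirroring the verification step above
# (torch.npu is registered once torch_npu has been imported).
print("NPU available:", torch.npu.is_available())
print("NPU count:    ", torch.npu.device_count())
```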
## 4. Model download

```bash
# Option 1: VPN + HuggingFace (recommended)
git lfs install
git clone https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
  /data/models/Wan2.1-T2V-1.3B-Diffusers

# Verify the layout
ls /data/models/Wan2.1-T2V-1.3B-Diffusers/
# Should contain model_index.json, config.json, transformer/, vae/, tokenizer/, etc.
```

## 5. Starting the service

### 5.1 Launch command

```bash
export ASCEND_RT_VISIBLE_DEVICES=0        # NPU card(s) to use
export VLLM_WORKER_MULTIPROC_METHOD=spawn

source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh

cd /data/cjh/omini/vllm-omni-main/examples/online_serving/text_to_video
MODEL=/data/models/Wan2.1-T2V-1.3B-Diffusers \
PORT=8023 \
BOUNDARY_RATIO=0.875 \
FLOW_SHIFT=5.0 \
bash run_server.sh
```

### 5.2 Key parameters

| Parameter | Default | Description |
| --- | --- | --- |
| ASCEND_RT_VISIBLE_DEVICES | - | NPU card index, e.g. 0 or 2 |
| PORT | 8098 | Service port |
| BOUNDARY_RATIO | 0.875 | Wan2.1 low/high-noise stage split ratio |
| FLOW_SHIFT | 5.0 | Scheduler flow shift |
| --dtype | float16 | Data precision |

### 5.3 Verify the service

```bash
curl -s http://localhost:8023/health
# Should return 200 OK
```

## 6. Calling the API

### 6.1 Synchronous endpoint (for testing / load testing)

```bash
curl -X POST http://localhost:8023/v1/videos/sync \
  -F prompt="A futuristic city at sunset" \
  -F num_frames=33 \
  -F fps=16 \
  -F num_inference_steps=40 \
  -F guidance_scale=4.0 \
  -F seed=42 \
  -o output.mp4
```

### 6.2 Asynchronous endpoint (recommended for production)

```bash
# Create a job
curl -s -X POST http://localhost:8023/v1/videos \
  -F prompt="A cat running in the park" \
  -F num_frames=33 \
  -F fps=16 \
  -F num_inference_steps=40 \
  -F guidance_scale=4.0 \
  -F seed=100
# Returns {"id": "video_xxx", "status": "queued", ...}

# Poll the status
curl -s http://localhost:8023/v1/videos/{video_id}
# status: queued → processing → completed

# Download the video
curl -L http://localhost:8023/v1/videos/{video_id}/content -o result.mp4
```

### 6.3 OpenAI Python SDK

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8023/v1", api_key="none")

response = client.chat.completions.create(
    model="Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
    messages=[{"role": "user", "content": "A futuristic city at sunset"}],
    extra_body={
        "num_frames": 33,
        "fps": 16,
        "num_inference_steps": 40,
        "guidance_scale": 4.0,
        "seed": 42,
    },
)
```
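For production use of the asynchronous endpoint, the submit → poll → download flow above is usually wrapped in a small client helper. The sketch below is illustrative only: the paths and the `id`/`status` fields are taken from the curl examples in 6.2, while the `failed` status and the exact response schema are assumptions to verify against the running service.

```python
# poll_video.py -- submit a T2V job and wait for the result (client-side sketch).
import time

import requests

BASE_URL = "http://localhost:8023/v1"


def generate_video(prompt: str, out_path: str = "result.mp4", timeout_s: int = 600) -> str:
    # Create the job (multipart form fields, mirroring the curl example in 6.2).
    resp = requests.post(
        f"{BASE_URL}/videos",
        files={
            "prompt": (None, prompt),
            "num_frames": (None, "33"),
            "fps": (None, "16"),
            "num_inference_steps": (None, "40"),
            "guidance_scale": (None, "4.0"),
            "seed": (None, "100"),
        },
        timeout=30,
    )
    resp.raise_for_status()
    video_id = resp.json()["id"]

    # Poll until the job reports "completed" (a "failed" status is assumed here).
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        status = requests.get(f"{BASE_URL}/videos/{video_id}", timeout=30).json()["status"]
        if status == "completed":
            break
        if status == "failed":
            raise RuntimeError(f"video {video_id} failed")
        time.sleep(5)
    else:
        raise TimeoutError(f"video {video_id} not finished after {timeout_s}s")

    # Download the finished clip.
    content = requests.get(f"{BASE_URL}/videos/{video_id}/content", timeout=120)
    content.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(content.content)
    return out_path


if __name__ == "__main__":
    print(generate_video("A cat running in the park"))
```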
## 7. Problems encountered and fixes

### Problem 1: invalid setuptools_scm version during the Dockerfile build

- Symptom: `packaging.version.InvalidVersion: Invalid version: devcpu`
- Cause: `COPY . .` does not bring the `.git` directory along, so setuptools_scm falls back to `devcpu`, which is not a valid PEP 440 version.
- Fix: set the environment variable to bypass version detection:

```bash
export SETUPTOOLS_SCM_PRETEND_VERSION_FOR_VLLM_OMNI=0.18.0
```

or patch pyproject.toml in the Dockerfile before `pip install`:

```dockerfile
RUN sed -i 's/dynamic = \["version"\]/version = "0.18.0"/' pyproject.toml && \
    sed -i '/^\[tool\.setuptools_scm\]/,/^$/d' pyproject.toml
```

### Problem 2: git fetch of the catlass submodule fails

- Symptom: `Error running build_aclnn.sh: Command '[bash, csrc/build_aclnn.sh, ...]' returned non-zero exit status 1. fetch failed`
- Cause: no network access, so the catlass submodule cannot be fetched from GitLab/GitHub.
- Fix: symlink the catlass headers that already ship with the CANN installation:

```bash
cd vllm-ascend-main/csrc/third_party
ln -sf /usr/local/Ascend/cann-8.5.1/opp/built-in/op_impl/ai_core/tbe/impl/ops_legacy/ascendc/common/catlass catlass
cd catlass
ln -sf . include
```

### Problem 3: transformers version conflict

- Symptom: transformers keeps flipping between 4.x and 5.x; 4.48.0 lacks `Gemma3Config`, while 5.4.0 causes vllm compatibility issues.
- Fix: install transformers with `--no-deps` so the other packages are left untouched:

```bash
pip install transformers==5.4.0 --extra-index-url ... --no-deps
```

### Problem 4: `vllm serve` vs `vllm-omni serve` confusion

- Symptom: `vllm: error: unrecognized arguments: --omni` — the request goes through the main vLLM package, which does not recognize `--omni`.
- Cause: `vllm serve` is the main vLLM command; `vllm-omni serve` is the correct vLLM-Omni entry point.
- Fix: use the correct command:

```bash
vllm-omni serve /data/models/Wan2.1-T2V-1.3B-Diffusers \
  --omni --port 8023 --dtype float16
```

### Problem 5: the ModelScope download lacks model_index.json

- Symptom: `FileNotFoundError: model_index.json not found`
- Cause: the model downloaded from ModelScope uses a proprietary layout (`.mv` files and the like), not the standard HuggingFace diffusers format; vLLM-Omni only supports the diffusers format.
- Fix: download the diffusers-format model from HuggingFace:

```bash
git clone https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers /data/models/Wan2.1-T2V-1.3B-Diffusers
```

### Problem 6: transformers does not recognize the `t2v` model_type

- Symptom: `Value error, The checkpoint you are trying to load has model type t2v but Transformers does not recognize this architecture.`
- Cause: `model_type: t2v` in Wan2.1's config.json is not an architecture name that HuggingFace Transformers knows.
- Fix: edit the model's config.json:

```bash
sed -i 's/"model_type": "t2v"/"model_type": "Wan2VideoTransformer"/' \
  /data/models/wan2.1-T2V-1.3B/config.json
```

This is no longer needed with the diffusers-format model, because vllm-omni does not go through the transformers model-type check.

## 8. Performance data

| Metric | Value |
| --- | --- |
| Model load time | ~15.9 s |
| Inference time (33 frames / 40 steps) | ~92 s (about 2.2 s/step) |
| Peak device memory | 25.96 GB reserved / 22.30 GB allocated |
| Device memory utilization | ~85% |

With the 1.3B model on a single card, single-request latency is about 93 s (33 frames). vLLM-Omni's concurrency advantage shows up with multiple requests under continuous batching, where several video requests share the card's compute resources.

## 9. Follow-up plan

### 9.1 ControlNet-Canny adaptation (to be developed)

- Wan2.1 has no native ControlNet support; it needs custom development that injects ControlNet conditioning into vLLM-Omni's DiT blocks via forward hooks.
- A code skeleton already exists in the `vllm_omni_controlnet/` directory.
- Core problem: SD-ControlNet weights need distillation/fine-tuning before they work on the Wan2.1 DiT.

### 9.2 Multi-card parallelism

Inference currently runs on a single card. To scale out:

```bash
# Tensor parallel (shard across cards)
vllm-omni serve ... --tensor-parallel-size 4

# CFG parallel (parallelize the inference steps; suited to high concurrency)
vllm-omni serve ... --cfg-parallel-size 4
```

### 9.3 Concurrency load testing

Once the service is up, run concurrent load tests with locust or ab to verify the QPS gain from continuous batching (a minimal Locust sketch follows the file inventory below).

## 10. File inventory

```text
C:\Users\OMEN\Desktop\192.168.83.100\
├── vllm_omni_wan21_migration.md             # Main migration spec
├── vllm_omni_controlnet\                    # ControlNet adaptation code (incomplete)
│   ├── controlnet_canny\
│   │   ├── canny.py
│   │   ├── adapter.py
│   │   ├── inject.py
│   │   └── pipeline_patch.py
│   ├── service.py
│   ├── deploy.sh
│   └── Dockerfile
├── stable_diffusion_controlnet迁移流程.docx   # Original reference document
└── WAN2.1_VLLMOMNI_DEPLOYMENT.md            # This document
```
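As a starting point for the load test planned in 9.3, here is a minimal Locust sketch against the synchronous endpoint from 6.1. The path and form fields mirror that section; user counts, wait times, and the request timeout are placeholder values to tune against the ~93 s per-request latency reported in Section 8.

```python
# locustfile.py -- minimal concurrency test for the /v1/videos/sync endpoint (sketch).
# Run with: locust -f locustfile.py --host http://localhost:8023
from locust import HttpUser, task, between


class T2VUser(HttpUser):
    # Each simulated user pauses briefly between requests; with ~93 s per video,
    # throughput is dominated by server-side continuous batching, not client pacing.
    wait_time = between(1, 5)

    @task
    def generate_sync(self):
        # Multipart form fields mirror the curl example in 6.1.
        self.client.post(
            "/v1/videos/sync",
            files={
                "prompt": (None, "A futuristic city at sunset"),
                "num_frames": (None, "33"),
                "fps": (None, "16"),
                "num_inference_steps": (None, "40"),
                "guidance_scale": (None, "4.0"),
            },
            timeout=600,
            name="/v1/videos/sync",
        )
```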
