0. Environment Setup

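This is part 0 of the nano-vllm study series. The first step is naturally to get the code running; otherwise everything that follows is wasted effort.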

We use uv to set up the environment. I won't spend time on installing uv itself; following the official documentation gets it done quickly.

Setting up the environment with uv

git clone git@github.com:GeeeekExplorer/nano-vllm.git

With the code cloned, create a virtual environment with uv:

uv venv --python 3.10

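Don't rush to run uv sync yet. Look at pyproject.toml first: it lists flash-attn as a dependency. I recommend deciding on and installing your own torch + CUDA versions first, then finding the matching build in the flash-attention-prebuild-wheels repository and installing it yourself. Otherwise the version-mismatch errors get annoying.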

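For example, I plan to use torch 2.10 + CUDA 12.8. So first comment out flash-attn in the toml, then append the corresponding index configuration at the end of the file. The final file looks like this: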

[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "nano-vllm"
version = "0.2.0"
authors = [{ name = "Xingkai Yu" }]
license = "MIT"
license-files = ["LICENSE"]
readme = "README.md"
description = "a lightweight vLLM implementation built from scratch"
requires-python = ">=3.10,<3.13"
dependencies = [
    "torch>=2.4.0",
    "triton>=3.0.0",
    "transformers>=4.51.0",
    # "flash-attn", use pre-built flash-attn
    "xxhash",
]

[project.urls]
Homepage="https://github.com/GeeeekExplorer/nano-vllm"

[tool.setuptools.packages.find]
where = ["."]
include = ["nanovllm*"]

# Newly added below: the index for torch 2.10 + cu128
[tool.uv.sources]
torch = { index = "pytorch-cu128" }

[[tool.uv.index]]
name = "pytorch-cu128"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

Then run uv sync to install the dependencies, and pick the matching flash-attn build from the prebuilt wheels mentioned above. Here we use 2.8.3:

uv pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.16/flash_attn-2.8.3+cu128torch2.10-cp310-cp310-linux_x86_64.whl
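If your torch, CUDA, or Python version differs from mine, the wheel filename changes accordingly. The sketch below builds a filename from those versions. The naming pattern is inferred only from the single wheel used in this post, and flash_attn_wheel_name is a hypothetical helper, so always confirm the name against the actual release page before installing.

```python
import sys


def flash_attn_wheel_name(flash_attn: str, cuda: str, torch: str,
                          python: tuple = sys.version_info[:2],
                          platform: str = "linux_x86_64") -> str:
    """Build a wheel filename following the pattern of the wheel used above.

    Hypothetical helper: the pattern is an assumption inferred from one filename.
    """
    cuda_tag = "cu" + cuda.replace(".", "")   # "12.8" -> "cu128"
    py_tag = f"cp{python[0]}{python[1]}"      # (3, 10) -> "cp310"
    return (f"flash_attn-{flash_attn}+{cuda_tag}torch{torch}"
            f"-{py_tag}-{py_tag}-{platform}.whl")


# The combination used in this post: flash-attn 2.8.3, CUDA 12.8, torch 2.10, Python 3.10
name = flash_attn_wheel_name("2.8.3", cuda="12.8", torch="2.10", python=(3, 10))
print(name)
```

For the versions in this post, the result matches the filename at the end of the URL installed above.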

With that, the environment is ready. Next we run a simple test with Qwen3-0.6B. To download the weights:

uv pip install huggingface_hub
# Activate the uv virtual environment before running the command below; change local-dir to wherever you want to store the weights
hf download Qwen/Qwen3-0.6B --local-dir ~/Project/vllm/huggingface/Qwen3-0.6B/
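Before moving on, it can be worth checking that the download actually landed where you expect, since nano-vllm's Config asserts that the model path is a directory. The following is a minimal sketch: missing_model_files is a hypothetical helper, and the list of required files is my assumption (config.json is the file AutoConfig.from_pretrained reads from a local directory; extend the tuple as needed).

```python
import os


def missing_model_files(model_dir: str, required=("config.json",)) -> list:
    """Return the names in `required` that are absent from model_dir.

    Hypothetical helper; the default `required` tuple is an assumption.
    """
    path = os.path.expanduser(model_dir)
    if not os.path.isdir(path):
        # The directory itself is missing, so every required file is missing too.
        return list(required)
    return [name for name in required
            if not os.path.isfile(os.path.join(path, name))]


# Use the same path you passed to --local-dir
missing = missing_model_files("~/Project/vllm/huggingface/Qwen3-0.6B/")
print("missing:", missing if missing else "nothing")
```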

A quick test

Open example.py and change path to the directory you downloaded the weights to:

def main():
    path = os.path.expanduser("~/Project/vllm/huggingface/Qwen3-0.6B/")

~~Then just run python example.py and you're done.~~

It's not quite that simple, haha. Because of API changes in the transformers library, running python example.py directly produces the following error:

[rank0]:   File "Project/vllm/nano-vllm/nanovllm/models/qwen3.py", line 54, in __init__
[rank0]:     self.rotary_emb = get_rope(
[rank0]: TypeError: unhashable type: 'dict'

The fix is simple: open nanovllm/config.py and set rope_scaling to None right after the AutoConfig call. The code looks like this:

import os
from dataclasses import dataclass
from transformers import AutoConfig

@dataclass
class Config:
    ...
    def __post_init__(self):
        assert os.path.isdir(self.model)
        assert self.kvcache_block_size % 256 == 0
        assert 1 <= self.tensor_parallel_size <= 8
        self.hf_config = AutoConfig.from_pretrained(self.model)
        # === The change! ===
        self.hf_config.rope_scaling = None  # disable RoPE scaling in HuggingFace config
        self.max_model_len = min(self.max_model_len, self.hf_config.max_position_embeddings)
        assert self.max_num_batched_tokens >= self.max_model_len
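Why would a dict trigger "unhashable type" in the first place? One plausible explanation, which I state as an assumption rather than something verified against the nano-vllm source here, is that get_rope caches its results by argument, and a cache key must be hashable. The sketch below reproduces that behavior with a hypothetical stand-in function, get_rope_like, wrapped in functools.lru_cache.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_rope_like(head_size: int, rope_scaling=None):
    # Hypothetical stand-in, NOT the real nano-vllm function.
    # lru_cache hashes all arguments to build the cache key.
    return ("rope", head_size, rope_scaling)


def try_call(rope_scaling) -> str:
    """Call the cached function and report whether it raised TypeError."""
    try:
        get_rope_like(128, rope_scaling=rope_scaling)
        return "ok"
    except TypeError as exc:
        return f"TypeError: {exc}"


print(try_call(None))                       # None is hashable, so the call succeeds
print(try_call({"rope_type": "default"}))   # a dict cannot be hashed, so the call fails
```

Under that assumption, setting rope_scaling to None works because None is hashable while the dict that newer transformers versions put there is not.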

Now running python example.py really does work.
