What NVIDIA driver version do I need for vLLM 0.4.x with CUDA 11.8?

For vLLM 0.4.x with CUDA 11.8, you need NVIDIA driver version 520.61.05 or higher. Run nvidia-smi to check your current driver version.
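
The driver check can also be scripted. The following is a minimal sketch, not part of the generated configuration: the helper names (`parse_version`, `meets_minimum`, `installed_driver_version`) are hypothetical, and it assumes `nvidia-smi` is on the PATH and reports a dotted numeric version string.

```python
import subprocess
from typing import Tuple

MIN_DRIVER = "520.61.05"  # minimum driver quoted above for CUDA 11.8


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted driver string such as '535.104.05' into (535, 104, 5)."""
    return tuple(int(part) for part in version.strip().split("."))


def meets_minimum(installed: str, minimum: str = MIN_DRIVER) -> bool:
    """Compare versions numerically, component by component."""
    return parse_version(installed) >= parse_version(minimum)


def installed_driver_version() -> str:
    """Ask nvidia-smi for the driver version of the first GPU it lists."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return out.splitlines()[0].strip()
```

On a GPU host, `meets_minimum(installed_driver_version())` should return True when the driver is new enough. Comparing integer tuples rather than raw strings avoids mis-ordering versions whose components differ in digit count.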

Which Python versions are supported by vLLM 0.4.x?

vLLM 0.4.x supports Python 3.8, 3.9, 3.10, and 3.11. We recommend Python 3.10 or 3.11 for best compatibility.
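
A quick guard against an unsupported interpreter might look like the sketch below. The `SUPPORTED` set simply mirrors the versions listed above, and `is_supported` is a hypothetical helper name, not a vLLM API.

```python
import sys

# Python (major, minor) pairs listed above for vLLM 0.4.x
SUPPORTED = {(3, 8), (3, 9), (3, 10), (3, 11)}


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter's major.minor is in the supported set."""
    return tuple(version_info[:2]) in SUPPORTED


if not is_supported():
    print("Warning: this Python version is not in the vLLM 0.4.x support list")
```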

How do I run vLLM with GPU support in Docker?

After building your image, run: docker run --gpus all -p 8000:8000 your-image python -m vllm.entrypoints.openai.api_server --model your-model-name. Use --gpu-memory-utilization to control memory usage.
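
When the launch is driven from a script, the same command can be assembled as an argument list. This is a sketch under stated assumptions: `docker_run_command` is a hypothetical helper, `your-image` and `your-model-name` are the placeholders from the answer above, and the 0.9 default is an assumed starting value for `--gpu-memory-utilization`.

```python
import shlex
from typing import List


def docker_run_command(image: str, model: str, port: int = 8000,
                       gpu_memory_utilization: float = 0.9) -> List[str]:
    """Build the argv for running the vLLM OpenAI-compatible server in Docker."""
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    return [
        "docker", "run", "--gpus", "all",
        "-p", f"{port}:{port}",          # publish the server port on the host
        image,
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model,
        "--port", str(port),
        "--gpu-memory-utilization", str(gpu_memory_utilization),
    ]


cmd = docker_run_command("your-image", "your-model-name",
                         gpu_memory_utilization=0.85)
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

Passing the list to `subprocess.run(cmd, check=True)` would start the container. Lowering the utilization value leaves more GPU memory free for other processes at the cost of KV cache space.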

Recommended Driver >=520.61.05

vLLM 0.4.x + CUDA 11.8

High-throughput LLM serving engine with production-ready Docker configuration

Configuration Summary

Framework

vLLM 0.4.x

CUDA Version

11.8

Python Support

3.8, 3.9, 3.10, 3.11

Min Driver

>=520.61.05

Note: Early stable release with good compatibility

Install Command

pip install vllm==0.4.3

What's in vLLM 0.4.x

Early stable vLLM release
Foundation PagedAttention system
Good backward compatibility
Established ecosystem support

Performance: Older but stable; upgrading to a newer release is recommended for better performance.

Best For

Use Cases

Legacy system compatibility
Maximum backward compatibility needs
Environments locked to v0.4.x

CUDA 11.8 Advantages

General GPU workloads

Generate Dockerfile

Configuration

Deployment Target

Local GPU or CPU environment

Early stable release with good compatibility

Requires NVIDIA Driver >=520.61.05

Dockerfile

# syntax=docker/dockerfile:1
# ^ Required for BuildKit cache mounts and advanced features

# Generated by DockerFit (https://tools.eastondev.com/docker)
# vLLM 0.4.x + CUDA 11.8 | Python 3.11
# Multi-stage build for optimized image size

# ==============================================================================
# Stage 1: Builder - Install dependencies and compile
# ==============================================================================
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 AS builder

# Build arguments
ARG DEBIAN_FRONTEND=noninteractive

# Environment variables
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV TORCH_CUDA_ARCH_LIST="7.5;8.0;8.6"

# Install Python 3.11 from deadsnakes PPA (Ubuntu 22.04)
RUN apt-get update && apt-get install -y --no-install-recommends \
    software-properties-common \
    && add-apt-repository -y ppa:deadsnakes/ppa \
    && apt-get update && apt-get install -y --no-install-recommends \
    python3.11 \
    python3.11-venv \
    python3.11-dev \
    build-essential \
    git \
    ninja-build \
    && rm -rf /var/lib/apt/lists/*

# Create virtual environment
ENV VIRTUAL_ENV=/opt/venv
RUN python3.11 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# Upgrade pip
RUN pip install --no-cache-dir --upgrade pip setuptools wheel

# Install vLLM with BuildKit cache
# Pre-install packaging for potential source builds
# Note: verify that the installed vLLM wheel targets CUDA 11.8;
# the default PyPI wheel may be built against a newer CUDA release
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install packaging && \
    pip install vllm==0.4.3

# Install project dependencies
COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# ==============================================================================
# Stage 2: Runtime - Minimal production image
# ==============================================================================
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04 AS runtime

# Labels
LABEL maintainer="Generated by DockerFit"
LABEL version="0.4.x"
LABEL description="vLLM 0.4.x + CUDA 11.8"

# Environment variables
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility

# Install Python 3.11 runtime from deadsnakes PPA (Ubuntu 22.04)
RUN apt-get update && apt-get install -y --no-install-recommends \
    software-properties-common \
    && add-apt-repository -y ppa:deadsnakes/ppa \
    && apt-get update && apt-get install -y --no-install-recommends \
    python3.11 \
    libgomp1 \
    ninja-build \
    && apt-get remove -y software-properties-common \
    && apt-get autoremove -y \
    && rm -rf /var/lib/apt/lists/*

# Create non-root user for security
ARG USERNAME=appuser
ARG USER_UID=1000
ARG USER_GID=$USER_UID
RUN groupadd --gid $USER_GID $USERNAME \
    && useradd --uid $USER_UID --gid $USER_GID -m $USERNAME

# Copy virtual environment from builder
COPY --from=builder --chown=$USERNAME:$USERNAME /opt/venv /opt/venv
ENV VIRTUAL_ENV=/opt/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# Set working directory
WORKDIR /app

# Copy application code
COPY --chown=$USERNAME:$USERNAME . .

# Switch to non-root user
USER $USERNAME

# Expose port
EXPOSE 8000

# Default command
CMD ["python", "main.py"]

Frequently Asked Questions

What GPU memory do I need for vLLM?

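GPU memory requirements depend on model size, precision, and context length. As rough starting points:

• 7B models: 16GB+ (T4, A10G)
• 13B models: 24GB+ (L4, A10G)
• 70B models: 80GB+ (A100, H100)

FP16 weights alone take about 2 bytes per parameter (roughly 14GB for 7B, 26GB for 13B, and 140GB for 70B), and the KV cache needs additional room, so the 13B and 70B figures assume quantization or multiple GPUs. vLLM supports tensor parallelism for multi-GPU deployment.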
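
The sizing guidance above can be sanity-checked with simple arithmetic. The sketch below estimates weight memory only; it is an approximation that ignores the KV cache, activations, and framework overhead, and it assumes tensor parallelism splits the weights evenly across GPUs. `weight_memory_gb` is a hypothetical helper, not a vLLM function.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float = 2.0,
                     tensor_parallel_size: int = 1) -> float:
    """Approximate per-GPU memory for model weights, in GB (10^9 bytes).

    bytes_per_param: 2.0 for FP16/BF16, 1.0 for 8-bit, 0.5 for 4-bit weights.
    """
    total_bytes = params_billion * 1e9 * bytes_per_param
    return total_bytes / tensor_parallel_size / 1e9


for size in (7, 13, 70):
    print(f"{size}B in FP16: about {weight_memory_gb(size):.0f} GB of weights")

# A 70B model in FP16 split across 4 GPUs
print(f"70B, 4-way tensor parallel: "
      f"about {weight_memory_gb(70, tensor_parallel_size=4):.0f} GB per GPU")
```

In vLLM the number of GPUs is set with the `--tensor-parallel-size` server flag. Real deployments need headroom beyond these figures for the KV cache, so treat the result as a lower bound.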

How do I serve a model with vLLM?

Start a vLLM server with an OpenAI-compatible API:

python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.2-3B-Instruct --port 8000

The server acts as a drop-in replacement for the OpenAI API.
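
Once the server is running, a client can call the completions endpoint over plain HTTP. The sketch below uses only the standard library. It assumes the server listens on `http://localhost:8000` and that the `model` argument matches the name passed to `--model`; `build_completion_request` and `complete` are hypothetical helper names.

```python
import json
import urllib.request


def build_completion_request(prompt: str, model: str,
                             base_url: str = "http://localhost:8000",
                             max_tokens: int = 64) -> urllib.request.Request:
    """Prepare a POST to the OpenAI-style /v1/completions endpoint."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, model: str, **kwargs) -> str:
    """Send the request and return the text of the first completion choice."""
    request = build_completion_request(prompt, model, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

For example, `complete("Hello, my name is", "meta-llama/Llama-3.2-3B-Instruct")` would return the generated continuation. Because the API follows the OpenAI format, the official OpenAI client libraries can also be pointed at this base URL instead.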

What is the Blackwell configuration for?

The Blackwell configuration is optimized for NVIDIA's latest B200 and GB200 GPUs:

• Requires CUDA 12.8+
• Uses PyTorch nightly builds
• Supports FP4 precision for maximum throughput