Recommended Driver >=530.30.02

vLLM 0.5.x + CUDA 12.1

High-throughput LLM serving engine with production-ready Docker configuration

Configuration Summary

  • Framework: vLLM 0.5.x
  • CUDA Version: 12.1
  • Python Support: 3.8, 3.9, 3.10, 3.11
  • Min Driver: >=530.30.02

Note: Stable release, production-verified

Install Command
pip install vllm==0.5.5
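
A quick sanity check that the pinned wheel resolved correctly (should print 0.5.5):

python -c "import vllm; print(vllm.__version__)"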

What's in vLLM 0.5.x

  • Mature stable release with extensive field testing
  • Core PagedAttention implementation
  • Wide model architecture support
  • Production-grade reliability

Performance: Battle-tested and predictable, but without the newer optimizations shipped in later releases
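
A minimal sketch of the offline API that exercises this batching path; facebook/opt-125m is only a small example model, and any architecture supported by 0.5.x can be substituted:

from vllm import LLM, SamplingParams

# PagedAttention pools KV-cache blocks, so one generate() call
# serves a whole batch of prompts with continuous batching.
llm = LLM(model="facebook/opt-125m")  # example model
params = SamplingParams(temperature=0.8, max_tokens=64)
prompts = ["The capital of France is", "Docker containers are", "vLLM achieves high throughput by"]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)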

Best For

Use Cases

  • Conservative production deployments
  • Systems requiring proven stability
  • Long-term support environments

CUDA 12.1 Advantages

  • A100 and V100 deployments
  • Cost-effective inference
  • Maximum compatibility with older systems (a quick pre-flight check follows)
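
A minimal pre-flight sketch, using the PyTorch that vLLM installs, to confirm a card meets vLLM's compute-capability floor of 7.0 (V100 = sm_70, A100 = sm_80, H100 = sm_90):

import torch

# Verify a CUDA device is visible and new enough for vLLM.
assert torch.cuda.is_available(), "no CUDA device visible to PyTorch"
major, minor = torch.cuda.get_device_capability(0)
print(f"{torch.cuda.get_device_name(0)}: sm_{major}{minor}")
assert (major, minor) >= (7, 0), "GPU below vLLM's compute capability 7.0 floor"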

Generated Dockerfile

Configuration: local GPU or CPU environment. Stable release, production-verified. Requires NVIDIA Driver >=530.30.02.
# syntax=docker/dockerfile:1
# ^ Required for BuildKit cache mounts and advanced features

# Generated by DockerFit (https://tools.eastondev.com/docker)
# vLLM 0.5.x + CUDA 12.1 | Python 3.11
# Multi-stage build for optimized image size

# ==============================================================================
# Stage 1: Builder - Install dependencies and compile
# ==============================================================================
FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04 AS builder

# Build arguments
ARG DEBIAN_FRONTEND=noninteractive

# Environment variables
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
# Only used if vLLM is compiled from source: 7.0 = V100, 8.0/8.6 = Ampere, 9.0 = Hopper
ENV TORCH_CUDA_ARCH_LIST="7.0;8.0;8.6;9.0"

# Install Python 3.11 from deadsnakes PPA (Ubuntu 22.04)
RUN apt-get update && apt-get install -y --no-install-recommends \
    software-properties-common \
    && add-apt-repository -y ppa:deadsnakes/ppa \
    && apt-get update && apt-get install -y --no-install-recommends \
    python3.11 \
    python3.11-venv \
    python3.11-dev \
    build-essential \
    git \
    ninja-build \
    && rm -rf /var/lib/apt/lists/*

# Create virtual environment
ENV VIRTUAL_ENV=/opt/venv
RUN python3.11 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# Upgrade pip
RUN pip install --no-cache-dir --upgrade pip setuptools wheel

# Install vLLM with BuildKit cache
# Pre-install packaging for potential source builds
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install packaging && \
    pip install vllm==0.5.5

# Install project dependencies
COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# ==============================================================================
# Stage 2: Runtime - Minimal production image
# ==============================================================================
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04 AS runtime

# Labels
LABEL maintainer="Generated by DockerFit"
LABEL version="0.5.x"
LABEL description="vLLM 0.5.x + CUDA 12.1"

# Environment variables
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility

# Install Python 3.11 runtime from deadsnakes PPA (Ubuntu 22.04)
# (build tools such as ninja-build stay in the builder stage)
RUN apt-get update && apt-get install -y --no-install-recommends \
    software-properties-common \
    && add-apt-repository -y ppa:deadsnakes/ppa \
    && apt-get update && apt-get install -y --no-install-recommends \
    python3.11 \
    libgomp1 \
    && apt-get remove -y software-properties-common \
    && apt-get autoremove -y \
    && rm -rf /var/lib/apt/lists/*

# Create non-root user for security
ARG USERNAME=appuser
ARG USER_UID=1000
ARG USER_GID=$USER_UID
RUN groupadd --gid $USER_GID $USERNAME \
    && useradd --uid $USER_UID --gid $USER_GID -m $USERNAME

# Copy virtual environment from builder
COPY --from=builder --chown=$USERNAME:$USERNAME /opt/venv /opt/venv
ENV VIRTUAL_ENV=/opt/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# Set working directory
WORKDIR /app

# Copy application code
COPY --chown=$USERNAME:$USERNAME . .

# Switch to non-root user
USER $USERNAME

# Expose port
EXPOSE 8000

# Default command
CMD ["python", "main.py"]
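
A typical build-and-run sequence; the image tag vllm-cu121 is just an example, and --gpus all assumes the NVIDIA Container Toolkit is installed on the host:

docker build -t vllm-cu121 .
docker run --gpus all -p 8000:8000 vllm-cu121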

Frequently Asked Questions

What GPU memory do I need for vLLM?

GPU memory requirements depend on your model size:

  • 7B models: 16GB+ (T4, A10G)
  • 13B models: 24GB+ (L4, A10G)
  • 70B models: 80GB+ (A100, H100)

vLLM supports tensor parallelism for multi-GPU deployment.
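
A minimal sketch of two-way tensor parallelism with the offline API (the model name is illustrative; tensor_parallel_size must evenly divide the model's attention head count):

from vllm import LLM

# Shard weights and KV cache across 2 GPUs.
llm = LLM(model="meta-llama/Llama-2-13b-hf", tensor_parallel_size=2)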

How do I serve a model with vLLM?

Start a vLLM server with OpenAI-compatible API:

python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.2-3B-Instruct --port 8000

The server is a drop-in replacement for the OpenAI API.
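
For example, the official openai Python client (v1+) can point straight at the local server; the EMPTY key is a placeholder, since vLLM does not check API keys unless one is configured:

from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="meta-llama/Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)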

What is the Blackwell configuration for?

The Blackwell configuration is optimized for NVIDIA's latest B200 and GB200 GPUs:

  • Requires CUDA 12.8+
  • Uses PyTorch nightly builds
  • Supports FP4 precision for maximum throughput