Recommended
Driver >=560.35.05
vLLM Latest + CUDA 12.6
High-throughput LLM serving engine with production-ready Docker configuration
Configuration Summary
Framework
vLLM Latest
CUDA Version
12.6
Python Support
3.10, 3.11, 3.12
Min Driver
>=560.35.05
Note: Recommended 2025 configuration, aligned with the PyTorch 2.9 ecosystem
Install Command
pip install vllm
What's in vLLM Latest
- PagedAttention for efficient KV cache management
- Continuous batching for maximum throughput
- Tensor parallelism for multi-GPU inference
- OpenAI-compatible API server
- Support for 50+ model architectures
Performance: up to 24x higher throughput than Hugging Face Transformers
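The PagedAttention idea behind these numbers can be illustrated with a toy allocator. This is a deliberately simplified sketch, not vLLM's actual implementation: the KV cache is split into fixed-size physical blocks, and each sequence keeps a block table mapping its logical token positions to whatever physical blocks were free, so finished sequences return their blocks to the pool without fragmenting memory.

```python
class PagedKVCache:
    """Toy PagedAttention-style block allocator (illustrative only)."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.seq_lens = {}      # seq_id -> number of cached tokens

    def append_token(self, seq_id: str) -> None:
        """Reserve KV-cache space for one new token, allocating a block on demand."""
        length = self.seq_lens.get(seq_id, 0)
        if length % self.block_size == 0:  # current block full (or no block yet)
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted")
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1

    def free(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the pool; no fragmentation."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=16)
for _ in range(20):
    cache.append_token("seq-A")
print(len(cache.block_tables["seq-A"]))  # 20 tokens at block size 16 -> 2 blocks
```

Because blocks are allocated on demand rather than reserved for a sequence's maximum length, many more concurrent sequences fit in the same VRAM, which is what enables continuous batching at high throughput.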
Best For
Use Cases
- Production LLM API services
- High-throughput chatbot backends
- Batch inference pipelines
- Multi-model serving with minimal VRAM
CUDA 12.6 Advantages
- Latest stable CUDA for production
- H100 and Ada Lovelace optimization
- Balance of features and stability
Generate Dockerfile
Configuration
Local GPU or CPU environment
Recommended 2025 configuration, aligned with the PyTorch 2.9 ecosystem
Requires NVIDIA Driver >=560.35.05
Dockerfile
# syntax=docker/dockerfile:1
# ^ Required for BuildKit cache mounts and advanced features

# Generated by DockerFit (https://tools.eastondev.com/docker)
# vLLM latest + CUDA 12.6 | Python 3.11
# Multi-stage build for optimized image size

# ==============================================================================
# Stage 1: Builder - Install dependencies and compile
# ==============================================================================
FROM nvidia/cuda:12.6.3-cudnn-devel-ubuntu22.04 AS builder

# Build arguments
ARG DEBIAN_FRONTEND=noninteractive

# Environment variables
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0"

# Install Python 3.11 from deadsnakes PPA (Ubuntu 22.04)
RUN apt-get update && apt-get install -y --no-install-recommends \
    software-properties-common \
    && add-apt-repository -y ppa:deadsnakes/ppa \
    && apt-get update && apt-get install -y --no-install-recommends \
    python3.11 \
    python3.11-venv \
    python3.11-dev \
    build-essential \
    git \
    ninja-build \
    && rm -rf /var/lib/apt/lists/*

# Create virtual environment
ENV VIRTUAL_ENV=/opt/venv
RUN python3.11 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# Upgrade pip
RUN pip install --no-cache-dir --upgrade pip setuptools wheel

# Install vLLM with BuildKit cache
# Pre-install packaging for potential source builds
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install packaging && \
    pip install vllm

# Install project dependencies
COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# ==============================================================================
# Stage 2: Runtime - Minimal production image
# ==============================================================================
FROM nvidia/cuda:12.6.3-cudnn-runtime-ubuntu22.04 AS runtime

# Labels
LABEL maintainer="Generated by DockerFit"
LABEL version="latest"
LABEL description="vLLM latest + CUDA 12.6"

# Environment variables
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility

# Install Python 3.11 runtime from deadsnakes PPA (Ubuntu 22.04)
RUN apt-get update && apt-get install -y --no-install-recommends \
    software-properties-common \
    && add-apt-repository -y ppa:deadsnakes/ppa \
    && apt-get update && apt-get install -y --no-install-recommends \
    python3.11 \
    libgomp1 \
    ninja-build \
    && apt-get remove -y software-properties-common \
    && apt-get autoremove -y \
    && rm -rf /var/lib/apt/lists/*

# Create non-root user for security
ARG USERNAME=appuser
ARG USER_UID=1000
ARG USER_GID=$USER_UID
RUN groupadd --gid $USER_GID $USERNAME \
    && useradd --uid $USER_UID --gid $USER_GID -m $USERNAME

# Copy virtual environment from builder
COPY --from=builder --chown=$USERNAME:$USERNAME /opt/venv /opt/venv
ENV VIRTUAL_ENV=/opt/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# Set working directory
WORKDIR /app

# Copy application code
COPY --chown=$USERNAME:$USERNAME . .

# Switch to non-root user
USER $USERNAME

# Expose port
EXPOSE 8000

# Default command
CMD ["python", "main.py"]
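To build and run the generated Dockerfile, commands along these lines should work; the image tag `vllm-app` is an example name, and `--gpus all` assumes the NVIDIA Container Toolkit is installed on the host.

```shell
# Build the image from the generated Dockerfile (tag name is an example)
docker build -t vllm-app .

# Run with GPU access; port 8000 matches the EXPOSE line,
# and the container runs as the non-root appuser.
docker run --rm --gpus all -p 8000:8000 vllm-app
```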
Frequently Asked Questions
What GPU memory do I need for vLLM?
GPU memory requirements depend on your model size:
- 7B models: 16GB+ (T4, A10G)
- 13B models: 24GB+ (L4, A10G)
- 70B models: 80GB+ (A100, H100)
vLLM supports tensor parallelism for multi-GPU deployment.
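These thresholds follow from a back-of-envelope FP16 estimate: model weights alone take about 2 bytes per parameter, and the KV cache, activations, and CUDA overhead need additional headroom on top. A minimal sketch of that arithmetic (the function name is illustrative, not a vLLM API):

```python
def fp16_weight_gb(num_params_billion: float) -> float:
    """Rough FP16 memory estimate for model weights only: 2 bytes/parameter.

    Real deployments need extra headroom for the KV cache, activations,
    and CUDA context, which is why the guidance above adds a margin.
    """
    bytes_total = num_params_billion * 1e9 * 2  # 2 bytes per FP16 parameter
    return bytes_total / 1e9  # decimal GB

print(fp16_weight_gb(7))   # 14.0 -> hence the 16GB+ recommendation for 7B
print(fp16_weight_gb(70))  # 140.0 -> split across 2x80GB GPUs via tensor parallelism
```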
How do I serve a model with vLLM?
Start a vLLM server with OpenAI-compatible API:
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.2-3B-Instruct --port 8000
The server provides a drop-in replacement for OpenAI API.
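Since the server speaks the OpenAI chat completions schema, a request body can be built with plain Python and POSTed to `http://localhost:8000/v1/chat/completions`. A minimal sketch (the helper name is illustrative; the payload fields are the standard OpenAI ones):

```python
import json

def build_chat_request(model: str, user_message: str, max_tokens: int = 128) -> str:
    """Build a JSON body for an OpenAI-compatible /v1/chat/completions request."""
    payload = {
        "model": model,  # must match the model passed to the vLLM server
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)

body = build_chat_request("meta-llama/Llama-3.2-3B-Instruct", "Hello!")
# POST `body` with Content-Type: application/json to the running server
```

Because the schema is unchanged, existing OpenAI client libraries also work by pointing their base URL at the vLLM server.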
What is the Blackwell configuration for?
The Blackwell configuration is optimized for NVIDIA's latest B200 and GB200 GPUs:
- Requires CUDA 12.8+
- Uses PyTorch nightly builds
- Supports FP4 precision for maximum throughput