Common Errors

Fix: PyTorch CUDA Version Mismatch

How to resolve "CUDA version mismatch" or "PyTorch was compiled with CUDA X.X but CUDA Y.Y is installed" errors.

Error Messages

RuntimeError: The NVIDIA driver on your system is too old

RuntimeError: CUDA error: no kernel image is available for execution on the device

PyTorch was compiled with CUDA 12.1 but CUDA 11.8 is installed

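Root Cause

This error occurs when the following three versions do not line up:

  • PyTorch CUDA version - the CUDA version the PyTorch binary was compiled against
  • Container CUDA version - the CUDA runtime shipped in the Docker base image
  • Host NVIDIA driver - the driver version installed on the host machine

Key point: PyTorch binaries are built against a specific CUDA version. A default pip install may give you a build that does not match your environment.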
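To see which side of the mismatch an environment is on, it can help to print the CUDA version PyTorch was built against next to what the runtime reports. The following is a minimal sketch, not part of DockerFit; the helper name `describe_torch_cuda` is hypothetical, and it assumes PyTorch may or may not be installed.

```python
from typing import Dict, Optional


def describe_torch_cuda() -> Dict[str, Optional[str]]:
    """Collect the CUDA-related versions PyTorch reports, if it is importable."""
    info: Dict[str, Optional[str]] = {
        "torch": None,
        "compiled_cuda": None,
        "cuda_available": None,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; leave every field as None.
        return info
    info["torch"] = torch.__version__
    # torch.version.cuda is None on CPU-only builds.
    info["compiled_cuda"] = torch.version.cuda
    info["cuda_available"] = str(torch.cuda.is_available())
    return info


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```

If `compiled_cuda` differs from the CUDA version of the base image, or `cuda_available` is `False` on a GPU host, one of the three versions listed above is the likely culprit.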

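Solution

Use DockerFit to generate a Dockerfile with a verified, compatible combination:

  1. Check the host driver version: nvidia-smi
  2. Pick a CUDA version that the driver supports (see the table below)
  3. Generate a Dockerfile that uses a compatible PyTorch + CUDA combination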
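Step 1 can also be scripted. The sketch below is an illustration under stated assumptions: it calls `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and extracts the major version number; the function names `query_driver_version` and `parse_driver_major` are hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def parse_driver_major(version: str) -> int:
    """Return the major component of a driver version such as '535.104.05'."""
    return int(version.strip().split(".")[0])


def query_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the driver version; return None if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    # nvidia-smi prints one line per GPU; they share one driver, so use the first.
    return result.stdout.strip().splitlines()[0].strip()


if __name__ == "__main__":
    version = query_driver_version()
    if version is None:
        print("nvidia-smi not found or no GPU visible")
    else:
        print(f"driver {version} (major {parse_driver_major(version)})")
```

The major number is what gets compared against the minimum-driver column of the compatibility table.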

CUDA and Driver Compatibility

CUDA version   Minimum driver (Linux)   Recommended GPUs
CUDA 12.4      >=550                    H100, A100, L4
CUDA 12.1      >=530                    A100, A10G, T4
CUDA 11.8      >=450                    T4, V100, RTX 30xx
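The table lookup can be expressed as a small function. This is a sketch that hard-codes the three rows above; `newest_supported_cuda` and `wheel_index_url` are hypothetical helper names, and the URL pattern is assumed to follow the `https://download.pytorch.org/whl/cu121` index used in the generated Dockerfile.

```python
from typing import List, Optional, Tuple

# (CUDA version, minimum Linux driver major) rows copied from the table above,
# ordered newest first so the first match is the newest usable version.
CUDA_MIN_DRIVER: List[Tuple[str, int]] = [
    ("12.4", 550),
    ("12.1", 530),
    ("11.8", 450),
]


def newest_supported_cuda(driver_major: int) -> Optional[str]:
    """Return the newest CUDA version in the table the driver can run, or None."""
    for cuda, min_driver in CUDA_MIN_DRIVER:
        if driver_major >= min_driver:
            return cuda
    return None


def wheel_index_url(cuda: str) -> str:
    """Build the PyTorch wheel index URL, e.g. '12.1' -> .../whl/cu121."""
    return "https://download.pytorch.org/whl/cu" + cuda.replace(".", "")


if __name__ == "__main__":
    for major in (550, 535, 470, 440):
        cuda = newest_supported_cuda(major)
        url = wheel_index_url(cuda) if cuda else "no match in table"
        print(major, cuda, url)
```

A driver older than every row returns `None`, which signals that the driver needs upgrading before any of the listed CUDA versions will work.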

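Generate the Fixed Dockerfile

Select a target configuration below to generate a verified Dockerfile:

Configuration options

Local GPU or CPU environment

Stable release, broadly compatible

Requires NVIDIA driver version >=530.30.02
Dockerfile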
# syntax=docker/dockerfile:1
# ^ Required for BuildKit cache mounts and advanced features

# Generated by DockerFit (https://tools.eastondev.com/docker)
# PYTORCH 2.4.1 + CUDA 12.1 | Python 3.11
# Multi-stage build for optimized image size

# ==============================================================================
# Stage 1: Builder - Install dependencies and compile
# ==============================================================================
FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04 AS builder

# Build arguments
ARG DEBIAN_FRONTEND=noninteractive

# Environment variables
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV TORCH_CUDA_ARCH_LIST="8.0;8.6;9.0"

# Install Python 3.11 from deadsnakes PPA (Ubuntu 22.04)
RUN apt-get update && apt-get install -y --no-install-recommends \
    software-properties-common \
    && add-apt-repository -y ppa:deadsnakes/ppa \
    && apt-get update && apt-get install -y --no-install-recommends \
    python3.11 \
    python3.11-venv \
    python3.11-dev \
    build-essential \
    git \
    && rm -rf /var/lib/apt/lists/*

# Create virtual environment
ENV VIRTUAL_ENV=/opt/venv
RUN python3.11 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# Upgrade pip
RUN pip install --no-cache-dir --upgrade pip setuptools wheel

# Install PyTorch with BuildKit cache
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/cu121

# Install project dependencies
COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# ==============================================================================
# Stage 2: Runtime - Minimal production image
# ==============================================================================
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04 AS runtime

# Labels
LABEL maintainer="Generated by DockerFit"
LABEL version="2.4.1"
LABEL description="PYTORCH 2.4.1 + CUDA 12.1"

# Environment variables
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility

# Install Python 3.11 runtime from deadsnakes PPA (Ubuntu 22.04)
RUN apt-get update && apt-get install -y --no-install-recommends \
    software-properties-common \
    && add-apt-repository -y ppa:deadsnakes/ppa \
    && apt-get update && apt-get install -y --no-install-recommends \
    python3.11 \
    libgomp1 \
    && apt-get remove -y software-properties-common \
    && apt-get autoremove -y \
    && rm -rf /var/lib/apt/lists/*

# Create non-root user for security
ARG USERNAME=appuser
ARG USER_UID=1000
ARG USER_GID=$USER_UID
RUN groupadd --gid $USER_GID $USERNAME \
    && useradd --uid $USER_UID --gid $USER_GID -m $USERNAME

# Copy virtual environment from builder
COPY --from=builder --chown=$USERNAME:$USERNAME /opt/venv /opt/venv
ENV VIRTUAL_ENV=/opt/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# Set working directory
WORKDIR /app

# Copy application code
COPY --chown=$USERNAME:$USERNAME . .

# Switch to non-root user
USER $USERNAME

# Expose port
EXPOSE 8000

# Default command
CMD ["python", "main.py"]
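
Once the image is built and started with GPU access, a short smoke test can confirm that the installed PyTorch build has kernels the GPU can run, which is what the "no kernel image is available" error is about. The sketch below is an assumption-laden illustration, not DockerFit output: `is_arch_compatible` is a hypothetical helper that applies a simplified form of CUDA's compatibility rules (a binary built for `sm_XY` runs on devices with the same major version and an equal or higher minor version; `compute_XY` PTX can be JIT-compiled for any equal or newer device), and it ignores suffixed entries such as `sm_90a`.

```python
import re
from typing import Iterable, Tuple

_ARCH_RE = re.compile(r"^(sm|compute)_(\d+)$")


def is_arch_compatible(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Simplified check: can a device with this capability run any listed arch?"""
    dev_major, dev_minor = capability
    for arch in arch_list:
        match = _ARCH_RE.match(arch)
        if match is None:
            continue  # skip suffixed or unrecognized entries
        kind, number = match.group(1), int(match.group(2))
        major, minor = divmod(number, 10)
        if kind == "sm" and major == dev_major and minor <= dev_minor:
            return True
        if kind == "compute" and (major, minor) <= (dev_major, dev_minor):
            return True
    return False


def main() -> None:
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    print("compiled CUDA:", torch.version.cuda)
    if not torch.cuda.is_available():
        print("CUDA is not available; check the host driver and GPU access")
        return
    capability = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    print("device capability:", capability)
    print("built architectures:", archs)
    print("compatible:", is_arch_compatible(capability, archs))
    # Run one small operation on the GPU to exercise an actual kernel launch.
    x = torch.ones(2, 2, device="cuda")
    print("sum on GPU:", (x @ x).sum().item())


if __name__ == "__main__":
    main()
```

Saving this as a script and running it inside the container exercises both the driver check and a real kernel launch before the application itself starts.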