The node's /etc/containerd/config.toml has already been configured as follows:
[plugins."io.containerd.grpc.v1.cri"]
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "nvidia"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
BinaryName = "/usr/bin/nvidia-container-runtime"
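containerd has been restarted since the change.
Why does the daemonset pod on this node still fail?
The nvidia-device-plugin-daemonset container prints the following log: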
2022-11-18T15:06:15.356745259+08:00 2022/11/18 07:06:15 Starting FS watcher.
2022-11-18T15:06:15.356833856+08:00 2022/11/18 07:06:15 Starting OS watcher.
2022-11-18T15:06:15.356969052+08:00 2022/11/18 07:06:15 Starting Plugins.
2022-11-18T15:06:15.356988551+08:00 2022/11/18 07:06:15 Loading configuration.
2022-11-18T15:06:15.356995051+08:00 2022/11/18 07:06:15 Initializing NVML.
2022-11-18T15:06:15.357161445+08:00 2022/11/18 07:06:15 Failed to initialize NVML: could not load NVML library.
2022-11-18T15:06:15.357168645+08:00 2022/11/18 07:06:15 If this is a GPU node, did you set the docker default runtime to `nvidia`?
2022-11-18T15:06:15.357173045+08:00 2022/11/18 07:06:15 You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
2022-11-18T15:06:15.357177445+08:00 2022/11/18 07:06:15 You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
2022-11-18T15:06:15.357180245+08:00 2022/11/18 07:06:15 If this is not a GPU node, you should set up a toleration or nodeSelector to only deploy this plugin on GPU nodes