[index-tts]Failed to load custom CUDA kernel for BigVGAN

2025-10-28 800 views
4

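I updated to the 1.5 model today and this message is back. On the first run the kernel is compiled automatically and loads successfully, but on the second launch of the webui it fails to load. What is causing this, and how can it be fixed for good? I see many people reporting it in Issues.

Answers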

0

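At first I got the following error:

GPT weights restored from: checkpoints/gpt.pth
DeepSpeed加载失败,回退到标准推理: No module named 'deepspeed'
Failed to load custom CUDA kernel for BigVGAN. Falling back to torch.

(The second line reads "DeepSpeed failed to load, falling back to standard inference".) After installing deepspeed the problem went away: pip install deepspeed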
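Before reinstalling anything, it can help to confirm whether deepspeed is actually importable from the interpreter that runs the webui. The following is a minimal sketch using only the standard library; the helper name `module_available` is made up for illustration and is not part of index-tts.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Check the packages mentioned in the log message above.
    for name in ("deepspeed", "torch"):
        status = "found" if module_available(name) else "missing"
        print(f"{name}: {status}")
```

Run it with the same launcher you use for the webui (for example through uv, if that is how you start it), so the check looks at the same environment.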

2

Regarding the pip install deepspeed suggestion above: I'm on Windows with CUDA 12.8 (cuda128), and the compilation did not succeed.

6

On Linux you need to install ninja-build.

3

I ran uv pip install deepspeed with uv and that did not help either. It hangs at:

uv run webui.py
>> GPT weights restored from: checkpoints/gpt.pth
[2025-05-22 11:08:28,577] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-22 11:08:33,501] [INFO] [logging.py:107:log_dist] [Rank -1] DeepSpeed info: version=0.16.8, git-hash=unknown, git-branch=unknown
[2025-05-22 11:08:33,501] [WARNING] [config_utils.py:70:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2025-05-22 11:08:33,501] [INFO] [logging.py:107:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1

I have to press Ctrl+C manually before it falls back to torch:

> git pull
Already up to date.
> uv run webui.py
>> GPT weights restored from: checkpoints/gpt.pth
[2025-05-22 11:08:28,577] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-22 11:08:33,501] [INFO] [logging.py:107:log_dist] [Rank -1] DeepSpeed info: version=0.16.8, git-hash=unknown, git-branch=unknown
[2025-05-22 11:08:33,501] [WARNING] [config_utils.py:70:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2025-05-22 11:08:33,501] [INFO] [logging.py:107:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
^C>> Failed to load custom CUDA kernel for BigVGAN. Falling back to torch.
Removing weight norm...
>> bigvgan weights restored from: checkpoints/bigvgan_generator.pth
2025-05-22 11:09:38,518 WETEXT INFO found existing fst: /mnt/data/workspace/ai/tts/index-tts/indextts/utils/tagger_cache/zh_tn_tagger.fst
2025-05-22 11:09:38,518 WETEXT INFO                     /mnt/data/workspace/ai/tts/index-tts/indextts/utils/tagger_cache/zh_tn_verbalizer.fst
2025-05-22 11:09:38,518 WETEXT INFO skip building fst for zh_normalizer ...
2025-05-22 11:09:38,756 WETEXT INFO found existing fst: /mnt/data/workspace/ai/tts/index-tts/.venv/lib/python3.10/site-packages/tn/en_tn_tagger.fst
2025-05-22 11:09:38,757 WETEXT INFO                     /mnt/data/workspace/ai/tts/index-tts/.venv/lib/python3.10/site-packages/tn/en_tn_verbalizer.fst
2025-05-22 11:09:38,757 WETEXT INFO skip building fst for en_normalizer ...
>> TextNormalizer loaded
>> bpe model loaded from: checkpoints/bpe.model
* Running on local URL:  http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.
5

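[!IMPORTANT] This kernel is only used to speed up BigVGAN inference. If it fails to load, nothing else is affected and the message can be ignored.

If you still want to try, set up the environment with the steps below: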

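  1. Confirm that your PyTorch build is compatible with your CUDA driver version: https://pytorch.org/get-started/locally/
  2. Confirm that the CUDA toolchain is installed correctly (run nvcc -V in the environment). If it is missing, see: https://developer.nvidia.com/cuda-toolkit-archive

> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:30:10_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

  3. Set up a host compiler that is compatible with your CUDA toolkit version, and make sure it can be found from the environment index-tts runs in.
  • Linux users should confirm that the gcc in the current environment is compatible with nvcc. A compatible gcc version can be installed with conda.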
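The toolchain checks in the steps above can be scripted. The sketch below uses only the standard library: it looks for nvcc and gcc on PATH and pulls the "release X.Y" number out of the nvcc -V output so you can compare it with the CUDA version your PyTorch build targets. The function names `parse_nvcc_release` and `tool_output` are hypothetical helpers, not part of index-tts, and the script does not judge whether a given gcc and nvcc pair is compatible.

```python
import re
import shutil
import subprocess
from typing import List, Optional


def parse_nvcc_release(text: str) -> Optional[str]:
    """Extract the 'release X.Y' version from `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def tool_output(cmd: List[str]) -> Optional[str]:
    """Run a command and return its stdout, or None if it is not on PATH or fails to start."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout


if __name__ == "__main__":
    nvcc_out = tool_output(["nvcc", "-V"])
    if nvcc_out:
        print("nvcc release:", parse_nvcc_release(nvcc_out))
    else:
        print("nvcc not found on PATH")

    gcc_out = tool_output(["gcc", "--version"])
    if gcc_out:
        # The first line of `gcc --version` names the compiler and its version.
        print("gcc:", gcc_out.splitlines()[0])
    else:
        print("gcc not found on PATH")
```

Run it inside the same environment that launches index-tts, since a compiler visible in one shell is not necessarily visible in another.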

To test whether the environment works:

> git clone git@github.com:NVIDIA/cuda-samples.git --depth=1
> cd cuda-samples
> nvcc -I.\Common Samples\1_Utilities\deviceQuery\deviceQuery.cpp -O3 -o deviceQuery.exe
> deviceQuery.exe
deviceQuery.exe Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce GTX 970"
...
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.6, CUDA Runtime Version = 12.4, NumDevs = 1
Result = PASS

> nvcc -I.\Common Samples\1_Utilities\bandwidthTest\bandwidthTest.cu -O3 -o bandwidthTest.exe
> bandwidthTest.exe

[CUDA Bandwidth Test] - Starting...
Running on...

Device 0: NVIDIA GeForce GTX 970
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes)        Bandwidth(GB/s)
32000000                     12.7

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes)        Bandwidth(GB/s)
32000000                     12.7

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes)        Bandwidth(GB/s)
32000000                     142.2

Result = PASS
8

Failed to load custom CUDA kernel for BigVGAN. Falling back to torch. It looks like the GPU then cannot be used at all, even though nvidia-smi looks normal, and only the CPU is used.
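Whether the GPU is usable can be checked separately from the BigVGAN kernel message, since an earlier answer describes that kernel as an optional speed-up. The sketch below asks PyTorch directly whether it sees a CUDA device. The function name `describe_torch_device` is made up for illustration, and the import is guarded so the script also runs where torch is not installed.

```python
def describe_torch_device() -> str:
    """Summarize whether PyTorch in this environment can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"

    # torch.version.cuda is None for CPU-only builds of PyTorch.
    built_for = torch.version.cuda
    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: CUDA not available (built for CUDA {built_for})"
    return (
        f"torch {torch.__version__}: CUDA {built_for}, "
        f"device 0 = {torch.cuda.get_device_name(0)}"
    )


if __name__ == "__main__":
    print(describe_torch_device())
```

If this reports that CUDA is not available, the likely cause is a CPU-only or mismatched PyTorch build rather than the kernel fallback itself.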

1

On Linux you need to run:

sudo apt update
sudo apt install ninja-build

or, inside a virtual environment: pip install ninja --user
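After installing, a quick way to confirm that ninja is reachable from the environment that runs the webui is sketched below. It uses only the standard library, and `ninja_version` is a hypothetical helper name, not something index-tts provides.

```python
import shutil
import subprocess
from typing import Optional


def ninja_version() -> Optional[str]:
    """Return the output of `ninja --version`, or None if ninja is not usable."""
    path = shutil.which("ninja")
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = ninja_version()
    print(f"ninja {version}" if version else "ninja not found on PATH")
```

A "not found" result after installation usually means the package landed in a different environment from the one being checked.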