[index-tts] Using the BigVGAN fused CUDA kernel

2025-11-11
Benchmark comparison

Test device: Windows 10, NVIDIA GeForce GTX 970 (4096 MiB), CUDA 12.4, torch 2.5.1

| Metric | w/o custom CUDA kernel, run 1 | run 2 | run 3 | w/ custom CUDA kernel, run 1 | run 2 | run 3 |
|---|---|---|---|---|---|---|
| Total inference time (s) | 33.56 | 25.57 | 26.33 | 22.02 | 22.81 | 22.10 |
| Generated audio length (s) | 18.90 | 19.16 | 19.75 | 18.73 | 19.93 | 19.54 |
| RTF (real-time factor) | 1.7756 | 1.3349 | 1.3328 | 1.1755 | 1.1446 | 1.1311 |

Using the custom CUDA kernel for BigVGAN significantly improves inference efficiency:

Total inference time drops (6.18 s saved on average over the three runs). RTF decreases (closer to real-time generation).
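RTF (real-time factor) here is simply total inference time divided by generated audio length; values below 1.0 mean generation is faster than real time. A minimal sketch of the computation, using the run-1 numbers from the table (small last-digit differences from the table can come from rounding in the logged values):

```python
def rtf(total_inference_s: float, audio_length_s: float) -> float:
    """Real-time factor: wall-clock seconds spent per second of audio produced."""
    return total_inference_s / audio_length_s

# Run 1 from the table above
without_kernel = rtf(33.56, 18.90)  # w/o custom CUDA kernel
with_kernel = rtf(22.02, 18.73)     # w/ custom CUDA kernel
print(f"w/o kernel: {without_kernel:.4f}, w/ kernel: {with_kernel:.4f}")
```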

Reference audio: ZH/prompt/2631296891109983590.wav
Input text:

顿时,气氛变得沉郁起来。乍看之下,一切的困扰仿佛都围绕在我身边。我皱着眉头,感受着那份压力,但我知道我不能放弃,不能认输。于是,我深吸一口气,心底的声音告诉我:“无论如何,都要冷静下来,重新开始。”

Output comparison:
Before: spk_1744681011.webm
After: spk_1744682182.webm

Note

Not tested under other CUDA environments.

Answers


cuda 12.4, torch 2.5.1


BigVGAN/alias_free_activation/cuda/anti_alias_activation.cpp', needed by 'anti_alias_activation.o', missing and no known rule to make it

Failed to load custom CUDA kernel for BigVGAN. Falling back to torch.
Removing weight norm...
bigvgan weights restored from: checkpoints\bigvgan_generator.pth

I just pulled the latest code, and it reports that use_cuda_kernel cannot be enabled. How do I fix this and use it correctly? Environment: Windows 11, CUDA 12.4, torch 2.5.1, Python 3.10.


Was the indextts/BigVGAN/alias_free_activation/cuda/build directory generated correctly?

cd indextts/BigVGAN/alias_free_activation/cuda/build
ninja

Compilation completed successfully after the following steps:

(1) Install Ninja

pip install ninja

(2) The path where indextts lives must absolutely not contain non-ASCII (e.g. Chinese) characters, otherwise the ninja build will fail.

cd indextts/BigVGAN/alias_free_activation/cuda/build
ninja
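One quick way to pre-check the path requirement before building (is_ascii_path is a hypothetical helper, not part of index-tts):

```python
def is_ascii_path(path: str) -> bool:
    """True if every character in the path is ASCII.

    ninja builds of the extension can fail when the project path
    contains non-ASCII (e.g. Chinese) characters.
    """
    return all(ord(ch) < 128 for ch in path)

# Example checks: a plain path vs. one containing Chinese characters
print(is_ascii_path(r"J:\index-tts"))
print(is_ascii_path(r"J:\索引\index-tts"))
```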

When the app finally starts, the console output looks like this:

Emitting ninja build file j:\index-tts\indextts\BigVGAN\alias_free_activation\cuda\build\build.ninja...
Building extension module anti_alias_activation_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module anti_alias_activation_cuda...
>> Preload custom CUDA kernel for BigVGAN <module 'anti_alias_activation_cuda' from 'j:\\index-tts\\indextts\\BigVGAN\\alias_free_activation\\cuda\\build\\anti_alias_activation_cuda.pyd'>
No modifications detected for re-loaded extension module anti_alias_activation_cuda, skipping build step...
Loading extension module anti_alias_activation_cuda...
Removing weight norm...
>> bigvgan weights restored from: checkpoints\bigvgan_generator.pth
>> bpe model loaded from: checkpoints\bpe.model
>> TextNormalizer loaded
* Running on local URL:  http://127.0.0.1:7860

@yrom The above is the output. I can't tell whether use_cuda_kernel was enabled successfully or not?


If ">> Preload custom CUDA kernel for BigVGAN" is printed, it loaded successfully; try checking whether you get a performance improvement. @juntaosun
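For scripted runs, that success indicator can be checked programmatically by scanning the startup log for the preload line (kernel_preloaded is a trivial hypothetical helper, not part of index-tts):

```python
def kernel_preloaded(console_output: str) -> bool:
    """True if the BigVGAN custom CUDA kernel preload message appears in the log."""
    return ">> Preload custom CUDA kernel for BigVGAN" in console_output

log = (
    "Loading extension module anti_alias_activation_cuda...\n"
    ">> Preload custom CUDA kernel for BigVGAN <module 'anti_alias_activation_cuda' ...>"
)
print(kernel_preloaded(log))
```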


[Disabled: use_cuda_kernel = False]

normalized text:每一次的努力都是为了更好的未来,不要害怕失败,要善于从失败中汲取经验.让我们一起勇敢前行,迈向更加美好的明天.
wav shape: torch.Size([1, 175104]) min: tensor(-16128., device='cuda:0', dtype=torch.float16) max: tensor(20016., device='cuda:0', dtype=torch.float16)
wav shape: torch.Size([1, 153600]) min: tensor(-13384., device='cuda:0', dtype=torch.float16) max: tensor(19792., device='cuda:0', dtype=torch.float16)
>> Reference audio length: 7.90 seconds
>> gpt_gen_time: 11.41 seconds
>> gpt_forward_time: 0.10 seconds
>> bigvgan_time: 0.33 seconds
>> Total inference time: 11.91 seconds
>> Generated audio length: 13.70 seconds
>> RTF: 0.8699
>> start inference...
normalized text:每一次的努力都是为了更好的未来,不要害怕失败,要善于从失败中汲取经验.让我们一起勇敢前行,迈向更加美好的明天.
wav shape: torch.Size([1, 212992]) min: tensor(-17504., device='cuda:0', dtype=torch.float16) max: tensor(20944., device='cuda:0', dtype=torch.float16)
wav shape: torch.Size([1, 140288]) min: tensor(-15912., device='cuda:0', dtype=torch.float16) max: tensor(18768., device='cuda:0', dtype=torch.float16)
>> Reference audio length: 7.90 seconds
>> gpt_gen_time: 12.55 seconds
>> gpt_forward_time: 0.10 seconds
>> bigvgan_time: 0.35 seconds
>> Total inference time: 13.13 seconds
>> Generated audio length: 14.72 seconds
>> RTF: 0.8919

[Enabled: use_cuda_kernel = True]

normalized text:每一次的努力都是为了更好的未来,不要害怕失败,要善于从失败中汲取经验.让我们一起勇敢前行,迈向更加美好的明天.
wav shape: torch.Size([1, 183296]) min: tensor(-16864., device='cuda:0', dtype=torch.float16) max: tensor(18288., device='cuda:0', dtype=torch.float16)
wav shape: torch.Size([1, 153600]) min: tensor(-16312., device='cuda:0', dtype=torch.float16) max: tensor(21584., device='cuda:0', dtype=torch.float16)
>> Reference audio length: 7.90 seconds
>> gpt_gen_time: 6.16 seconds
>> gpt_forward_time: 0.05 seconds
>> bigvgan_time: 0.16 seconds
>> Total inference time: 6.38 seconds
>> Generated audio length: 14.04 seconds
>> RTF: 0.4547
>> start inference...
normalized text:每一次的努力都是为了更好的未来,不要害怕失败,要善于从失败中汲取经验.让我们一起勇敢前行,迈向更加美好的明天.
wav shape: torch.Size([1, 188416]) min: tensor(-17632., device='cuda:0', dtype=torch.float16) max: tensor(20416., device='cuda:0', dtype=torch.float16)
wav shape: torch.Size([1, 166912]) min: tensor(-17440., device='cuda:0', dtype=torch.float16) max: tensor(23456., device='cuda:0', dtype=torch.float16)
>> Reference audio length: 7.90 seconds
>> gpt_gen_time: 6.44 seconds
>> gpt_forward_time: 0.05 seconds
>> bigvgan_time: 0.16 seconds
>> Total inference time: 6.66 seconds
>> Generated audio length: 14.81 seconds
>> RTF: 0.4496

@yrom The performance gain is confirmed~


Running ninja fails with: ninja: error: loading 'build.ninja': The system cannot find the specified file.


https://github.com/yrom/evaluate-index-tts — you can try this project to benchmark on your machine. Note that indextts must use the forked branch configured in that project.

python evaluate.py eval --model_dir /path/to/index-tts/checkpoints --cfg_path checkpoints/config.yaml \
    --test_set testset.json --output_dir outputs \
    --lang en --text-type long --device cpu

python evaluate.py eval --model_dir /path/to/index-tts/checkpoints --cfg_path checkpoints/config.yaml \
    --test_set testset.json --output_dir outputs \
    --lang en --text-type long --device cuda

python evaluate.py eval --model_dir /path/to/index-tts/checkpoints --cfg_path checkpoints/config.yaml \
    --test_set testset.json --output_dir outputs \
    --lang en --text-type long --device cuda --enable_cuda_kernel

cd indextts/BigVGAN/alias_free_activation/cuda/build

The build folder is completely empty. How do I generate its contents?

Confirm the CUDA toolkit is installed correctly (run nvcc -V in your environment). You also need a compiler compatible with your CUDA version, such as Visual Studio 2022 (confirm cl.exe can be found in your environment) or gcc/clang.
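Those prerequisites can be sanity-checked before attempting the build by looking each tool up on PATH. A sketch using only the standard library (missing_tools is a hypothetical helper, not part of index-tts; which is injectable so the logic is testable without the tools installed):

```python
import shutil

def missing_tools(tools, which=shutil.which):
    """Return the subset of required executables not found on PATH.

    `which` defaults to shutil.which and can be replaced for testing.
    """
    return [tool for tool in tools if which(tool) is None]

# nvcc comes from the CUDA toolkit; cl is the MSVC compiler from the
# Visual Studio developer environment (on Linux, check gcc/clang instead).
not_found = missing_tools(["nvcc", "cl", "ninja"])
if not_found:
    print("Missing from PATH:", ", ".join(not_found))
else:
    print("All build tools found.")
```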

Environment: torch 2.0.1+cu118

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Visual Studio 2022 is also installed. Running ninja gives: ninja: error: loading 'build.ninja': The system cannot find the specified file. Running cmake complains: "alias_free_activation/cuda" does not appear to contain CMakeLists.txt.


@einsqing Installing Visual Studio 2022 alone is not enough; you also have to configure its developer environment variables. And you need to install a CUDA version that matches your torch build. Any mismatch causes the build to fail; it's quite strict~


Got it running. Does every machine that uses use_cuda_kernel need VS 2022 installed?


Yes; it is tightly coupled to the CUDA and torch versions in the runtime environment.