Environment: Windows, 1660S GPU. Both runs use the same reference audio and the same text.
1.5 commit: 10d557a15e0bc234389a2900b7147c4c8a94fe3b
1.0 commit: 141599f04d576f0194ecbac90c4426b2ea32ac18
The reference text is: 洗衣粉和别的东西不一样,咱们谁家都得用的对不对?十斤装,这么一大袋子的天然皂粉,平时去线下商场超市,得三四十块钱的,今天厂家补贴,十六块九一大袋整整十斤装,不但给大家包邮送到家,还送七天无理由和运费险,咱们一定要抓住机会,抢个实惠。
1.5 uses cuda128 and 1.0 uses cuda126. All other dependencies were installed following the README.

1.5 output:
start inference...
Reference audio length: 12.70 seconds
gpt_gen_time: 53.47 seconds
gpt_forward_time: 2.41 seconds
bigvgan_time: 2.63 seconds
Total inference time: 60.10 seconds
Generated audio length: 26.03 seconds
RTF: 2.3090

1.0 output:
start inference...
wav shape: torch.Size([1, 79872]) min: tensor(-28240., device='cuda:0', dtype=torch.float16) max: tensor(20944., device='cuda:0', dtype=torch.float16)
wav shape: torch.Size([1, 253952]) min: tensor(-28880., device='cuda:0', dtype=torch.float16) max: tensor(22656., device='cuda:0', dtype=torch.float16)
wav shape: torch.Size([1, 144384]) min: tensor(-29280., device='cuda:0', dtype=torch.float16) max: tensor(20416., device='cuda:0', dtype=torch.float16)
Reference audio length: 12.70 seconds
gpt_gen_time: 19.89 seconds
gpt_forward_time: 1.38 seconds
bigvgan_time: 2.02 seconds
Total inference time: 24.14 seconds
Generated audio length: 19.93 seconds
RTF: 1.2117
wav file saved to: outputs\spk_1749135721.wav
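
The problems I'm seeing with 1.5 are:
1. Processing time has roughly doubled (RTF 2.3090 vs 1.2117 in the logs above; total inference time 60.10 s vs 24.14 s).
2. Pauses and sentence splitting sound less natural than in 1.0.
3. Blank (silent) audio seems to occur more often than in 1.0.

Is there something wrong with my settings or my environment?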
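
Because the two runs generated different amounts of audio (26.03 s vs 19.93 s), raw wall time and RTF give different pictures of the slowdown. Below is a small sketch that parses the timing lines from the two logs and normalizes them. The helper names (`parse_timings`, `compare`) are made up for this post and are not part of the project; the numbers are copied from the logs above.

```python
import re

# Timing lines copied from the two runs reported in this issue.
LOG_V15 = """Reference audio length: 12.70 seconds
gpt_gen_time: 53.47 seconds
gpt_forward_time: 2.41 seconds
bigvgan_time: 2.63 seconds
Total inference time: 60.10 seconds
Generated audio length: 26.03 seconds
RTF: 2.3090"""

LOG_V10 = """Reference audio length: 12.70 seconds
gpt_gen_time: 19.89 seconds
gpt_forward_time: 1.38 seconds
bigvgan_time: 2.02 seconds
Total inference time: 24.14 seconds
Generated audio length: 19.93 seconds
RTF: 1.2117"""

# Matches "name: 12.34 seconds" or "name: 1.2345", one entry per line.
LINE = re.compile(r"^\s*([A-Za-z_ ]+?):\s*([0-9.]+)(?: seconds)?\s*$", re.M)


def parse_timings(log: str) -> dict:
    """Hypothetical helper: turn 'key: value' log lines into a dict of floats."""
    return {key.strip(): float(value) for key, value in LINE.findall(log)}


def compare(new: dict, old: dict) -> dict:
    """Hypothetical helper: slowdown ratios of `new` relative to `old`."""
    gen_new = new["gpt_gen_time"] / new["Generated audio length"]
    gen_old = old["gpt_gen_time"] / old["Generated audio length"]
    return {
        # Raw wall-clock ratio, not normalized by output length.
        "wall_time_ratio": new["Total inference time"] / old["Total inference time"],
        # RTF already divides by generated audio length.
        "rtf_ratio": new["RTF"] / old["RTF"],
        # GPT generation seconds spent per second of generated audio.
        "gpt_gen_per_audio_sec_new": gen_new,
        "gpt_gen_per_audio_sec_old": gen_old,
        # Share of total time spent in GPT generation.
        "gpt_share_new": new["gpt_gen_time"] / new["Total inference time"],
        "gpt_share_old": old["gpt_gen_time"] / old["Total inference time"],
    }


v15 = parse_timings(LOG_V15)
v10 = parse_timings(LOG_V10)
summary = compare(v15, v10)

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With these numbers, wall time is about 2.5x longer, but per second of generated audio the slowdown is about 1.9x, and in both versions most of the time goes to `gpt_gen_time`. That suggests the regression is mainly in the GPT generation step rather than in `bigvgan_time`, though a single run per version is not enough to be sure.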

Thanks in advance for your reply, and for taking the time to verify this.