[Disabled: use_cuda_kernel = False]
normalized text:每一次的努力都是为了更好的未来,不要害怕失败,要善于从失败中汲取经验.让我们一起勇敢前行,迈向更加美好的明天.
wav shape: torch.Size([1, 175104]) min: tensor(-16128., device='cuda:0', dtype=torch.float16) max: tensor(20016., device='cuda:0', dtype=torch.float16)
wav shape: torch.Size([1, 153600]) min: tensor(-13384., device='cuda:0', dtype=torch.float16) max: tensor(19792., device='cuda:0', dtype=torch.float16)
>> Reference audio length: 7.90 seconds
>> gpt_gen_time: 11.41 seconds
>> gpt_forward_time: 0.10 seconds
>> bigvgan_time: 0.33 seconds
>> Total inference time: 11.91 seconds
>> Generated audio length: 13.70 seconds
>> RTF: 0.8699
>> start inference...
normalized text:每一次的努力都是为了更好的未来,不要害怕失败,要善于从失败中汲取经验.让我们一起勇敢前行,迈向更加美好的明天.
wav shape: torch.Size([1, 212992]) min: tensor(-17504., device='cuda:0', dtype=torch.float16) max: tensor(20944., device='cuda:0', dtype=torch.float16)
wav shape: torch.Size([1, 140288]) min: tensor(-15912., device='cuda:0', dtype=torch.float16) max: tensor(18768., device='cuda:0', dtype=torch.float16)
>> Reference audio length: 7.90 seconds
>> gpt_gen_time: 12.55 seconds
>> gpt_forward_time: 0.10 seconds
>> bigvgan_time: 0.35 seconds
>> Total inference time: 13.13 seconds
>> Generated audio length: 14.72 seconds
>> RTF: 0.8919
[Enabled: use_cuda_kernel = True]
normalized text:每一次的努力都是为了更好的未来,不要害怕失败,要善于从失败中汲取经验.让我们一起勇敢前行,迈向更加美好的明天.
wav shape: torch.Size([1, 183296]) min: tensor(-16864., device='cuda:0', dtype=torch.float16) max: tensor(18288., device='cuda:0', dtype=torch.float16)
wav shape: torch.Size([1, 153600]) min: tensor(-16312., device='cuda:0', dtype=torch.float16) max: tensor(21584., device='cuda:0', dtype=torch.float16)
>> Reference audio length: 7.90 seconds
>> gpt_gen_time: 6.16 seconds
>> gpt_forward_time: 0.05 seconds
>> bigvgan_time: 0.16 seconds
>> Total inference time: 6.38 seconds
>> Generated audio length: 14.04 seconds
>> RTF: 0.4547
>> start inference...
normalized text:每一次的努力都是为了更好的未来,不要害怕失败,要善于从失败中汲取经验.让我们一起勇敢前行,迈向更加美好的明天.
wav shape: torch.Size([1, 188416]) min: tensor(-17632., device='cuda:0', dtype=torch.float16) max: tensor(20416., device='cuda:0', dtype=torch.float16)
wav shape: torch.Size([1, 166912]) min: tensor(-17440., device='cuda:0', dtype=torch.float16) max: tensor(23456., device='cuda:0', dtype=torch.float16)
>> Reference audio length: 7.90 seconds
>> gpt_gen_time: 6.44 seconds
>> gpt_forward_time: 0.05 seconds
>> bigvgan_time: 0.16 seconds
>> Total inference time: 6.66 seconds
>> Generated audio length: 14.81 seconds
>> RTF: 0.4496
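@yrom Confirmed, the performance improvement works: with use_cuda_kernel = True the RTF drops from roughly 0.88 to roughly 0.45 on the same input.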
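
For anyone who wants to compare the two settings numerically, here is a minimal sketch that averages the two runs per setting. It assumes RTF is total inference time divided by generated audio length, which matches the logged RTF values to within rounding but is not confirmed from the code. The dictionary keys and the `mean_rtf` helper are just illustrative names; the numbers are copied from the logs above.

```python
# (total inference time in seconds, generated audio length in seconds),
# copied from the logs above.
runs = {
    "use_cuda_kernel=False": [(11.91, 13.70), (13.13, 14.72)],
    "use_cuda_kernel=True": [(6.38, 14.04), (6.66, 14.81)],
}


def mean_rtf(samples):
    """Average real-time factor, assuming RTF = inference time / audio length."""
    return sum(t / a for t, a in samples) / len(samples)


rtf_off = mean_rtf(runs["use_cuda_kernel=False"])
rtf_on = mean_rtf(runs["use_cuda_kernel=True"])
speedup = rtf_off / rtf_on  # how many times faster the CUDA-kernel runs are

print(f"mean RTF (kernel off): {rtf_off:.4f}")
print(f"mean RTF (kernel on):  {rtf_on:.4f}")
print(f"speedup:               {speedup:.2f}x")
```

With only two runs per setting this is a rough indication (a bit under 2x), not a rigorous benchmark; generated audio length also varies between runs, which is why the comparison uses RTF rather than raw inference time.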