[index-tts] RuntimeError: CUDA error: device-side assert triggered

2025-10-29 129 views

Running indextts/infer.py hits this error, while synthesizing other texts works fine. Is the text too long? Or is an illegal token being produced?

True
>> GPT weights restored from: checkpoints/index-tts/gpt.pth
[2025-05-14 08:42:37,259] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-14 08:42:39,065] [INFO] [logging.py:107:log_dist] [Rank -1] DeepSpeed info: version=0.16.7, git-hash=unknown, git-branch=unknown
[2025-05-14 08:42:39,066] [WARNING] [config_utils.py:70:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2025-05-14 08:42:39,068] [INFO] [logging.py:107:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
Removing weight norm...
>> bigvgan weights restored from: checkpoints/index-tts/bigvgan_generator.pth
2025-05-14 08:42:43,636 WETEXT INFO found existing fst: /work/conda3/envs/index-tts/lib/python3.10/site-packages/tn/zh_tn_tagger.fst
2025-05-14 08:42:43,636 WETEXT INFO                     /work/conda3/envs/index-tts/lib/python3.10/site-packages/tn/zh_tn_verbalizer.fst
2025-05-14 08:42:43,637 WETEXT INFO skip building fst for zh_normalizer ...
2025-05-14 08:42:44,104 WETEXT INFO found existing fst: /work/conda3/envs/index-tts/lib/python3.10/site-packages/tn/en_tn_tagger.fst
2025-05-14 08:42:44,104 WETEXT INFO                     /work/conda3/envs/index-tts/lib/python3.10/site-packages/tn/en_tn_verbalizer.fst
2025-05-14 08:42:44,104 WETEXT INFO skip building fst for en_normalizer ...
>> TextNormalizer loaded
2025-05-14 08:42:45,144 WETEXT INFO found existing fst: /work/conda3/envs/index-tts/lib/python3.10/site-packages/tn/zh_tn_tagger.fst
2025-05-14 08:42:45,144 WETEXT INFO found existing fst: /work/conda3/envs/index-tts/lib/python3.10/site-packages/tn/zh_tn_tagger.fst
2025-05-14 08:42:45,144 WETEXT INFO                     /work/conda3/envs/index-tts/lib/python3.10/site-packages/tn/zh_tn_verbalizer.fst
2025-05-14 08:42:45,144 WETEXT INFO                     /work/conda3/envs/index-tts/lib/python3.10/site-packages/tn/zh_tn_verbalizer.fst
2025-05-14 08:42:45,144 WETEXT INFO skip building fst for zh_normalizer ...
2025-05-14 08:42:45,144 WETEXT INFO skip building fst for zh_normalizer ...
2025-05-14 08:42:45,820 WETEXT INFO found existing fst: /work/conda3/envs/index-tts/lib/python3.10/site-packages/tn/en_tn_tagger.fst
2025-05-14 08:42:45,820 WETEXT INFO found existing fst: /work/conda3/envs/index-tts/lib/python3.10/site-packages/tn/en_tn_tagger.fst
2025-05-14 08:42:45,820 WETEXT INFO                     /work/conda3/envs/index-tts/lib/python3.10/site-packages/tn/en_tn_verbalizer.fst
2025-05-14 08:42:45,820 WETEXT INFO                     /work/conda3/envs/index-tts/lib/python3.10/site-packages/tn/en_tn_verbalizer.fst
2025-05-14 08:42:45,820 WETEXT INFO skip building fst for en_normalizer ...
2025-05-14 08:42:45,820 WETEXT INFO skip building fst for en_normalizer ...
>> bpe model loaded from: checkpoints/index-tts/bpe.model
>> start inference...
origin text:分类一组电影,并根据给定的题材、演员和导演信息将其分为三个不同的类别。 电影1:“黑暗骑士”(演员:克里斯蒂安·贝尔、希斯·莱杰;导演:克里斯托弗·诺兰);电影2:“盗梦空间”(演员:莱昂纳多·迪卡普里奥;导演:克里斯托弗·诺兰);电影3:“钢琴家”(演员:艾德里安·布洛迪;导演:罗曼·波兰斯基);电影4:“泰坦尼克号”(演员:莱昂纳多·迪卡普里奥;导演:詹姆斯·卡梅隆);电影5:“阿凡达”(演员:萨姆·沃辛顿;导演:詹姆斯·卡梅隆);电影6:“南方公园:大电影”(演员:马特·斯通、托马斯·艾恩格瑞;导演:特雷·帕克)
cond_mel shape: torch.Size([1, 100, 385]) dtype: torch.float32
text token count: 475
sentences count: 2
['▁', '分', '▁', '类', '▁', '一', '▁', '组', '▁', '电', '▁', '影', '▁,', '▁', '并', '▁', '根', '▁', '据', '▁', '给', '▁', '定', '▁', '的', '▁', '题', '▁', '材', '▁,', '▁', '演', '▁', '员', '▁', '和', '▁', '导', '▁', '演', '▁', '信', '▁', '息', '▁', '将', '▁', '其', '▁', '分', '▁', '为', '▁', '三', '▁', '个', '▁', '不', '▁', '同', '▁', '的', '▁', '类', '▁', '别', '▁.']
['▁', '电', '▁', '影', '▁', '一', '▁,', "'", '▁', '黑', '▁', '暗', '▁', '骑', '▁', '士', '▁', "'", "'", '▁', '演', '▁', '员', '▁,', '▁', '克', '▁', '里', '▁', '斯', '▁', '蒂', '▁', '安', '▁', '-', '▁', '贝', '▁', '尔', '▁,', '▁', '希', '▁', '斯', '▁', '-', '▁', '莱', '▁', '杰', '▁,', '▁', '导', '▁', '演', '▁,', '▁', '克', '▁', '里', '▁', '斯', '▁', '托', '▁', '弗', '▁', '-', '▁', '诺', '▁', '兰', '▁', "'", ',', '▁', '电', '▁', '影', '▁', '二', '▁,', "'", '▁', '盗', '▁', '梦', '▁', '空', '▁', '间', '▁', "'", "'", '▁', '演', '▁', '员', '▁,', '▁', '莱', '▁', '昂', '▁', '纳', '▁', '多', '▁', '-', '▁', '迪', '▁', '卡', '▁', '普', '▁', '里', '▁', '奥', '▁,', '▁', '导', '▁', '演', '▁,', '▁', '克', '▁', '里', '▁', '斯', '▁', '托', '▁', '弗', '▁', '-', '▁', '诺', '▁', '兰', '▁', "'", ',', '▁', '电', '▁', '影', '▁', '三', '▁,', "'", '▁', '钢', '▁', '琴', '▁', '家', '▁', "'", "'", '▁', '演', '▁', '员', '▁,', '▁', '艾', '▁', '德', '▁', '里', '▁', '安', '▁', '-', '▁', '布', '▁', '洛', '▁', '迪', '▁,', '▁', '导', '▁', '演', '▁,', '▁', '罗', '▁', '曼', '▁', '-', '▁', '波', '▁', '兰', '▁', '斯', '▁', '基', '▁', "'", ',', '▁', '电', '▁', '影', '▁', '四', '▁,', "'", '▁', '泰', '▁', '坦', '▁', '尼', '▁', '克', '▁', '号', '▁', "'", "'", '▁', '演', '▁', '员', '▁,', '▁', '莱', '▁', '昂', '▁', '纳', '▁', '多', '▁', '-', '▁', '迪', '▁', '卡', '▁', '普', '▁', '里', '▁', '奥', '▁,', '▁', '导', '▁', '演', '▁,', '▁', '詹', '▁', '姆', '▁', '斯', '▁', '-', '▁', '卡', '▁', '梅', '▁', '隆', '▁', "'", ',', '▁', '电', '▁', '影', '▁', '五', '▁,', "'", '▁', '阿', '▁', '凡', '▁', '达', '▁', "'", "'", '▁', '演', '▁', '员', '▁,', '▁', '萨', '▁', '姆', '▁', '-', '▁', '沃', '▁', '辛', '▁', '顿', '▁,', '▁', '导', '▁', '演', '▁,', '▁', '詹', '▁', '姆', '▁', '斯', '▁', '-', '▁', '卡', '▁', '梅', '▁', '隆', '▁', "'", ',', '▁', '电', '▁', '影', '▁', '六', '▁,', "'", '▁', '南', '▁', '方', '▁', '公', '▁', '园', '▁,', '▁', '大', '▁', '电', '▁', '影', '▁', "'", "'", '▁', '演', '▁', '员', '▁,', '▁', '马', '▁', '特', '▁', '-', '▁', '斯', '▁', '通', '▁,', '▁', '托', '▁', '马', '▁', '斯', '▁', '-', '▁', '艾', '▁', '恩', '▁', '格', '▁', '瑞', '▁,', '▁', '导', '▁', 
'演', '▁,', '▁', '特', '▁', '雷', '▁', '-', '▁', '帕', '▁', '克', '▁', "'"]
tensor([[10201,   457, 10201,  4353, 10201,     7, 10201,  4449, 10201,  3744,
         10201,  1798, 10202, 10201,  1699, 10201,  2684, 10201,  2225, 10201,
          4469, 10201,  1439, 10201,  3880, 10201,  6605, 10201,  2579, 10202,
         10201,  3288, 10201,   737, 10201,   763, 10201,  1488, 10201,  3288,
         10201,   265, 10201,  1912, 10201,  1494, 10201,   387, 10201,   457,
         10201,    42, 10201,    12, 10201,    34, 10201,    15, 10201,   683,
         10201,  3880, 10201,  4353, 10201,   476, 10203]], device='cuda:0',
       dtype=torch.int32)
text_tokens shape: torch.Size([1, 67]), text_tokens type: torch.int32
text_token_syms is same as sentence tokens True
tensor([[3166, 5693, 3515, 8018, 4343, 6221, 2186, 5900, 6090, 7883, 6158, 6739,
          509, 5705, 6775, 3217, 1783, 4645, 6203, 4160, 1460, 6317,  757, 7697,
         7422, 6742, 1539, 1313, 2850, 3409, 6454, 1471, 6699, 1409, 6736, 3463,
         5926, 6516, 7501, 3744, 5135, 5219, 6028, 6897, 7292, 1830, 4218, 3178,
         1716, 1117, 4376,  522,  783,  603, 3385, 4843, 5145, 7339, 2941, 1841,
         5080, 2945, 8000, 5571, 4485, 3831, 6731, 3068, 7616, 6871, 4497, 6987,
         2899, 5752,  494, 2450, 7341, 6323, 4745, 4014, 2126, 7468, 4103, 2136,
         2508, 5851, 6321, 7446,  269, 7682, 1823, 5073, 7910, 5928, 5608, 2499,
         2884, 1888,  772, 4548, 5239, 5498, 2034, 2407, 2255, 2410, 6820, 5001,
         7966, 6290,  384,  334, 2321, 5873, 6261, 3296, 7349, 6846, 6722, 3433,
         7237, 3781,  841, 2148, 5335, 7211, 4831, 8096, 6896, 1297, 1945, 7543,
         7119, 7829, 5711, 2226, 5639, 4184,  393, 6150, 6517, 6628, 7974, 1315,
         3948, 1592, 2142, 7075, 4122, 3183, 4043, 5430, 4253, 6025, 2326,  475,
          654,  379, 2189, 5733, 7558, 7948, 1612, 3500, 4505, 2934,  685, 2952,
         8093, 3600, 7255, 7685, 1188, 2491,  421, 2110, 8193]],
       device='cuda:0') <class 'torch.Tensor'>
codes shape: torch.Size([1, 177]), codes type: torch.int64
code len: tensor([177], device='cuda:0')
tensor([[3166, 5693, 3515, 8018, 4343, 6221, 2186, 5900, 6090, 7883, 6158, 6739,
          509, 5705, 6775, 3217, 1783, 4645, 6203, 4160, 1460, 6317,  757, 7697,
         7422, 6742, 1539, 1313, 2850, 3409, 6454, 1471, 6699, 1409, 6736, 3463,
         5926, 6516, 7501, 3744, 5135, 5219, 6028, 6897, 7292, 1830, 4218, 3178,
         1716, 1117, 4376,  522,  783,  603, 3385, 4843, 5145, 7339, 2941, 1841,
         5080, 2945, 8000, 5571, 4485, 3831, 6731, 3068, 7616, 6871, 4497, 6987,
         2899, 5752,  494, 2450, 7341, 6323, 4745, 4014, 2126, 7468, 4103, 2136,
         2508, 5851, 6321, 7446,  269, 7682, 1823, 5073, 7910, 5928, 5608, 2499,
         2884, 1888,  772, 4548, 5239, 5498, 2034, 2407, 2255, 2410, 6820, 5001,
         7966, 6290,  384,  334, 2321, 5873, 6261, 3296, 7349, 6846, 6722, 3433,
         7237, 3781,  841, 2148, 5335, 7211, 4831, 8096, 6896, 1297, 1945, 7543,
         7119, 7829, 5711, 2226, 5639, 4184,  393, 6150, 6517, 6628, 7974, 1315,
         3948, 1592, 2142, 7075, 4122, 3183, 4043, 5430, 4253, 6025, 2326,  475,
          654,  379, 2189, 5733, 7558, 7948, 1612, 3500, 4505, 2934,  685, 2952,
         8093, 3600, 7255, 7685, 1188, 2491,  421]], device='cuda:0') <class 'torch.Tensor'>
fix codes shape: torch.Size([1, 175]), codes type: torch.int64
code len: tensor([175], device='cuda:0')
wav shape: torch.Size([1, 179200]) min: tensor(-17232., device='cuda:0', dtype=torch.float16) max: tensor(21408., device='cuda:0', dtype=torch.float16)
tensor([[10201,  3744, 10201,  1798, 10201,     7, 10202, 10207, 10201,  6953,
         10201,  2516, 10201,  6720, 10201,  1198, 10201, 10207, 10207, 10201,
          3288, 10201,   737, 10202, 10201,   367, 10201,  6142, 10201,  2417,
         10201,  5112, 10201,  1430, 10201, 10974, 10201,  5672, 10201,  1500,
         10202, 10201,  1662, 10201,  2417, 10201, 10974, 10201,  5034, 10201,
          2596, 10202, 10201,  1488, 10201,  3288, 10202, 10201,   367, 10201,
          6142, 10201,  2417, 10201,  2078, 10201,  1762, 10201, 10974, 10201,
          5586, 10201,   382, 10201, 10207, 10205, 10201,  3744, 10201,  1798,
         10201,    83, 10202, 10207, 10201,  3911, 10201,  2726, 10201,  4207,
         10201,  6383, 10201, 10207, 10207, 10201,  3288, 10201,   737, 10202,
         10201,  5034, 10201,  2454, 10201,  4433, 10201,  1216, 10201, 10974,
         10201,  5947, 10201,   591, 10201,  2500, 10201,  6142, 10201,  1253,
         10202, 10201,  1488, 10201,  3288, 10202, 10201,   367, 10201,  6142,
         10201,  2417, 10201,  2078, 10201,  1762, 10201, 10974, 10201,  5586,
         10201,   382, 10201, 10207, 10205, 10201,  3744, 10201,  1798, 10201,
            12, 10202, 10207, 10201,  6188, 10201,  3671, 10201,  1459, 10201,
         10207, 10207, 10201,  3288, 10201,   737, 10202, 10201,  4881, 10201,
          1829, 10201,  6142, 10201,  1430, 10201, 10974, 10201,  1658, 10201,
          3081, 10201,  5947, 10202, 10201,  1488, 10201,  3288, 10202, 10201,
          4564, 10201,  2540, 10201, 10974, 10201,  3052, 10201,   382, 10201,
          2417, 10201,  1143, 10201, 10207, 10205, 10201,  3744, 10201,  1798,
         10201,  1017, 10202, 10207, 10201,  3061, 10201,  1078, 10201,  1520,
         10201,   367, 10201,   670, 10201, 10207, 10207, 10201,  3288, 10201,
           737, 10202, 10201,  5034, 10201,  2454, 10201,  4433, 10201,  1216,
         10201, 10974, 10201,  5947, 10201,   591, 10201,  2500, 10201,  6142,
         10201,  1253, 10202, 10201,  1488, 10201,  3288, 10202, 10201,  5493,
         10201,  1289, 10201,  2417, 10201, 10974, 10201,   591, 10201,  2720,
         10201,  6461, 10201, 10207, 10205, 10201,  3744, 10201,  1798, 10201,
            90, 10202, 10207, 10201,  6431, 10201,   437, 10201,  5920, 10201,
         10207, 10207, 10201,  3288, 10201,   737, 10202, 10201,  5088, 10201,
          1289, 10201, 10974, 10201,  3001, 10201,  5908, 10201,  6585, 10202,
         10201,  1488, 10201,  3288, 10202, 10201,  5493, 10201,  1289, 10201,
          2417, 10201, 10974, 10201,   591, 10201,  2720, 10201,  6461, 10201,
         10207, 10205, 10201,  3744, 10201,  1798, 10201,   380, 10202, 10207,
         10201,   585, 10201,  2419, 10201,   379, 10201,  1027, 10202, 10201,
          1220, 10201,  3744, 10201,  1798, 10201, 10207, 10207, 10201,  3288,
         10201,   737, 10202, 10201,  6687, 10201,  3514, 10201, 10974, 10201,
          2417, 10201,  5975, 10202, 10201,  2078, 10201,  6687, 10201,  2417,
         10201, 10974, 10201,  4881, 10201,  1907, 10201,  2686, 10201,  3683,
         10202, 10201,  1488, 10201,  3288, 10202, 10201,  3514, 10201,  6501,
         10201, 10974, 10201,  1667, 10201,   367, 10201, 10207]],
       device='cuda:0', dtype=torch.int32)
text_tokens shape: torch.Size([1, 408]), text_tokens type: torch.int32
text_token_syms is same as sentence tokens True
Traceback (most recent call last):
  File "/tsdata3/wwj/index-tts/indextts/infer.py", line 567, in <module>
    tts.infer(audio_prompt=prompt_wav, text=text, output_path='output/gen.wav', verbose=True)
  File "/tsdata3/wwj/index-tts/indextts/infer.py", line 474, in infer
    codes = self.gpt.inference_speech(auto_conditioning, text_tokens,
  File "/tsdata3/wwj/index-tts/indextts/gpt/model.py", line 599, in inference_speech
    speech_conditioning_latent = self.get_conditioning(speech_conditioning_latent, cond_mel_lengths)
  File "/tsdata3/wwj/index-tts/indextts/gpt/model.py", line 497, in get_conditioning
    speech_conditioning_input, mask = self.conditioning_encoder(speech_conditioning_input.transpose(1, 2),
  File "/work/conda3/envs/index-tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/work/conda3/envs/index-tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/tsdata3/wwj/index-tts/indextts/gpt/conformer_encoder.py", line 430, in forward
    xs, chunk_masks, _, _ = layer(xs, chunk_masks, pos_emb, mask_pad)
  File "/work/conda3/envs/index-tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/work/conda3/envs/index-tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/tsdata3/wwj/index-tts/indextts/gpt/conformer_encoder.py", line 306, in forward
    x = residual + self.ff_scale * self.dropout(self.feed_forward(x))
  File "/work/conda3/envs/index-tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/work/conda3/envs/index-tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/tsdata3/wwj/index-tts/indextts/gpt/conformer_encoder.py", line 53, in forward
    return self.w_2(self.dropout(self.activation(self.w_1(xs))))
  File "/work/conda3/envs/index-tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/work/conda3/envs/index-tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/work/conda3/envs/index-tts/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 117, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
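A device-side assert on a path that ends in an embedding/linear lookup is very often an out-of-range index reaching `nn.Embedding` on the GPU. Notably, the second sentence's token tensor above contains the id 10974, well above anything in the first (working) sentence's tensor. A minimal CPU-side check can catch this before the kernel launch; the vocab size used below is a made-up placeholder, the real one should be read from the model (e.g. the GPT text embedding's `num_embeddings`):

```python
import torch

def out_of_range_tokens(token_ids: torch.Tensor, vocab_size: int) -> list[int]:
    """Return the token ids that fall outside the valid range [0, vocab_size)."""
    flat = token_ids.flatten()
    bad_mask = (flat < 0) | (flat >= vocab_size)
    return flat[bad_mask].tolist()

# Hypothetical vocab size -- replace with the model's actual embedding size.
VOCAB_SIZE = 10208

tokens = torch.tensor([[10201, 3744, 10201, 1798, 10974, 10207]], dtype=torch.int32)
print(out_of_range_tokens(tokens, VOCAB_SIZE))  # [10974]
```

If this returns a non-empty list, the failure is in tokenization rather than in the GPT itself, which matches the symptom that other texts synthesize fine.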

Answers


There's a bug in TextTokenizer: the second sentence split is too long. I'll fix it.
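Until that fix lands, a workaround is to pre-split overlong sentences yourself before calling `tts.infer`. A rough sketch (the length limit here is arbitrary; the real cap should come from the model's maximum text token length) that splits after clause punctuation and hard-splits anything still too long:

```python
import re

def split_long_sentence(text: str, max_chars: int = 80) -> list[str]:
    """Split text after Chinese/Western clause punctuation; hard-split leftovers."""
    # Lookbehind keeps the punctuation attached to the preceding clause.
    clauses = [c for c in re.split(r"(?<=[,,;;、。!?!?])", text) if c]
    chunks: list[str] = []
    for clause in clauses:
        while len(clause) > max_chars:
            chunks.append(clause[:max_chars])
            clause = clause[max_chars:]
        if clause:
            chunks.append(clause)
    return chunks
```

Each chunk can then be synthesized separately and the resulting wav segments concatenated, which also sidesteps any per-sentence token limit.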

After the fix:

[image] But the result still isn't great. Suggested using [image] instead.