Running indextts/infer.py hits this error, while synthesizing other texts works fine. Is the text too long, or does it contain an illegal token?
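One quick way to test the "illegal token" hypothesis against the ids printed in the log below: check whether any text token id falls outside the GPT text-embedding table. An out-of-range index into `nn.Embedding` is a classic cause of `CUDA error: device-side assert triggered`. Note that `10974` appears repeatedly in the second `text_tokens` tensor (the sentence that crashes), while the first sentence's tensor stays at or below `10207`. The helper below is a hypothetical sketch, not part of indextts, and the vocab size used in the example is an assumption to be replaced with the model's actual `number_text_tokens`:

```python
# Hypothetical helper: given the token ids printed in the log and the size of
# the GPT text-embedding table, report any ids that would cause an
# out-of-range lookup (which triggers a device-side assert on CUDA).
def find_out_of_range_tokens(token_ids, vocab_size):
    """Return the sorted list of token ids outside [0, vocab_size)."""
    return sorted({t for t in token_ids if t < 0 or t >= vocab_size})

# Ids taken from the failing text_tokens tensor in the log; 10208 is an
# assumed embedding size, not the model's confirmed value.
ids = [10201, 3744, 10974, 10207, 10205]
print(find_out_of_range_tokens(ids, vocab_size=10208))  # → [10974]
```

If this reports a non-empty list for the failing sentence only, the problem is an out-of-vocabulary token produced during tokenization, not the text length.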
True
>> GPT weights restored from: checkpoints/index-tts/gpt.pth
[2025-05-14 08:42:37,259] [INFO] [real_accelerator.py:239:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-14 08:42:39,065] [INFO] [logging.py:107:log_dist] [Rank -1] DeepSpeed info: version=0.16.7, git-hash=unknown, git-branch=unknown
[2025-05-14 08:42:39,066] [WARNING] [config_utils.py:70:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2025-05-14 08:42:39,068] [INFO] [logging.py:107:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
Removing weight norm...
>> bigvgan weights restored from: checkpoints/index-tts/bigvgan_generator.pth
2025-05-14 08:42:43,636 WETEXT INFO found existing fst: /work/conda3/envs/index-tts/lib/python3.10/site-packages/tn/zh_tn_tagger.fst
2025-05-14 08:42:43,636 WETEXT INFO /work/conda3/envs/index-tts/lib/python3.10/site-packages/tn/zh_tn_verbalizer.fst
2025-05-14 08:42:43,637 WETEXT INFO skip building fst for zh_normalizer ...
2025-05-14 08:42:44,104 WETEXT INFO found existing fst: /work/conda3/envs/index-tts/lib/python3.10/site-packages/tn/en_tn_tagger.fst
2025-05-14 08:42:44,104 WETEXT INFO /work/conda3/envs/index-tts/lib/python3.10/site-packages/tn/en_tn_verbalizer.fst
2025-05-14 08:42:44,104 WETEXT INFO skip building fst for en_normalizer ...
>> TextNormalizer loaded
2025-05-14 08:42:45,144 WETEXT INFO found existing fst: /work/conda3/envs/index-tts/lib/python3.10/site-packages/tn/zh_tn_tagger.fst
2025-05-14 08:42:45,144 WETEXT INFO /work/conda3/envs/index-tts/lib/python3.10/site-packages/tn/zh_tn_verbalizer.fst
2025-05-14 08:42:45,144 WETEXT INFO skip building fst for zh_normalizer ...
2025-05-14 08:42:45,820 WETEXT INFO found existing fst: /work/conda3/envs/index-tts/lib/python3.10/site-packages/tn/en_tn_tagger.fst
2025-05-14 08:42:45,820 WETEXT INFO /work/conda3/envs/index-tts/lib/python3.10/site-packages/tn/en_tn_verbalizer.fst
2025-05-14 08:42:45,820 WETEXT INFO skip building fst for en_normalizer ...
>> bpe model loaded from: checkpoints/index-tts/bpe.model
>> start inference...
origin text:分类一组电影,并根据给定的题材、演员和导演信息将其分为三个不同的类别。 电影1:“黑暗骑士”(演员:克里斯蒂安·贝尔、希斯·莱杰;导演:克里斯托弗·诺兰);电影2:“盗梦空间”(演员:莱昂纳多·迪卡普里奥;导演:克里斯托弗·诺兰);电影3:“钢琴家”(演员:艾德里安·布洛迪;导演:罗曼·波兰斯基);电影4:“泰坦尼克号”(演员:莱昂纳多·迪卡普里奥;导演:詹姆斯·卡梅隆);电影5:“阿凡达”(演员:萨姆·沃辛顿;导演:詹姆斯·卡梅隆);电影6:“南方公园:大电影”(演员:马特·斯通、托马斯·艾恩格瑞;导演:特雷·帕克)
cond_mel shape: torch.Size([1, 100, 385]) dtype: torch.float32
text token count: 475
sentences count: 2
['▁', '分', '▁', '类', '▁', '一', '▁', '组', '▁', '电', '▁', '影', '▁,', '▁', '并', '▁', '根', '▁', '据', '▁', '给', '▁', '定', '▁', '的', '▁', '题', '▁', '材', '▁,', '▁', '演', '▁', '员', '▁', '和', '▁', '导', '▁', '演', '▁', '信', '▁', '息', '▁', '将', '▁', '其', '▁', '分', '▁', '为', '▁', '三', '▁', '个', '▁', '不', '▁', '同', '▁', '的', '▁', '类', '▁', '别', '▁.']
['▁', '电', '▁', '影', '▁', '一', '▁,', "'", '▁', '黑', '▁', '暗', '▁', '骑', '▁', '士', '▁', "'", "'", '▁', '演', '▁', '员', '▁,', '▁', '克', '▁', '里', '▁', '斯', '▁', '蒂', '▁', '安', '▁', '-', '▁', '贝', '▁', '尔', '▁,', '▁', '希', '▁', '斯', '▁', '-', '▁', '莱', '▁', '杰', '▁,', '▁', '导', '▁', '演', '▁,', '▁', '克', '▁', '里', '▁', '斯', '▁', '托', '▁', '弗', '▁', '-', '▁', '诺', '▁', '兰', '▁', "'", ',', '▁', '电', '▁', '影', '▁', '二', '▁,', "'", '▁', '盗', '▁', '梦', '▁', '空', '▁', '间', '▁', "'", "'", '▁', '演', '▁', '员', '▁,', '▁', '莱', '▁', '昂', '▁', '纳', '▁', '多', '▁', '-', '▁', '迪', '▁', '卡', '▁', '普', '▁', '里', '▁', '奥', '▁,', '▁', '导', '▁', '演', '▁,', '▁', '克', '▁', '里', '▁', '斯', '▁', '托', '▁', '弗', '▁', '-', '▁', '诺', '▁', '兰', '▁', "'", ',', '▁', '电', '▁', '影', '▁', '三', '▁,', "'", '▁', '钢', '▁', '琴', '▁', '家', '▁', "'", "'", '▁', '演', '▁', '员', '▁,', '▁', '艾', '▁', '德', '▁', '里', '▁', '安', '▁', '-', '▁', '布', '▁', '洛', '▁', '迪', '▁,', '▁', '导', '▁', '演', '▁,', '▁', '罗', '▁', '曼', '▁', '-', '▁', '波', '▁', '兰', '▁', '斯', '▁', '基', '▁', "'", ',', '▁', '电', '▁', '影', '▁', '四', '▁,', "'", '▁', '泰', '▁', '坦', '▁', '尼', '▁', '克', '▁', '号', '▁', "'", "'", '▁', '演', '▁', '员', '▁,', '▁', '莱', '▁', '昂', '▁', '纳', '▁', '多', '▁', '-', '▁', '迪', '▁', '卡', '▁', '普', '▁', '里', '▁', '奥', '▁,', '▁', '导', '▁', '演', '▁,', '▁', '詹', '▁', '姆', '▁', '斯', '▁', '-', '▁', '卡', '▁', '梅', '▁', '隆', '▁', "'", ',', '▁', '电', '▁', '影', '▁', '五', '▁,', "'", '▁', '阿', '▁', '凡', '▁', '达', '▁', "'", "'", '▁', '演', '▁', '员', '▁,', '▁', '萨', '▁', '姆', '▁', '-', '▁', '沃', '▁', '辛', '▁', '顿', '▁,', '▁', '导', '▁', '演', '▁,', '▁', '詹', '▁', '姆', '▁', '斯', '▁', '-', '▁', '卡', '▁', '梅', '▁', '隆', '▁', "'", ',', '▁', '电', '▁', '影', '▁', '六', '▁,', "'", '▁', '南', '▁', '方', '▁', '公', '▁', '园', '▁,', '▁', '大', '▁', '电', '▁', '影', '▁', "'", "'", '▁', '演', '▁', '员', '▁,', '▁', '马', '▁', '特', '▁', '-', '▁', '斯', '▁', '通', '▁,', '▁', '托', '▁', '马', '▁', '斯', '▁', '-', '▁', '艾', '▁', '恩', '▁', '格', '▁', '瑞', '▁,', '▁', '导', '▁', 
'演', '▁,', '▁', '特', '▁', '雷', '▁', '-', '▁', '帕', '▁', '克', '▁', "'"]
tensor([[10201, 457, 10201, 4353, 10201, 7, 10201, 4449, 10201, 3744,
10201, 1798, 10202, 10201, 1699, 10201, 2684, 10201, 2225, 10201,
4469, 10201, 1439, 10201, 3880, 10201, 6605, 10201, 2579, 10202,
10201, 3288, 10201, 737, 10201, 763, 10201, 1488, 10201, 3288,
10201, 265, 10201, 1912, 10201, 1494, 10201, 387, 10201, 457,
10201, 42, 10201, 12, 10201, 34, 10201, 15, 10201, 683,
10201, 3880, 10201, 4353, 10201, 476, 10203]], device='cuda:0',
dtype=torch.int32)
text_tokens shape: torch.Size([1, 67]), text_tokens type: torch.int32
text_token_syms is same as sentence tokens True
tensor([[3166, 5693, 3515, 8018, 4343, 6221, 2186, 5900, 6090, 7883, 6158, 6739,
509, 5705, 6775, 3217, 1783, 4645, 6203, 4160, 1460, 6317, 757, 7697,
7422, 6742, 1539, 1313, 2850, 3409, 6454, 1471, 6699, 1409, 6736, 3463,
5926, 6516, 7501, 3744, 5135, 5219, 6028, 6897, 7292, 1830, 4218, 3178,
1716, 1117, 4376, 522, 783, 603, 3385, 4843, 5145, 7339, 2941, 1841,
5080, 2945, 8000, 5571, 4485, 3831, 6731, 3068, 7616, 6871, 4497, 6987,
2899, 5752, 494, 2450, 7341, 6323, 4745, 4014, 2126, 7468, 4103, 2136,
2508, 5851, 6321, 7446, 269, 7682, 1823, 5073, 7910, 5928, 5608, 2499,
2884, 1888, 772, 4548, 5239, 5498, 2034, 2407, 2255, 2410, 6820, 5001,
7966, 6290, 384, 334, 2321, 5873, 6261, 3296, 7349, 6846, 6722, 3433,
7237, 3781, 841, 2148, 5335, 7211, 4831, 8096, 6896, 1297, 1945, 7543,
7119, 7829, 5711, 2226, 5639, 4184, 393, 6150, 6517, 6628, 7974, 1315,
3948, 1592, 2142, 7075, 4122, 3183, 4043, 5430, 4253, 6025, 2326, 475,
654, 379, 2189, 5733, 7558, 7948, 1612, 3500, 4505, 2934, 685, 2952,
8093, 3600, 7255, 7685, 1188, 2491, 421, 2110, 8193]],
device='cuda:0') <class 'torch.Tensor'>
codes shape: torch.Size([1, 177]), codes type: torch.int64
code len: tensor([177], device='cuda:0')
tensor([[3166, 5693, 3515, 8018, 4343, 6221, 2186, 5900, 6090, 7883, 6158, 6739,
509, 5705, 6775, 3217, 1783, 4645, 6203, 4160, 1460, 6317, 757, 7697,
7422, 6742, 1539, 1313, 2850, 3409, 6454, 1471, 6699, 1409, 6736, 3463,
5926, 6516, 7501, 3744, 5135, 5219, 6028, 6897, 7292, 1830, 4218, 3178,
1716, 1117, 4376, 522, 783, 603, 3385, 4843, 5145, 7339, 2941, 1841,
5080, 2945, 8000, 5571, 4485, 3831, 6731, 3068, 7616, 6871, 4497, 6987,
2899, 5752, 494, 2450, 7341, 6323, 4745, 4014, 2126, 7468, 4103, 2136,
2508, 5851, 6321, 7446, 269, 7682, 1823, 5073, 7910, 5928, 5608, 2499,
2884, 1888, 772, 4548, 5239, 5498, 2034, 2407, 2255, 2410, 6820, 5001,
7966, 6290, 384, 334, 2321, 5873, 6261, 3296, 7349, 6846, 6722, 3433,
7237, 3781, 841, 2148, 5335, 7211, 4831, 8096, 6896, 1297, 1945, 7543,
7119, 7829, 5711, 2226, 5639, 4184, 393, 6150, 6517, 6628, 7974, 1315,
3948, 1592, 2142, 7075, 4122, 3183, 4043, 5430, 4253, 6025, 2326, 475,
654, 379, 2189, 5733, 7558, 7948, 1612, 3500, 4505, 2934, 685, 2952,
8093, 3600, 7255, 7685, 1188, 2491, 421]], device='cuda:0') <class 'torch.Tensor'>
fix codes shape: torch.Size([1, 175]), codes type: torch.int64
code len: tensor([175], device='cuda:0')
wav shape: torch.Size([1, 179200]) min: tensor(-17232., device='cuda:0', dtype=torch.float16) max: tensor(21408., device='cuda:0', dtype=torch.float16)
tensor([[10201, 3744, 10201, 1798, 10201, 7, 10202, 10207, 10201, 6953,
10201, 2516, 10201, 6720, 10201, 1198, 10201, 10207, 10207, 10201,
3288, 10201, 737, 10202, 10201, 367, 10201, 6142, 10201, 2417,
10201, 5112, 10201, 1430, 10201, 10974, 10201, 5672, 10201, 1500,
10202, 10201, 1662, 10201, 2417, 10201, 10974, 10201, 5034, 10201,
2596, 10202, 10201, 1488, 10201, 3288, 10202, 10201, 367, 10201,
6142, 10201, 2417, 10201, 2078, 10201, 1762, 10201, 10974, 10201,
5586, 10201, 382, 10201, 10207, 10205, 10201, 3744, 10201, 1798,
10201, 83, 10202, 10207, 10201, 3911, 10201, 2726, 10201, 4207,
10201, 6383, 10201, 10207, 10207, 10201, 3288, 10201, 737, 10202,
10201, 5034, 10201, 2454, 10201, 4433, 10201, 1216, 10201, 10974,
10201, 5947, 10201, 591, 10201, 2500, 10201, 6142, 10201, 1253,
10202, 10201, 1488, 10201, 3288, 10202, 10201, 367, 10201, 6142,
10201, 2417, 10201, 2078, 10201, 1762, 10201, 10974, 10201, 5586,
10201, 382, 10201, 10207, 10205, 10201, 3744, 10201, 1798, 10201,
12, 10202, 10207, 10201, 6188, 10201, 3671, 10201, 1459, 10201,
10207, 10207, 10201, 3288, 10201, 737, 10202, 10201, 4881, 10201,
1829, 10201, 6142, 10201, 1430, 10201, 10974, 10201, 1658, 10201,
3081, 10201, 5947, 10202, 10201, 1488, 10201, 3288, 10202, 10201,
4564, 10201, 2540, 10201, 10974, 10201, 3052, 10201, 382, 10201,
2417, 10201, 1143, 10201, 10207, 10205, 10201, 3744, 10201, 1798,
10201, 1017, 10202, 10207, 10201, 3061, 10201, 1078, 10201, 1520,
10201, 367, 10201, 670, 10201, 10207, 10207, 10201, 3288, 10201,
737, 10202, 10201, 5034, 10201, 2454, 10201, 4433, 10201, 1216,
10201, 10974, 10201, 5947, 10201, 591, 10201, 2500, 10201, 6142,
10201, 1253, 10202, 10201, 1488, 10201, 3288, 10202, 10201, 5493,
10201, 1289, 10201, 2417, 10201, 10974, 10201, 591, 10201, 2720,
10201, 6461, 10201, 10207, 10205, 10201, 3744, 10201, 1798, 10201,
90, 10202, 10207, 10201, 6431, 10201, 437, 10201, 5920, 10201,
10207, 10207, 10201, 3288, 10201, 737, 10202, 10201, 5088, 10201,
1289, 10201, 10974, 10201, 3001, 10201, 5908, 10201, 6585, 10202,
10201, 1488, 10201, 3288, 10202, 10201, 5493, 10201, 1289, 10201,
2417, 10201, 10974, 10201, 591, 10201, 2720, 10201, 6461, 10201,
10207, 10205, 10201, 3744, 10201, 1798, 10201, 380, 10202, 10207,
10201, 585, 10201, 2419, 10201, 379, 10201, 1027, 10202, 10201,
1220, 10201, 3744, 10201, 1798, 10201, 10207, 10207, 10201, 3288,
10201, 737, 10202, 10201, 6687, 10201, 3514, 10201, 10974, 10201,
2417, 10201, 5975, 10202, 10201, 2078, 10201, 6687, 10201, 2417,
10201, 10974, 10201, 4881, 10201, 1907, 10201, 2686, 10201, 3683,
10202, 10201, 1488, 10201, 3288, 10202, 10201, 3514, 10201, 6501,
10201, 10974, 10201, 1667, 10201, 367, 10201, 10207]],
device='cuda:0', dtype=torch.int32)
text_tokens shape: torch.Size([1, 408]), text_tokens type: torch.int32
text_token_syms is same as sentence tokens True
Traceback (most recent call last):
File "/tsdata3/wwj/index-tts/indextts/infer.py", line 567, in <module>
tts.infer(audio_prompt=prompt_wav, text=text, output_path='output/gen.wav', verbose=True)
File "/tsdata3/wwj/index-tts/indextts/infer.py", line 474, in infer
codes = self.gpt.inference_speech(auto_conditioning, text_tokens,
File "/tsdata3/wwj/index-tts/indextts/gpt/model.py", line 599, in inference_speech
speech_conditioning_latent = self.get_conditioning(speech_conditioning_latent, cond_mel_lengths)
File "/tsdata3/wwj/index-tts/indextts/gpt/model.py", line 497, in get_conditioning
speech_conditioning_input, mask = self.conditioning_encoder(speech_conditioning_input.transpose(1, 2),
File "/work/conda3/envs/index-tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/work/conda3/envs/index-tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/tsdata3/wwj/index-tts/indextts/gpt/conformer_encoder.py", line 430, in forward
xs, chunk_masks, _, _ = layer(xs, chunk_masks, pos_emb, mask_pad)
File "/work/conda3/envs/index-tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/work/conda3/envs/index-tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/tsdata3/wwj/index-tts/indextts/gpt/conformer_encoder.py", line 306, in forward
x = residual + self.ff_scale * self.dropout(self.feed_forward(x))
File "/work/conda3/envs/index-tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/work/conda3/envs/index-tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/tsdata3/wwj/index-tts/indextts/gpt/conformer_encoder.py", line 53, in forward
return self.w_2(self.dropout(self.activation(self.w_1(xs))))
File "/work/conda3/envs/index-tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/work/conda3/envs/index-tts/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/work/conda3/envs/index-tts/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 117, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
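As the error hint says, CUDA errors are reported asynchronously, so the `F.linear` frame in this traceback is probably not the op that actually failed. Rerunning with synchronous kernel launches makes the traceback stop at the real failing kernel (if the cause is an out-of-range embedding index, it should then surface in the embedding lookup instead). A minimal sketch of setting this from Python, assuming you control the entry point; the flag must be set before `torch` is first imported:

```python
import os

# Must be set before the first `import torch`, otherwise it has no effect.
# With synchronous launches, the Python traceback points at the kernel
# that actually raised the device-side assert.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch and run inference only after setting the flag
```

Equivalently, run `CUDA_LAUNCH_BLOCKING=1 python indextts/infer.py` from the shell without touching the code.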