[2noise/ChatTTS] "_lzma.LZMAError: Corrupt input data" when generating audio with a voice cloned from my own audio file

2025-11-10 833 views
6

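I am cloning a voice from my own audio file (about 6 seconds, a native speaker with a standard British accent, normal speaking rate). Generating new audio with the cloned voice fails with "_lzma.LZMAError: Corrupt input data". See "Output" and "Test code" below.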

Output
no GPU or NPU found, use CPU instead

found invalid characters: {'1', '0', '-'}
text:   0%|▏                                                                            | 1/384(max) [00:00,  3.05it/s]`get_max_cache()` is deprecated for all Cache classes. Use `get_max_cache_shape()` instead. Calling `get_max_cache()` will raise error from v4.48
text:  10%|███████▋                                                                    | 39/384(max) [00:04,  9.69it/s]
code:  13%|█████████▌                                                                | 265/2048(max) [00:21, 12.32it/s]
text:   8%|██████▎                                                                     | 32/384(max) [00:03,  8.12it/s]
Traceback (most recent call last):
  File "C:\code\projs\ChatTTS\test_with_upload.py", line 41, in <module>
    wavs = chat.infer(
           ^^^^^^^^^^^
  File "C:\code\projs\ChatTTS\ChatTTS\core.py", line 261, in infer
    for wavs in res_gen:
  File "C:\code\projs\ChatTTS\ChatTTS\core.py", line 436, in _infer
    self._infer_code(
  File "C:\Programs\Anaconda\envs\chattts\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\projs\ChatTTS\ChatTTS\core.py", line 631, in _infer_code
    self.speaker.apply(
  File "C:\Programs\Anaconda\envs\chattts\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\projs\ChatTTS\ChatTTS\model\speaker.py", line 32, in apply
    spk_emb_tensor = torch.from_numpy(self._decode(spk_emb))
                                      ^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\projs\ChatTTS\ChatTTS\model\speaker.py", line 148, in _decode
    lzma.decompress(
  File "C:\Programs\Anaconda\envs\chattts\Lib\lzma.py", line 343, in decompress
    res = decomp.decompress(data)
          ^^^^^^^^^^^^^^^^^^^^^^^
_lzma.LZMAError: Corrupt input data
Test code
import ChatTTS
import torch
import scipy
from typing import Optional
import torchaudio
from tools.audio import load_audio

chat = ChatTTS.Chat()
chat.load(compile=False)

def on_upload_sample(sample_audio_input: Optional[str]) -> str:
    sample_audio = torch.tensor(load_audio(sample_audio_input, 24000)).to('cpu')
    spk_smp = chat.sample_audio_speaker(sample_audio)
    del sample_audio
    return spk_smp

spk_smb = on_upload_sample(r"C:\Users\admin\Desktop\2.wav")

# This text is the transcript of what is spoken in 2.wav
smp_txt = ["Our eco-friendly packaging is made from 100 percent biodegradable materials, including recycled paper and plant"]
reftext = chat.infer(smp_txt, refine_text_only=False)

params_infer_code = ChatTTS.Chat.InferCodeParams(
    txt_smp=reftext,
    spk_emb=spk_smb,
    temperature=0.8,
    top_P=0.4, 
    top_K=7,
)

texts = ["The output indicates that Torchaudio has successfully detected the soundfile backend.", "This is a valid backend for audio processing,", "but it does not rely on Sox."]

wavs = chat.infer(
    texts,
    params_infer_code=params_infer_code,
)

torchaudio.save("3.wav", torch.from_numpy(wavs[0]).unsqueeze(0), 24000)

Answers

8

Pass the sampled voice as spk_smp, not spk_emb.
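The suggestion above can be sketched as follows. This is a minimal sketch, not the project's official recipe: `build_clone_params` and `clone_and_infer` are hypothetical helper names introduced here for illustration. It assumes that `chat.sample_audio_speaker(...)` returns a string (as the type hint in the question's code indicates) and that `txt_smp` should be the plain transcript string. Note that the question's code passes the return value of `chat.infer(...)` as `txt_smp` instead of a string; the sketch passes the transcript directly.

```python
from typing import Any, Dict


def build_clone_params(spk_smp: str, txt_smp: str, **sampling: Any) -> Dict[str, Any]:
    """Collect keyword arguments for ChatTTS.Chat.InferCodeParams.

    The sampled speaker string goes under spk_smp (not spk_emb), and
    txt_smp is the plain transcript of the sample audio.
    """
    if not isinstance(spk_smp, str) or not spk_smp:
        raise ValueError("spk_smp must be the non-empty string from sample_audio_speaker")
    if not isinstance(txt_smp, str) or not txt_smp.strip():
        raise ValueError("txt_smp must be the transcript as a non-empty string")
    return {"spk_smp": spk_smp, "txt_smp": txt_smp.strip(), **sampling}


def clone_and_infer(sample_path: str, transcript: str, texts: list):
    """Hypothetical end-to-end helper; requires ChatTTS and the repo's tools package."""
    # Imported lazily so the helper above can be used without ChatTTS installed.
    import ChatTTS
    from tools.audio import load_audio

    chat = ChatTTS.Chat()
    chat.load(compile=False)

    # Encode the reference audio (24 kHz) into a speaker sample string.
    spk_smp = chat.sample_audio_speaker(load_audio(sample_path, 24000))

    # Sampling values are copied from the question's code.
    params = ChatTTS.Chat.InferCodeParams(
        **build_clone_params(spk_smp, transcript, temperature=0.8, top_P=0.4, top_K=7)
    )
    return chat.infer(texts, params_infer_code=params)
```

Keeping the argument check separate from the model call makes it easy to catch a wrong argument name or a non-string transcript before a long CPU inference run starts.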

0

In reply to "Pass the sampled voice as spk_smp, not spk_emb":

regenerate in order to ensure non-empty
code:   0%|          | 0/2048(max) [00:00, ?it/s]unexpected end at index [1, 2]
code:   0%|          | 0/2048(max) [00:00, ?it/s]

After changing the code as suggested above, it goes into an infinite loop and keeps printing this error.

The web UI fails as well.

Traceback (most recent call last):
  File "/Users/ken/PycharmProjects/ChatTTS/.venv/lib/python3.9/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/Users/ken/PycharmProjects/ChatTTS/.venv/lib/python3.9/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "/Users/ken/PycharmProjects/ChatTTS/.venv/lib/python3.9/site-packages/gradio/blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "/Users/ken/PycharmProjects/ChatTTS/.venv/lib/python3.9/site-packages/gradio/blocks.py", line 1532, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/Users/ken/PycharmProjects/ChatTTS/.venv/lib/python3.9/site-packages/gradio/utils.py", line 671, in async_iteration
    return await iterator.__anext__()
  File "/Users/ken/PycharmProjects/ChatTTS/.venv/lib/python3.9/site-packages/gradio/utils.py", line 664, in __anext__
    return await anyio.to_thread.run_sync(
  File "/Users/ken/PycharmProjects/ChatTTS/.venv/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/Users/ken/PycharmProjects/ChatTTS/.venv/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2461, in run_sync_in_worker_thread
    return await future
  File "/Users/ken/PycharmProjects/ChatTTS/.venv/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 962, in run
    result = context.run(func, *args)
  File "/Users/ken/PycharmProjects/ChatTTS/.venv/lib/python3.9/site-packages/gradio/utils.py", line 647, in run_sync_iterator_async
    return next(iterator)
  File "/Users/ken/PycharmProjects/ChatTTS/.venv/lib/python3.9/site-packages/gradio/utils.py", line 809, in gen_wrapper
    response = next(iterator)
  File "/Users/ken/PycharmProjects/ChatTTS/examples/web/funcs.py", line 194, in generate_audio
    wav = chat.infer(
  File "/Users/ken/PycharmProjects/ChatTTS/ChatTTS/core.py", line 265, in infer
    return [np.concatenate(stripped_wavs)]
ValueError: need at least one array to concatenate

The backend error is shown above.

3

@harryzy I ran into the same problem. Did you download the code from the main branch?

2

In reply to "@harryzy I ran into the same problem. Did you download the code from the main branch?":

Has this been solved? I am hitting the same problem.

0

The sample text must strictly follow ChatTTS's text format, and must not mix Chinese and English. Search the other related issues for details.
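For what it is worth, the log in the question prints `found invalid characters: {'1', '0', '-'}`, which suggests the digits in "100" and the hyphen in "eco-friendly" are not accepted as written. Below is a rough pre-check for a transcript. The allowed character set is an assumption made for illustration (ASCII letters, CJK ideographs, spaces, commas, and full stops), not ChatTTS's actual rule, and `find_suspect_chars` and `mixes_scripts` are hypothetical names.

```python
import re

# Assumed "safe" set: ASCII letters, CJK ideographs, space, and basic punctuation.
# This is a guess for illustration, not the pattern ChatTTS itself uses.
_SUSPECT = re.compile(r"[^A-Za-z\u4e00-\u9fff ,.\uff0c\u3002]")
_LATIN = re.compile(r"[A-Za-z]")
_CJK = re.compile(r"[\u4e00-\u9fff]")


def find_suspect_chars(text: str) -> set:
    """Return the characters that fall outside the assumed safe set."""
    return set(_SUSPECT.findall(text))


def mixes_scripts(text: str) -> bool:
    """True if the text contains both Latin letters and CJK ideographs."""
    return bool(_LATIN.search(text)) and bool(_CJK.search(text))


sample = "Our eco-friendly packaging is made from 100 percent biodegradable materials"
print(sorted(find_suspect_chars(sample)))  # ['-', '0', '1']
print(mixes_scripts(sample))  # False
```

If the check flags anything, spelling it out by hand ("one hundred percent", "eco friendly") before passing the text as `txt_smp` is one thing to try.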

2

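In reply to "The sample text must strictly follow ChatTTS's text format, and must not mix Chinese and English":

I ran into this problem too. What exactly is the strict ChatTTS text format? Does it mean the words spoken in the MP3 file have to follow a fixed format?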