[index-tts] Speech generated on an Nvidia DGX-SPARK setup is full of noise

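Has anyone deployed this on the new DGX-SPARK? Inference runs, but the generated audio is full of noise. I previously deployed v1.5 on a Mac mini, and the speech it generated had no problems at all; it was very clean. After moving the same v1.5 setup to the DGX-SPARK, the generated audio is basically unusable. V2.0 has the same problem.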

zdai

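After repeated attempts, here is what I found. With torch 2.8, torch.cuda.is_available() == False, so I can only generate on the CPU, and the resulting speech has no noise. With the CPU-only build of torch 2.9, the audio is full of noise even when generated on the CPU. With torch 2.9+cu130 generating on the GPU, the audio is also full of noise. So this looks like a torchaudio 2.9 problem.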
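The pattern above (clean with torch 2.8 on the CPU, noisy with 2.9 on both CPU and GPU) points at the library version rather than the hardware. A minimal sketch for labeling which side of that line an environment falls on; `classify_stack` and `major_minor` are hypothetical helpers, and the labels only restate the observations in this thread, not a confirmed torchaudio bug.

```python
import re


def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as '2.9.0+cu129' or '2.8.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def classify_stack(torch_version: str, torchaudio_version: str) -> str:
    """Hypothetical helper: label a torch/torchaudio pair with what this thread reported."""
    torch_mm = major_minor(torch_version)
    audio_mm = major_minor(torchaudio_version)
    if torch_mm != audio_mm:
        # torchaudio wheels are built against a matching torch release.
        return "mismatched torch/torchaudio"
    if audio_mm == (2, 9):
        return "reported noisy in this thread"
    if audio_mm == (2, 8):
        return "reported clean in this thread"
    return "not covered by this thread"
```

On the DGX, import both packages and call `classify_stack(torch.__version__, torchaudio.__version__)`.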

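zdai

"I suggest you switch to https://github.com/mirbehnam/Chatterbox-TTS-Server-windows-easyInstallation.git (as you mentioned). In my experience it is a better and more stable option, and it should meet your expectations:

Best audio quality: it produces clean audio output, just like what you got with v1.5 on the Mac mini.
Emotion control: you can easily control the emotional expression of the voice (for example anger, sadness, or happiness).
Multilingual support: it supports many languages, 23 in total. Please give it a try; it will work better for you."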

nzgnzg73

DGX Test:

```bash
sudo docker run --rm -it --gpus all \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  nvidia/cuda:12.9.0-devel-ubuntu22.04 \
  bash -c "
    apt-get update && \
    apt-get install -y python3 python3-pip && \
    echo '=== Installing PyTorch with dependencies ===' && \
    pip3 install --pre torch --index-url https://download.pytorch.org/whl/test/cu129 && \
    echo '=== Testing GB10 with PyTorch ===' && \
    python3 -c \"
import torch
print('🎉 SUCCESS: PyTorch installed with dependencies!')
print('PyTorch version:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('CUDA version:', torch.version.cuda)
    print('GPU Name:', torch.cuda.get_device_name(0))
    capability = torch.cuda.get_device_capability(0)
    print('Compute Capability:', f'sm_{capability[0]}{capability[1]}')
    print('GPU Memory:', f'{torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB')

    # Test GPU operations
    print('Testing GPU operations...')
    x = torch.randn(1000, 1000).cuda()
    y = torch.randn(1000, 1000).cuda()
    z = torch.matmul(x, y)
    print('✅ GPU matrix multiplication successful!')
    print('Result shape:', z.shape)
    print('🎯 GB10 is fully operational with PyTorch + CUDA 12.9!')
else:
    print('❌ CUDA not available')
\""
```

result:

```
=== Testing GB10 with PyTorch ===
/usr/local/lib/python3.10/dist-packages/torch/_subclasses/functional_tensor.py:279: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
  cpu = _conversion_method_template(device=torch.device("cpu"))
🎉 SUCCESS: PyTorch installed with dependencies!
PyTorch version: 2.9.0+cu129
CUDA available: True
CUDA version: 12.9
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:283: UserWarning: Found GPU0 NVIDIA GB10 which is of cuda capability 12.1. Minimum and Maximum cuda capability supported by this version of PyTorch is (8.0) - (12.0)

  warnings.warn(
GPU Name: NVIDIA GB10
Compute Capability: sm_121
GPU Memory: 119.7 GB
Testing GPU operations...
✅ GPU matrix multiplication successful!
Result shape: torch.Size([1000, 1000])
🎯 GB10 is fully operational with PyTorch + CUDA 12.9!
```
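The `UserWarning` above says this wheel was built for compute capability 8.0 through 12.0 while the GB10 reports 12.1, yet the matrix multiplication still ran. CUDA's binary-compatibility rule is a plausible explanation: SASS compiled for `sm_XY` runs on GPUs with the same major version and an equal or higher minor version, so `sm_120` code can execute on `sm_121`. The sketch below checks a device against an arch list in the format returned by `torch.cuda.get_arch_list()`; the list used in the test is illustrative, not the actual contents of the cu129 wheel, and the model covers only the SASS rule (PTX JIT compilation and kernels from other libraries are outside it). The same check also shows why a compute-capability 6.1 card is rejected outright.

```python
import re
from typing import Optional

Capability = tuple[int, int]


def parse_arch(arch: str) -> Optional[Capability]:
    """Turn 'sm_120' into (12, 0); skip PTX ('compute_*') and arch-specific ('sm_90a') entries."""
    match = re.fullmatch(r"sm_(\d\d+)", arch)
    if match is None:
        return None
    digits = match.group(1)
    # The last digit is the minor version; everything before it is the major.
    return int(digits[:-1]), int(digits[-1])


def runnable_on(device: Capability, arch_list: list[str]) -> bool:
    """True if some compiled SASS target has the device's major and a minor <= the device's."""
    for arch in arch_list:
        cap = parse_arch(arch)
        if cap is not None and cap[0] == device[0] and cap[1] <= device[1]:
            return True
    return False


def exact_match(device: Capability, arch_list: list[str]) -> bool:
    """True only if the device's own capability was compiled into the wheel."""
    return device in {parse_arch(arch) for arch in arch_list}
```

Against a live install, call `runnable_on(torch.cuda.get_device_capability(0), torch.cuda.get_arch_list())`.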

jjmlovesgit

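I'm not very familiar with the problem that's coming up, but someone has already solved it, better than this, and very well. Please contact me via Gmail and I'll explain it in detail: nzgnzg73@gmail.com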

nzgnzg73

```bash
sudo docker run --rm -it --gpus all -p 7860:7860 \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -v /home/mccormj/Applications/Indextts:/mnt/project \
  -v /home/mccormj/.cache/huggingface:/root/.cache/huggingface \
  nvidia/cuda:12.9.0-devel-ubuntu22.04 \
  bash -c "
    cd /mnt/project && \
    echo '=== Installing system dependencies ===' && \
    apt-get update && apt-get install -y python3 python3-pip espeak-ng ffmpeg && \
    echo '=== Installing PyTorch ecosystem ===' && \
    pip3 install torch==2.8.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/test/cu129 && \
    echo '=== Installing core dependencies from pyproject.toml ===' && \
    pip3 install transformers==4.52.1 tokenizers==0.21.0 safetensors==0.5.2 accelerate==1.8.1 && \
    echo '=== Installing additional required dependencies ===' && \
    pip3 install einops cn2an==0.5.22 cython==3.0.7 descript-audiotools==0.7.2 ffmpeg-python==0.2.0 g2p-en==2.1.0 jieba==0.42.1 json5==0.10.0 keras==2.9.0 librosa==0.10.2.post1 matplotlib==3.8.2 modelscope==1.27.0 munch==4.0.0 numba==0.58.1 numpy==1.26.2 omegaconf opencv-python==4.9.0.80 pandas==2.3.2 sentencepiece tqdm textstat && \
    echo '=== Installing webui dependencies ===' && \
    pip3 install gradio==5.45.0 && \
    echo '=== Installing deepspeed dependencies ===' && \
    pip3 install deepspeed==0.17.1 && \
    echo '=== Testing imports ===' && \
    python3 -c \"
try:
    import torch
    import torchaudio
    from transformers import __version__
    import einops
    import librosa
    import gradio
    import deepspeed
    print('✅ All imports successful')
    print('PyTorch version:', torch.__version__)
    print('Torchaudio version:', torchaudio.__version__)
    print('Transformers version:', __version__)
    print('CUDA available:', torch.cuda.is_available())
    if torch.cuda.is_available():
        print('GPU name:', torch.cuda.get_device_name(0))
        print('CUDA version:', torch.version.cuda)
except Exception as e:
    print('❌ Import failed:', e)
    import traceback
    traceback.print_exc()
\" && \
    echo '=== Starting IndexTTS ===' && \
    CUDA_VISIBLE_DEVICES=0 python3 webui.py --model_dir /mnt/project/checkpoints --fp16 --deepspeed"
```
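The import test inside the command confirms that packages load, but not that pip kept the pinned versions: a later `pip3 install` can upgrade or downgrade something an earlier step pinned. A minimal sketch, assuming it is run with the container's `python3`, that compares installed distributions against a subset of the pins above using the standard library's `importlib.metadata`; `find_mismatches` and `PINS` are hypothetical names.

```python
from importlib import metadata
from typing import Callable

# A subset of the pins from the docker command above.
PINS = {
    "torch": "2.8.0",
    "torchaudio": "2.8.0",
    "transformers": "4.52.1",
    "tokenizers": "0.21.0",
    "accelerate": "1.8.1",
    "librosa": "0.10.2.post1",
    "numpy": "1.26.2",
    "gradio": "5.45.0",
    "deepspeed": "0.17.1",
}


def find_mismatches(
    pins: dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> dict[str, str]:
    """Return {package: problem} for every pin that is missing or differs."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Local build tags such as '+cu129' are ignored for the comparison.
        if installed.split("+")[0] != wanted:
            problems[name] = f"installed {installed}, pinned {wanted}"
    return problems
```

Run `print(find_mismatches(PINS) or "all pins satisfied")` right before launching `webui.py`.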

Results:

```
✅ All imports successful
PyTorch version: 2.8.0+cu129
Torchaudio version: 2.8.0
Transformers version: 4.52.1
CUDA available: True
GPU name: NVIDIA GB10
CUDA version: 12.9
=== Starting IndexTTS ===
```

A Run:

```
starting inference...
Use the specified emotion vector
Free memory : 47.664494 (GigaBytes)
Total memory: 119.699211 (GigaBytes)
Requested memory: 0.541992 (GigaBytes)
Setting maximum total tokens (input + output) to 1024
WorkSpace: 0xf1d7e0000000
Passing a tuple of past_key_values is deprecated and will be removed in Transformers v4.53.0. You should pass an instance of Cache instead, e.g. past_key_values=DynamicCache.from_legacy_cache(past_key_values).
100%|███████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 34.64it/s]
torch.Size([1, 64256])
gpt_gen_time: 3.59 seconds
gpt_forward_time: 0.01 seconds
s2mel_time: 0.73 seconds
bigvgan_time: 0.35 seconds
Total inference time: 7.35 seconds
Generated audio length: 2.91 seconds
RTF: 2.5237
wav file saved to: outputs/spk_1760873369.wav
```
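The figures in this log are tied together by simple arithmetic. The tensor sizes and reported durations are consistent with a 22,050 Hz output rate (an inference from the numbers, e.g. 64256 / 22050 ≈ 2.91 s, not something the log states), and RTF is total inference time divided by generated audio length, so values above 1 mean slower than real time. A small sketch, assuming that sample rate; the function names are illustrative.

```python
SAMPLE_RATE = 22_050  # inferred from the logs above, not read from IndexTTS


def audio_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration of a mono waveform given its sample count."""
    return num_samples / sample_rate


def real_time_factor(inference_seconds: float, num_samples: int,
                     sample_rate: int = SAMPLE_RATE) -> float:
    """RTF = wall-clock inference time / duration of the audio produced."""
    return inference_seconds / audio_seconds(num_samples, sample_rate)
```

For this run, `real_time_factor(7.35, 64256)` comes out near 2.52; the log's 2.5237 was presumably computed from the unrounded total time.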

jjmlovesgit

Sentence level creation for streaming...

```
Generating 1/7: The Visual Studio Code Dev Containers extension lets you use...
starting inference...
Use the specified emotion vector
100%|█████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:01<00:00, 21.24it/s]
torch.Size([1, 170752])
gpt_gen_time: 7.06 seconds
gpt_forward_time: 0.01 seconds
s2mel_time: 1.20 seconds
bigvgan_time: 0.84 seconds
Total inference time: 9.33 seconds
Generated audio length: 7.74 seconds
RTF: 1.2049
wav file saved to: outputs/stream_1760880134_0.wav
📝 Generating 2/7: It allows you to open any folder inside (or mounted into) a ...
starting inference...
Use the specified emotion vector
100%|█████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:01<00:00, 19.95it/s]
torch.Size([1, 185344])
gpt_gen_time: 7.85 seconds
gpt_forward_time: 0.01 seconds
s2mel_time: 1.28 seconds
bigvgan_time: 0.90 seconds
Total inference time: 10.27 seconds
Generated audio length: 8.41 seconds
RTF: 1.2218
wav file saved to: outputs/stream_1760880144_1.wav
📝 Generating 3/7: A devcontainer.json file in your project tells VS Code how t...
starting inference...
Use the specified emotion vector
100%|█████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:01<00:00, 18.64it/s]
torch.Size([1, 210688])
gpt_gen_time: 9.09 seconds
gpt_forward_time: 0.01 seconds
s2mel_time: 1.37 seconds
bigvgan_time: 1.00 seconds
Total inference time: 11.74 seconds
Generated audio length: 9.56 seconds
RTF: 1.2286
wav file saved to: outputs/stream_1760880154_2.wav
📝 Generating 4/7: This container can be used to run an application or to separ...
starting inference...
Use the specified emotion vector
100%|█████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:01<00:00, 21.41it/s]
torch.Size([1, 166400])
gpt_gen_time: 6.92 seconds
gpt_forward_time: 0.01 seconds
s2mel_time: 1.19 seconds
bigvgan_time: 0.81 seconds
Total inference time: 9.15 seconds
Generated audio length: 7.55 seconds
RTF: 1.2121
wav file saved to: outputs/stream_1760880166_3.wav
📝 Generating 5/7: Workspace files are mounted from the local file system or co...
starting inference...
Use the specified emotion vector
100%|█████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:01<00:00, 23.94it/s]
torch.Size([1, 132864])
gpt_gen_time: 5.31 seconds
gpt_forward_time: 0.01 seconds
s2mel_time: 1.07 seconds
```
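The streaming mode above synthesizes one sentence at a time and logs a short preview of each. A rough sketch of the text side of that loop, assuming a naive punctuation-based splitter; `split_sentences`, `preview`, and `plan` are hypothetical helpers, not IndexTTS's actual code. The previews in the log appear to be the first 60 characters of each sentence, which is where the default width comes from.

```python
import re

# A run of non-terminators followed by any English or full-width Chinese terminators.
_SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*")


def split_sentences(text: str) -> list[str]:
    """Naive splitter that breaks after ., !, ? and their full-width forms.

    Abbreviations and decimals such as '3.14' will be split too.
    """
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def preview(sentence: str, width: int = 60) -> str:
    """Shorten a sentence for logging, like the 'Generating i/n: ...' lines."""
    return sentence if len(sentence) <= width else sentence[:width] + "..."


def plan(text: str) -> list[str]:
    """Build one progress line per sentence, numbered from 1."""
    sentences = split_sentences(text)
    return [f"Generating {i}/{len(sentences)}: {preview(s)}"
            for i, s in enumerate(sentences, start=1)]
```

Each returned sentence would then be passed to the TTS call in order, so playback can start after the first one finishes.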

jjmlovesgit

Very nice.

nzgnzg73

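Friend, could you help me? I'm not that quick at picking this up, and my computer is low-spec. Your voice app has just started up; what should I do next? Could you share its URL with me? I mean, is the public URL just the local address exposed publicly, and what is it useful for?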

nzgnzg73

Will you help?

nzgnzg73

Here is a quick 10-second DGX screen recording showing the actual output:

```bash
ffmpeg -f x11grab -video_size 1255x1252 -i :1+2184,136 -f pulse -i default -t 10 firefox_recording.mp4
```

https://github.com/user-attachments/assets/dffa7cc5-2865-4bb0-914b-9fcf1546cdf1

jjmlovesgit

I don't understand what you mean; please explain.

nzgnzg73

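The NVIDIA Quadro P2000 uses the Pascal SM (streaming multiprocessor) architecture, with compute capability 6.1. Because PyTorch no longer ships binaries that support this older compute capability, I'm sorry, but I don't know how to get Index2-tts running on that GPU.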

jjmlovesgit

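Key lessons from getting IndexTTS running on the GPU:

CUDA and PyTorch compatibility

Use CUDA 12.9.0 together with PyTorch 2.8.0 (test/cu129 channel)

Key: the CUDA version of the base image must match the CUDA build of the PyTorch wheel exactly (CUDA 12.9 with cu129 wheels)

Command: pip3 install torch==2.8.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/test/cu129

Dependency management

Install in the right order: system dependencies → PyTorch → core ML stack → audio processing → UI

Required system packages: python3 python3-pip espeak-ng ffmpeg

Core ML stack: transformers==4.52.1 tokenizers==0.21.0 safetensors==0.5.2 accelerate==1.8.1

Audio processing: librosa==0.10.2.post1 soundfile numba==0.58.1

Web UI: gradio==5.45.0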
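The install order above can be written down as data so each stage becomes its own `pip3` call. A sketch under the assumption that you want the same stages and pins listed in these lessons; `STAGES`, `SYSTEM`, and `pip_commands` are hypothetical names, system packages still come from `apt-get`, and the commands are only printed, not executed.

```python
import shlex

TORCH_INDEX = "https://download.pytorch.org/whl/test/cu129"

# (stage name, packages, extra pip arguments), in install order.
STAGES: list[tuple[str, list[str], list[str]]] = [
    ("pytorch", ["torch==2.8.0", "torchaudio==2.8.0"], ["--index-url", TORCH_INDEX]),
    ("core ml", ["transformers==4.52.1", "tokenizers==0.21.0",
                 "safetensors==0.5.2", "accelerate==1.8.1"], []),
    ("audio", ["librosa==0.10.2.post1", "soundfile", "numba==0.58.1"], []),
    ("ui", ["gradio==5.45.0"], []),
]

SYSTEM = "apt-get install -y python3 python3-pip espeak-ng ffmpeg"


def pip_commands(stages: list[tuple[str, list[str], list[str]]] = STAGES) -> list[str]:
    """One shell-ready pip3 command per stage, preserving the stage order."""
    return [shlex.join(["pip3", "install", *packages, *extra])
            for _name, packages, extra in stages]


for line in [SYSTEM, *pip_commands()]:
    print(line)
```

The first generated pip command is identical to the one given under "Command:" above, so the torch wheel always comes from the cu129 index before anything else can pull in a different build.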

jjmlovesgit
