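Has anyone deployed this on the new DGX Spark? Inference runs successfully, but the generated audio is full of noise. I previously deployed v1.5 on a Mac mini and the generated speech had no problems at all; it was very clean. After moving the same v1.5 setup to the DGX Spark, the generated audio is basically unusable. V2.0 has the same problem.
[index-tts] Speech generated on an NVIDIA DGX Spark setup is full of noise
Answer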
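After repeated attempts, I found that with torch 2.8, `torch.cuda.is_available() == False`, so generation can only run on the CPU, and the resulting speech has no noise. With the CPU-only build of torch 2.9, the audio is full of noise even when generated on the CPU; with torch 2.9+cu130 generating on the GPU, the audio is also full of noise. So it looks like a torchaudio 2.9 problem.
"I suggest you switch to https://github.com/mirbehnam/Chatterbox-TTS-Server-windows-easyInstallation.git (as you mentioned). In my experience it is a better and more stable solution, and it should meet your expectations:
- Best audio quality: it produces clean audio output, just like what you got with v1.5 on the Mac mini.
- Emotion control: you can easily control the emotional expression of the voice (e.g. anger, sadness, happiness).
- Multilingual support: it supports 23 languages. Please give it a try; it should work better for you."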
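To compare the clean torch 2.8 output and the noisy torch 2.9 output reported above with a number rather than by ear, one rough heuristic is the zero-crossing rate (ZCR): broadband hiss pushes it far above what clean speech usually produces. The sketch below uses only the standard library `wave` module and assumes 16-bit PCM WAV files; the function names and the use of ZCR as a noise proxy are illustrative, not part of IndexTTS.

```python
import array
import math
import sys
import wave


def read_pcm16(path):
    """Return (samples, sample_rate) for a 16-bit PCM WAV, keeping only the first channel."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit PCM, got {8 * wf.getsampwidth()}-bit")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples[::channels], rate


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose sign differs (hiss tends to push this up)."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)


def rms_dbfs(samples):
    """Root-mean-square level relative to 16-bit full scale, in dB."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")


if __name__ == "__main__":
    # Usage: python wav_noise_check.py clean.wav noisy.wav
    for path in sys.argv[1:]:
        samples, rate = read_pcm16(path)
        print(f"{path}: {rate} Hz, ZCR={zero_crossing_rate(samples):.3f}, "
              f"level={rms_dbfs(samples):.1f} dBFS")
```

Running it on the same sentence generated with each torch build should show a clearly higher ZCR for the noisy file if the noise is broadband; it will not catch every kind of artifact.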
DGX Test:
```bash
sudo docker run --rm -it --gpus all \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  nvidia/cuda:12.9.0-devel-ubuntu22.04 \
  bash -c "
    apt-get update && \
    apt-get install -y python3 python3-pip && \
    echo '=== Installing PyTorch with dependencies ===' && \
    pip3 install --pre torch --index-url https://download.pytorch.org/whl/test/cu129 && \
    echo '=== Testing GB10 with PyTorch ===' && \
    python3 -c \"
import torch
print('🎉 SUCCESS: PyTorch installed with dependencies!')
print('PyTorch version:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('CUDA version:', torch.version.cuda)
    print('GPU Name:', torch.cuda.get_device_name(0))
    capability = torch.cuda.get_device_capability(0)
    print('Compute Capability:', f'sm_{capability[0]}{capability[1]}')
    print('GPU Memory:', f'{torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB')

    # Test GPU operations
    print('Testing GPU operations...')
    x = torch.randn(1000, 1000).cuda()
    y = torch.randn(1000, 1000).cuda()
    z = torch.matmul(x, y)
    print('✅ GPU matrix multiplication successful!')
    print('Result shape:', z.shape)
    print('🎯 GB10 is fully operational with PyTorch + CUDA 12.9!')
else:
    print('❌ CUDA not available')
\"
  "
```
Result:
```
=== Testing GB10 with PyTorch ===
/usr/local/lib/python3.10/dist-packages/torch/_subclasses/functional_tensor.py:279: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
  cpu = _conversion_method_template(device=torch.device("cpu"))
🎉 SUCCESS: PyTorch installed with dependencies!
PyTorch version: 2.9.0+cu129
CUDA available: True
CUDA version: 12.9
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:283: UserWarning: Found GPU0 NVIDIA GB10 which is of cuda capability 12.1. Minimum and Maximum cuda capability supported by this version of PyTorch is (8.0) - (12.0)
  warnings.warn(
GPU Name: NVIDIA GB10
Compute Capability: sm_121
GPU Memory: 119.7 GB
Testing GPU operations...
✅ GPU matrix multiplication successful!
Result shape: torch.Size([1000, 1000])
🎯 GB10 is fully operational with PyTorch + CUDA 12.9!
```
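The warning above says the wheel ships kernels for compute capabilities 8.0 through 12.0, while the GB10 reports 12.1. A minimal sketch of that range check, written against the `sm_XY` strings that `torch.cuda.get_arch_list()` returns, is below. It takes plain values so it runs without a GPU; the example arch list is illustrative rather than copied from a specific wheel, and the helper names are hypothetical. CUDA binaries built for one minor revision are generally able to run on later minor revisions of the same major version, which is consistent with the matrix multiplication above still succeeding despite the warning.

```python
import re


def parse_arch(arch):
    """Turn an arch-list entry such as 'sm_120' into (12, 0); return None for entries like 'compute_120'."""
    match = re.fullmatch(r"sm_(\d+)(\d)", arch)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def supported_range(arch_list):
    """Lowest and highest (major, minor) capability the build has native kernels for."""
    caps = [cap for cap in map(parse_arch, arch_list) if cap is not None]
    if not caps:
        raise ValueError("arch list contains no sm_XY entries")
    return min(caps), max(caps)


def check_capability(capability, arch_list):
    """Return (ok, message) for a (major, minor) device capability tuple."""
    low, high = supported_range(arch_list)
    ok = low <= capability <= high
    status = "inside" if ok else "outside"
    message = (
        f"capability {capability[0]}.{capability[1]} is {status} the supported range "
        f"({low[0]}.{low[1]}) - ({high[0]}.{high[1]})"
    )
    return ok, message


if __name__ == "__main__":
    # On a real machine, pass torch.cuda.get_device_capability(0) and
    # torch.cuda.get_arch_list(); this arch list is illustrative only.
    example_arch_list = ["sm_80", "sm_86", "sm_90", "sm_100", "sm_120", "compute_120"]
    print(check_capability((12, 1), example_arch_list)[1])
```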
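I don't know much about this problem, but someone has already solved it, and better than this; it works very well. Contact me via Gmail and I will explain in detail: nzgnzg73@gmail.com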
```bash
sudo docker run --rm -it --gpus all -p 7860:7860 \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -v /home/mccormj/Applications/Indextts:/mnt/project \
  -v /home/mccormj/.cache/huggingface:/root/.cache/huggingface \
  nvidia/cuda:12.9.0-devel-ubuntu22.04 \
  bash -c "
    cd /mnt/project && \
    echo '=== Installing system dependencies ===' && \
    apt-get update && apt-get install -y python3 python3-pip espeak-ng ffmpeg && \
    echo '=== Installing PyTorch ecosystem ===' && \
    pip3 install torch==2.8.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/test/cu129 && \
    echo '=== Installing core dependencies from pyproject.toml ===' && \
    pip3 install transformers==4.52.1 tokenizers==0.21.0 safetensors==0.5.2 accelerate==1.8.1 && \
    echo '=== Installing additional required dependencies ===' && \
    pip3 install einops cn2an==0.5.22 cython==3.0.7 descript-audiotools==0.7.2 ffmpeg-python==0.2.0 \
      g2p-en==2.1.0 jieba==0.42.1 json5==0.10.0 keras==2.9.0 librosa==0.10.2.post1 matplotlib==3.8.2 \
      modelscope==1.27.0 munch==4.0.0 numba==0.58.1 numpy==1.26.2 omegaconf opencv-python==4.9.0.80 \
      pandas==2.3.2 sentencepiece tqdm textstat && \
    echo '=== Installing webui dependencies ===' && \
    pip3 install gradio==5.45.0 && \
    echo '=== Installing deepspeed dependencies ===' && \
    pip3 install deepspeed==0.17.1 && \
    echo '=== Testing imports ===' && \
    python3 -c \"
try:
    import torch
    import torchaudio
    from transformers import __version__ as transformers_version
    import einops
    import librosa
    import gradio
    import deepspeed
    print('✅ All imports successful')
    print('PyTorch version:', torch.__version__)
    print('Torchaudio version:', torchaudio.__version__)
    print('Transformers version:', transformers_version)
    print('CUDA available:', torch.cuda.is_available())
    if torch.cuda.is_available():
        print('GPU name:', torch.cuda.get_device_name(0))
        print('CUDA version:', torch.version.cuda)
except Exception as e:
    print('❌ Import failed:', e)
    import traceback
    traceback.print_exc()
\" && \
    echo '=== Starting IndexTTS ===' && \
    CUDA_VISIBLE_DEVICES=0 python3 webui.py --model_dir /mnt/project/checkpoints --fp16 --deepspeed
  "
```
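Given the report above that the noise appeared with the 2.9 builds, it helps to confirm that the pins actually landed in the environment. This sketch compares installed distributions against a subset of the pins from the command above using the standard library's `importlib.metadata.version`; `PINS` and `check_pins` are illustrative names, not part of IndexTTS. The version getter is a parameter so the check can be exercised without the packages installed.

```python
from importlib import metadata

# A subset of the pins used in the docker command above.
PINS = {
    "torch": "2.8.0",
    "torchaudio": "2.8.0",
    "transformers": "4.52.1",
    "tokenizers": "0.21.0",
    "safetensors": "0.5.2",
    "accelerate": "1.8.1",
    "librosa": "0.10.2.post1",
    "numpy": "1.26.2",
    "gradio": "5.45.0",
    "deepspeed": "0.17.1",
}


def check_pins(pins, get_version=metadata.version):
    """Return {name: (expected, installed_or_None)} for every pin that does not match.

    A local build tag such as '+cu129' is ignored, so torch '2.8.0+cu129'
    counts as matching the pin '2.8.0'.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed is None or installed.split("+", 1)[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINS)
    for name, (expected, installed) in sorted(problems.items()):
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
    print("all pins match" if not problems else f"{len(problems)} mismatch(es)")
```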
Results:
```
✅ All imports successful
PyTorch version: 2.8.0+cu129
Torchaudio version: 2.8.0
Transformers version: 4.52.1
CUDA available: True
GPU name: NVIDIA GB10
CUDA version: 12.9
=== Starting IndexTTS ===
```
A Run:
```
starting inference...
Use the specified emotion vector
Free memory : 47.664494 (GigaBytes)
Total memory: 119.699211 (GigaBytes)
Requested memory: 0.541992 (GigaBytes)
Setting maximum total tokens (input + output) to 1024
WorkSpace: 0xf1d7e0000000
Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.53.0. You should pass an instance of `Cache` instead, e.g. `past_key_values=DynamicCache.from_legacy_cache(past_key_values)`.
100%|███████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 34.64it/s]
torch.Size([1, 64256])
gpt_gen_time: 3.59 seconds
gpt_forward_time: 0.01 seconds
s2mel_time: 0.73 seconds
bigvgan_time: 0.35 seconds
Total inference time: 7.35 seconds
Generated audio length: 2.91 seconds
RTF: 2.5237
wav file saved to: outputs/spk_1760873369.wav
Sentence level creation for streaming...
Generating 1/7: The Visual Studio Code Dev Containers extension lets you use...
starting inference...
Use the specified emotion vector
100%|█████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:01<00:00, 21.24it/s]
torch.Size([1, 170752])
gpt_gen_time: 7.06 seconds
gpt_forward_time: 0.01 seconds
s2mel_time: 1.20 seconds
bigvgan_time: 0.84 seconds
Total inference time: 9.33 seconds
Generated audio length: 7.74 seconds
RTF: 1.2049
wav file saved to: outputs/stream_1760880134_0.wav
📝 Generating 2/7: It allows you to open any folder inside (or mounted into) a ...
starting inference...
Use the specified emotion vector
100%|█████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:01<00:00, 19.95it/s]
torch.Size([1, 185344])
gpt_gen_time: 7.85 seconds
gpt_forward_time: 0.01 seconds
s2mel_time: 1.28 seconds
bigvgan_time: 0.90 seconds
Total inference time: 10.27 seconds
Generated audio length: 8.41 seconds
RTF: 1.2218
wav file saved to: outputs/stream_1760880144_1.wav
📝 Generating 3/7: A devcontainer.json file in your project tells VS Code how t...
starting inference...
Use the specified emotion vector
100%|█████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:01<00:00, 18.64it/s]
torch.Size([1, 210688])
gpt_gen_time: 9.09 seconds
gpt_forward_time: 0.01 seconds
s2mel_time: 1.37 seconds
bigvgan_time: 1.00 seconds
Total inference time: 11.74 seconds
Generated audio length: 9.56 seconds
RTF: 1.2286
wav file saved to: outputs/stream_1760880154_2.wav
📝 Generating 4/7: This container can be used to run an application or to separ...
starting inference...
Use the specified emotion vector
100%|█████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:01<00:00, 21.41it/s]
torch.Size([1, 166400])
gpt_gen_time: 6.92 seconds
gpt_forward_time: 0.01 seconds
s2mel_time: 1.19 seconds
bigvgan_time: 0.81 seconds
Total inference time: 9.15 seconds
Generated audio length: 7.55 seconds
RTF: 1.2121
wav file saved to: outputs/stream_1760880166_3.wav
📝 Generating 5/7: Workspace files are mounted from the local file system or co...
starting inference...
Use the specified emotion vector
100%|█████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:01<00:00, 23.94it/s]
torch.Size([1, 132864])
gpt_gen_time: 5.31 seconds
gpt_forward_time: 0.01 seconds
s2mel_time: 1.07 seconds
```
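The per-clip numbers in this log can be summarized with a short parser. This sketch assumes the line formats shown above (`torch.Size([1, N])`, `Total inference time: X seconds`, `Generated audio length: Y seconds`) and an output rate of 22,050 Hz, which is an assumption consistent with the log (64256 samples / 22050 Hz ≈ 2.91 s, 170752 / 22050 ≈ 7.74 s). RTF is taken as inference time divided by audio length, so values above 1 mean slower than real time. The name `summarize` is hypothetical.

```python
import re

SAMPLE_RATE = 22050  # assumed; matches the sample counts and durations in the log

SIZE_RE = re.compile(r"torch\.Size\(\[1, (\d+)\]\)")
TOTAL_RE = re.compile(r"Total inference time: ([\d.]+) seconds")
LENGTH_RE = re.compile(r"Generated audio length: ([\d.]+) seconds")


def summarize(log_text):
    """Return one dict per generated clip with its sample count, duration and RTF."""
    sizes = [int(m) for m in SIZE_RE.findall(log_text)]
    totals = [float(m) for m in TOTAL_RE.findall(log_text)]
    lengths = [float(m) for m in LENGTH_RE.findall(log_text)]
    clips = []
    for samples, total, length in zip(sizes, totals, lengths):
        clips.append({
            "samples": samples,
            "duration_from_samples": samples / SAMPLE_RATE,
            "reported_length": length,
            "rtf": total / length,  # > 1 means slower than real time
        })
    return clips


if __name__ == "__main__":
    sample = (
        "torch.Size([1, 170752]) gpt_gen_time: 7.06 seconds\n"
        "Total inference time: 9.33 seconds Generated audio length: 7.74 seconds RTF: 1.2049\n"
    )
    for clip in summarize(sample):
        print(f"{clip['samples']} samples = {clip['duration_from_samples']:.2f} s, "
              f"RTF {clip['rtf']:.3f}")
```

Small differences from the printed RTF values are expected, since the times in the log are rounded before printing.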
Very nice.
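Friend, can you help me? I don't pick these things up easily, and my computer has low specs. Your voice app has just started up; what should be done next? Could you share its URL with me? I mean a public URL for your local instance. Would that be of any use?
Will you help?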
Quick 10 sec DGX screen recording to share actual output...
```bash
ffmpeg -f x11grab -video_size 1255x1252 -i :1+2184,136 -f pulse -i default -t 10 firefox_recording.mp4
```
https://github.com/user-attachments/assets/dffa7cc5-2865-4bb0-914b-9fcf1546cdf1
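I don't understand what you mean; please explain.
The NVIDIA Quadro P2000 uses the Pascal SM (streaming multiprocessor) architecture with compute capability 6.1. Because PyTorch no longer ships binaries that support this older compute capability, I'm sorry, but I don't know how to get Index2-tts running on that GPU.
Key lessons from running IndexTTS on GPU: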
- CUDA and PyTorch compatibility
  - Use CUDA 12.9.0 with PyTorch 2.8.0 (test/cu129 channel).
  - Key point: the CUDA version must exactly match the CUDA build of PyTorch.
  - Command: `pip3 install torch==2.8.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/test/cu129`
- Dependency management
  - Install in the right order: system dependencies → PyTorch → core ML stack → audio processing → UI.
  - Required system packages: `python3 python3-pip espeak-ng ffmpeg`
  - Core ML stack: `transformers==4.52.1 tokenizers==0.21.0 safetensors==0.5.2 accelerate==1.8.1`
  - Audio processing: `librosa==0.10.2.post1 soundfile numba==0.58.1`
  - Web UI: `gradio==5.45.0`
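The "match the CUDA version exactly" rule above can be checked mechanically: PyTorch wheels carry their CUDA build in the local version tag (for example `2.8.0+cu129`), and the container's CUDA version is visible in the image name (`nvidia/cuda:12.9.0-devel-ubuntu22.04`). This is a minimal sketch of that comparison; `wheel_cuda_tag` and `cuda_matches` are hypothetical helpers, and reading the container version from the image tag is an assumption.

```python
import re


def wheel_cuda_tag(torch_version):
    """'2.8.0+cu129' -> '12.9'; returns None for CPU-only or untagged builds."""
    match = re.search(r"\+cu(\d+)(\d)$", torch_version)
    if match is None:
        return None
    return f"{int(match.group(1))}.{match.group(2)}"


def cuda_matches(torch_version, container_cuda):
    """True if the wheel's CUDA tag equals the container's major.minor CUDA version."""
    tag = wheel_cuda_tag(torch_version)
    major_minor = ".".join(container_cuda.split(".")[:2])  # '12.9.0' -> '12.9'
    return tag is not None and tag == major_minor


if __name__ == "__main__":
    # Values taken from the logs in this thread.
    print(cuda_matches("2.8.0+cu129", "12.9.0"))
    print(cuda_matches("2.9.0+cu130", "12.9.0"))
```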