[index-tts] CUDA out of memory

2025-10-28

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 714.00 MiB. GPU 0 has a total capacity of 11.60 GiB of which 605.75 MiB is free. Including non-PyTorch memory, this process has 10.98 GiB memory in use. Of the allocated memory 10.50 GiB is allocated by PyTorch, and 264.44 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.

A desktop-class 4080 GPU with 12 GB of VRAM apparently can't run the demo example. Can this be improved by adjusting the precision?

Answers


Enable fp16=True and use shorter sentences.
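
The OOM message itself also suggests the expandable-segments allocator; it does not add VRAM, but it can reduce fragmentation. A sketch combining it with the fp16 launch (the variable is the one named in the error text, and the command is the one used later in this thread):

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True uv run webui.py --model_dir checkpoints --fp16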

Thanks for the reply. Following your suggestion, I tried running

uv run webui.py --model_dir checkpoints --fp16

but it still prints the same error.

The full log is below:

/index-tts main > uv run webui.py --model_dir checkpoints --fp16                                                                                                                        19:33:38
>> GPT weights restored from: checkpoints/gpt.pth
[2025-09-09 19:33:59,644] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-09-09 19:34:00,925] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] Initialized with serialization = False
GPT2InferenceModel has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
[2025-09-09 19:34:00,931] [INFO] [logging.py:107:log_dist] [Rank -1] DeepSpeed info: version=0.17.1, git-hash=unknown, git-branch=unknown
[2025-09-09 19:34:00,931] [WARNING] [config_utils.py:70:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2025-09-09 19:34:00,931] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] Initialized with serialization = False
[2025-09-09 19:34:00,931] [INFO] [logging.py:107:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2025-09-09 19:34:00,969] [INFO] [logging.py:107:log_dist] [Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1280, 'intermediate_size': 5120, 'heads': 20, 'num_hidden_layers': -1, 'dtype': torch.float16, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False, 'use_triton': False, 'triton_autotune': False, 'num_kv': -1, 'rope_theta': 10000, 'invert_mask': True}
W0909 19:34:00.982000 6163 .venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. 
W0909 19:34:00.982000 6163 .venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
ninja: no work to do.
Time to load transformer_inference op: 0.023575544357299805 seconds
W0909 19:34:01.017000 6163 .venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. 
W0909 19:34:01.017000 6163 .venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
ninja: no work to do.
>> Preload custom CUDA kernel for BigVGAN <module 'anti_alias_activation_cuda' from '/home/workibear/Desktop/Python/index-tts/indextts/BigVGAN/alias_free_activation/cuda/build/anti_alias_activation_cuda.so'>
>> semantic_codec weights restored from: /home/workibear/.cache/huggingface/hub/models--amphion--MaskGCT/snapshots/265c6cef07625665d0c28d2faafb1415562379dc/semantic_codec/model.safetensors
cfm loaded
length_regulator loaded
gpt_layer loaded
>> s2mel weights restored from: checkpoints/s2mel.pth
>> campplus_model weights restored from: /home/workibear/.cache/huggingface/hub/models--funasr--campplus/snapshots/fb71fe990cbf6031ae6987a2d76fe64f94377b7e/campplus_cn_common.bin
Loading weights from nvidia/bigvgan_v2_22khz_80band_256x
Removing weight norm...
>> bigvgan weights restored from: nvidia/bigvgan_v2_22khz_80band_256x
2025-09-09 19:34:12,276 WETEXT INFO found existing fst: /home/workibear/Desktop/Python/index-tts/indextts/utils/tagger_cache/zh_tn_tagger.fst
2025-09-09 19:34:12,277 WETEXT INFO                     /home/workibear/Desktop/Python/index-tts/indextts/utils/tagger_cache/zh_tn_verbalizer.fst
2025-09-09 19:34:12,277 WETEXT INFO skip building fst for zh_normalizer ...
2025-09-09 19:34:12,432 WETEXT INFO found existing fst: /home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/tn/en_tn_tagger.fst
2025-09-09 19:34:12,432 WETEXT INFO                     /home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/tn/en_tn_verbalizer.fst
2025-09-09 19:34:12,432 WETEXT INFO skip building fst for en_normalizer ...
>> TextNormalizer loaded
>> bpe model loaded from: checkpoints/bpe.model
* Running on local URL:  http://0.0.0.0:7860
* To create a public link, set `share=True` in `launch()`.
Emo control mode:0,vec:None
>> start inference...
Traceback (most recent call last):
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/gradio/queueing.py", line 667, in process_events
    response = await route_utils.call_process_api(
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/gradio/route_utils.py", line 349, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/gradio/blocks.py", line 2274, in process_api
    result = await self.call_function(
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/gradio/blocks.py", line 1781, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2476, in run_sync_in_worker_thread
    return await future
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 967, in run
    result = context.run(func, *args)
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/gradio/utils.py", line 915, in wrapper
    response = f(*args, **kwargs)
  File "/home/workibear/Desktop/Python/index-tts/webui.py", line 142, in gen_single
    output = tts.infer(spk_audio_prompt=prompt, text=text,
  File "/home/workibear/Desktop/Python/index-tts/indextts/infer_v2.py", line 340, in infer
    spk_cond_emb = self.get_emb(input_features, attention_mask)
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/workibear/Desktop/Python/index-tts/indextts/infer_v2.py", line 197, in get_emb
    vq_emb = self.semantic_model(
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/transformers/models/wav2vec2_bert/modeling_wav2vec2_bert.py", line 1027, in forward
    encoder_outputs = self.encoder(
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/transformers/models/wav2vec2_bert/modeling_wav2vec2_bert.py", line 533, in forward
    layer_outputs = layer(
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/transformers/models/wav2vec2_bert/modeling_wav2vec2_bert.py", line 441, in forward
    hidden_states, attn_weigts = self.self_attn(
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/workibear/Desktop/Python/index-tts/.venv/lib/python3.10/site-packages/transformers/models/wav2vec2_bert/modeling_wav2vec2_bert.py", line 319, in forward
    scores = scores + (relative_position_attn_weights / math.sqrt(self.head_size))
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 714.00 MiB. GPU 0 has a total capacity of 11.60 GiB of which 253.75 MiB is free. Including non-PyTorch memory, this process has 11.33 GiB memory in use. Of the allocated memory 10.44 GiB is allocated by PyTorch, and 684.31 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
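
Judging from the trace, the allocation fails inside the wav2vec2-bert semantic encoder while it embeds the speaker reference audio (get_emb in infer_v2.py), so besides shortening the text, a shorter reference clip may also lower peak memory. A minimal sketch that keeps only the first 10 seconds of the prompt with ffmpeg (the file names are hypothetical):

ffmpeg -i prompt.wav -t 10 prompt_short.wav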

Try:

uv run python webui.py --fp16 --gui_seg_tokens 80

Explanation:

  • fp16 = needs less VRAM.
  • "gui_seg_tokens" = changes how many text tokens are generated per audio segment. Larger = more VRAM needed; smaller = less VRAM needed. The default is 120 (which produces the most natural-sounding speech chunks), but you can try anything from 20 to 600 to see what works on your GPU (see the sketch after this list). You can also change this setting live in the web UI's advanced settings panel. Be warned, though: after an OOM error the web UI connection will keep failing until you restart the backend.