400k steps into training, still heavily hallucinating/unintelligible #1283

Description

@DedaDev

Checks

  • This template is only for usage issues encountered.
  • I have thoroughly reviewed the project documentation but couldn't find information to solve my problem.
  • I have searched for existing issues, including closed ones, and couldn't find a solution.
  • I am using English to submit this issue to facilitate community communication.

Environment Details

runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04

Steps to Reproduce

This is the training config that I use with my dataset of 130 hours of clean Serbian speech:

  exp_name:               F5TTS_v1_Base
  tokenizer:              char
  mixed_precision:        bf16
  learning_rate:          7.5e-05
  batch_size_per_gpu:     20189
  batch_size_type:        frame
  max_samples:            64
  grad_accumulation_steps: 1
  max_grad_norm:          1
  epochs:                 434
  num_warmup_updates:     3779
  save_per_updates:       5000
  keep_last_n_checkpoints: 1
  last_per_updates:       10000
  logger:                 tensorboard
  dataset:                serbian
  finetune:               false (training from scratch)
  dataset_size:           60,948 samples / 132.05 hours
  gpu:                    NVIDIA A40 (46GB)
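
To put the step count in context, here is a rough sketch of how the config above translates into updates per epoch and total scheduled updates. It assumes a 24 kHz sample rate and a mel hop length of 256 (neither is stated in this issue), a single GPU, and ideal batch packing, so the real numbers from the dynamic batch sampler will differ somewhat. The function names are hypothetical helpers, not part of the training code.

```python
# Back-of-the-envelope schedule estimate for frame-based batching.
# ASSUMPTIONS (not stated in the issue): 24 kHz audio, mel hop length 256,
# one GPU, and batches that are always filled to the frame budget.

SAMPLE_RATE_HZ = 24_000  # assumed
HOP_LENGTH = 256         # assumed


def frames_per_second(sample_rate_hz=SAMPLE_RATE_HZ, hop_length=HOP_LENGTH):
    """Number of mel frames produced per second of audio."""
    return sample_rate_hz / hop_length


def estimate_schedule(dataset_hours, frames_per_batch, epochs,
                      num_gpus=1, grad_accum=1):
    """Estimate batch duration and update counts from the config values."""
    fps = frames_per_second()
    total_frames = dataset_hours * 3600 * fps
    frames_per_update = frames_per_batch * num_gpus * grad_accum
    updates_per_epoch = total_frames / frames_per_update
    return {
        "batch_seconds": frames_per_batch / fps,
        "updates_per_epoch": updates_per_epoch,
        "total_updates": updates_per_epoch * epochs,
    }


if __name__ == "__main__":
    # Values taken from the config in this issue.
    est = estimate_schedule(dataset_hours=132.05,
                            frames_per_batch=20189,
                            epochs=434)
    steps_so_far = 400_000
    print(f"audio per batch:     {est['batch_seconds']:.1f} s")
    print(f"updates per epoch:   {est['updates_per_epoch']:.0f}")
    print(f"scheduled updates:   {est['total_updates']:.0f}")
    print(f"epochs seen so far:  {steps_so_far / est['updates_per_epoch']:.0f}")
```

Under these assumptions, 400k updates corresponds to roughly 180 passes over the data, which is less than half of the 434 scheduled epochs.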

After 3 days of training and ~400k steps, the generated audio still hallucinates: it repeats parts of words or whole words, and is sometimes unintelligible.

Loss curve: [image: loss curve]

Learning rate: [image: learning rate]

✔️ Expected Behavior

Reference audio: https://voca.ro/1iSJaiUm5CHz
Reference text: u tom komitetu dobijamo vrlo vrlo opširne biografije kandidata, sa kojima vodimo razgovor i biramo ih, čak i ispitujemo.

❌ Actual Behavior

Generated text: (same as the reference text)
Generated audio: https://voca.ro/1mT0JkcugloJ
(This is with EMA enabled; without EMA it is much worse.)
