400k steps into training, still heavily hallucinating/unintelligible #1283

### Checks

- [x] This template is only for usage issues encountered.
- [x] I have thoroughly reviewed the project documentation but couldn't find information to solve my problem.
- [x] I have searched for existing issues, including closed ones, and couldn't find a solution.
- [x] I am using English to submit this issue to facilitate community communication.

### Environment Details

runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04

### Steps to Reproduce

This is the training config that I use with my dataset of 130 hours of clean Serbian speech:
```
  exp_name:               F5TTS_v1_Base
  tokenizer:              char
  mixed_precision:        bf16
  learning_rate:          7.5e-05
  batch_size_per_gpu:     20189
  batch_size_type:        frame
  max_samples:            64
  grad_accumulation_steps: 1
  max_grad_norm:          1
  epochs:                 434
  num_warmup_updates:     3779
  save_per_updates:       5000
  keep_last_n_checkpoints: 1
  last_per_updates:       10000
  logger:                 tensorboard
  dataset:                serbian
  finetune:               false (training from scratch)
  dataset_size:           60,948 samples / 132.05 hours
  gpu:                    NVIDIA A40 (46GB)
```
After 3 days of training and ~400k steps, the generated audio still hallucinates: it repeats parts of words or whole words, and is sometimes unintelligible.
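
For context, here is a rough back-of-the-envelope estimate of how many passes over the dataset 400k updates represent. It is only a sketch: it assumes a 24 kHz sample rate and a mel hop length of 256 (neither is stated in the config above), a single GPU, and ideal batch packing, so the real number of updates per epoch is likely somewhat higher.

```python
# Rough estimate only.
# SAMPLE_RATE and HOP_LENGTH are assumptions, not values taken from the config above.
SAMPLE_RATE = 24_000      # assumed audio sample rate in Hz
HOP_LENGTH = 256          # assumed mel-spectrogram hop length in samples

# Values copied from the config above.
FRAMES_PER_BATCH = 20_189   # batch_size_per_gpu with batch_size_type = frame
DATASET_HOURS = 132.05
NUM_SAMPLES = 60_948
EPOCHS_CONFIGURED = 434
STEPS_DONE = 400_000        # approximate number of updates so far

frames_per_second = SAMPLE_RATE / HOP_LENGTH
batch_seconds = FRAMES_PER_BATCH / frames_per_second      # audio seconds per update
dataset_seconds = DATASET_HOURS * 3600

avg_clip_seconds = dataset_seconds / NUM_SAMPLES
clips_per_batch = batch_seconds / avg_clip_seconds        # compare with max_samples = 64

# Assumes perfectly packed batches, one GPU, grad_accumulation_steps = 1.
updates_per_epoch = dataset_seconds / batch_seconds
epochs_done = STEPS_DONE / updates_per_epoch
total_updates = EPOCHS_CONFIGURED * updates_per_epoch

print(f"audio per update:     {batch_seconds:.1f} s")
print(f"avg clips per update: {clips_per_batch:.1f}")
print(f"updates per epoch:    {updates_per_epoch:.0f}")
print(f"epochs after 400k:    {epochs_done:.0f}")
print(f"updates for all {EPOCHS_CONFIGURED} epochs: {total_updates:.0f}")
```

Under these assumptions, 400k updates correspond to roughly 180 epochs, somewhat less than half of the configured 434.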

Loss curve
![loss curve](https://i.imgur.com/jgpIUgR.png)

Learning rate
![learning rate](https://i.imgur.com/l2w2Q7x.png)

### ✔️ Expected Behavior

Reference audio: https://voca.ro/1iSJaiUm5CHz
Reference text: u tom komitetu dobijamo vrlo vrlo opširne biografije kandidata, sa kojima vodimo razgovor i biramo ih, čak i ispitujemo.

### ❌ Actual Behavior

Generated text: (same as the reference text)
Generated audio: https://voca.ro/1mT0JkcugloJ
(This is with EMA enabled; without EMA the output is much worse.)
