Training with older v0 versus new v1 #867
"F5-TTS v1 base model with better training and inference performance." I trained a Finnish model at home on an RTX 4070 Ti Super (16 GB). My training data was a little over four days of Finnish speech (total duration of all WAV files: 106:10:05.17, in HH:MM:SS.ss), and the training itself took approximately four days. What are the benefits of retraining the model with v1? I might buy an RTX 5090 (32 GB) one day, but I have no plans to train again before then.
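The total-duration figure above can be reproduced with a short script. The following is a minimal sketch using only Python's standard `wave` module, assuming uncompressed PCM WAV files; the function names are illustrative and not part of F5-TTS. Hours are deliberately not wrapped at 24, so 382205.17 seconds prints as 106:10:05.17 (about 4.4 days).

```python
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    """Duration of one PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def total_wav_duration(root: str) -> float:
    """Sum the durations of all *.wav files under root, recursively."""
    return sum(wav_duration_seconds(p) for p in Path(root).rglob("*.wav"))


def format_hms(total_seconds: float) -> str:
    """Format seconds as HH:MM:SS.ss without wrapping hours at 24."""
    # Work in whole centiseconds to avoid float rounding like "59.999 -> 60.00".
    centis = round(total_seconds * 100)
    hours, rem = divmod(centis, 360_000)
    minutes, rem = divmod(rem, 6_000)
    seconds, centis = divmod(rem, 100)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{centis:02d}"


if __name__ == "__main__":
    # Hypothetical dataset directory; replace with your own path.
    print(format_hms(total_wav_duration("data/finnish_wavs")))
```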
Claude says:
Summary of the Training Process
The training has progressed excellently and achieved its goals! Here's a summary of the final screenshots:
Training Status: 153/200 epochs (76.5% complete).
Loss Curve Analysis: Loss has decreased steadily throughout the entire training, from an initial ~0.67 to the current ~0.627.
Individual Loss Values: The terminal view shows excellent individual values (0.441, 0.466, 0.496, 0.497), significantly better than at the beginning of training.
Conclusions: I recommend testing different checkpoints (e.g., latest vs. epoch 100) to compare audio quality. This will show how much the audio quality has improved and which checkpoint produces the best results.

The statement "F5-TTS v1 base model with better training and inference performance." is a comparison with v0.
With the same settings (learning rate, max updates, total batch size), v1 converges faster and achieves better WER/SIM results (lower word error rate, higher speaker similarity).
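For reference, the two metrics mentioned in the reply are commonly defined as follows. This is a minimal sketch of the standard definitions, not F5-TTS's own evaluation code: WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, and SIM is typically the cosine similarity between speaker embeddings of the reference and generated audio. The speaker-embedding model is not shown; plain lists of floats stand in for its outputs here.

```python
import math
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (classic Levenshtein DP, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the") over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Lower WER means the ASR transcript of the generated speech matches the target text more closely; higher SIM means the generated voice is closer to the reference speaker.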