What is the best way to contribute Benchmark information?

### Checks

- [x] This template is only for research questions, not usage problems, feature requests or bug reports.
- [x] I have thoroughly reviewed the project documentation and read the related paper(s).
- [x] I have searched for existing issues, including closed ones, and found no similar questions.
- [x] I am using English to submit this issue to facilitate community communication.

### Question details

Hello! I would like to contribute my research. I don't want to change any code or add any features; I want to share statistics measured on a single setup (RTX 5090) so people can see the real RTF and the limits they can reach with F5 (at least at the current stage of my work). **I will use TTFS to mean Time To First Sound rather than Time To Final Segment**, because I implemented chunked streaming, so my final segment is basically the first chunk.
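
To make the TTFS definition concrete, here is a minimal sketch of how Time To First Sound could be measured for a chunked stream. It is only an illustration: `time_to_first_sound` and `fake_stream` are hypothetical names, and the sleeping generator stands in for a real streaming synthesiser.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_sound(chunks: Iterable[bytes]) -> Tuple[float, float]:
    """Return (ttfs_seconds, total_seconds) for a stream of audio chunks."""
    start = time.perf_counter()
    ttfs = None
    for _ in chunks:
        if ttfs is None:
            # The first chunk has arrived: this is the Time To First Sound.
            ttfs = time.perf_counter() - start
    total = time.perf_counter() - start
    if ttfs is None:
        raise ValueError("stream produced no audio chunks")
    return ttfs, total


def fake_stream(n_chunks: int = 3, delay_s: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a chunked synthesiser (sleeps, then yields silence)."""
    for _ in range(n_chunks):
        time.sleep(delay_s)
        yield b"\x00" * 1024


ttfs, total = time_to_first_sound(fake_stream())
print(f"TTFS: {ttfs * 1000:.1f} ms, full stream: {total * 1000:.1f} ms")
```

With chunked streaming the first value is what the listener actually waits for, which is why I report it instead of the time to the final segment.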

I would like to contribute results on:

- FP16 vs FP32 (quality / latency)
- Preprocess and decode overhead
- Rust-based deployment and comparison
- Rust vs Python ORT delta
- Chunked streaming benchmarks
- Runtime vs schedule optimisation breakdown
- Duration formula side effects (Slowmode + artefacts vs last word cut off)

One example of what I would like to provide:

| Stage | ours-onnx (Rust) | onnx-dakeqq (Python, IO Binding) | pytorch (FP16) | Notes |
|---|---|---|---|---|
| Preprocess | 10ms | 42ms | 1ms | Rust vs Python ORT overhead; PyTorch has no separate preprocess |
| Transformer | 266ms | 290ms | 297ms | IO Binding vs IO Binding vs native PyTorch (sway sampling) |
| Decode | 3ms | 17ms | 2ms | Vocos; PyTorch runs decode in-process |
| **Total** | **280ms** | **350ms** | **299ms** | |
| per step | 16.6ms | 18.1ms | 18.5ms | Rust IO Binding + custom ORT wins per-step |
| Output duration | 7.97s | 7.94s | 7.95s | Forced equal via fixed mel frames |
| RTF | 0.035 | 0.044 | 0.038 | |
| Steps | 16 (EPSS) | 16 (EPSS) | 16 (sway) | |
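
For reference, a minimal sketch of how the RTF row follows from the Total and Output duration rows, assuming RTF is total synthesis time divided by output audio duration. The function name and the dictionary layout are illustrative only; the numbers are copied from the table.

```python
def rtf(total_ms: float, output_s: float) -> float:
    """Real-time factor: synthesis time divided by generated audio duration."""
    return (total_ms / 1000.0) / output_s


# (total time in ms, output duration in s), taken from the table above
rows = {
    "ours-onnx (Rust)": (280, 7.97),
    "onnx-dakeqq (Python, IO Binding)": (350, 7.94),
    "pytorch (FP16)": (299, 7.95),
}

results = {name: rtf(total_ms, out_s) for name, (total_ms, out_s) in rows.items()}
for name, value in results.items():
    print(f"{name}: RTF = {value:.3f}")
```

An RTF below 1 means audio is generated faster than it plays back, so lower is better.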

I could also provide .wav file examples with the same voice/seed/text to compare.

My question: what is the best way for me to provide this information? A pull request, or a Discussion?

What is the best way to contribute Benchmark information? #1286

Checks

Question details

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Stage	ours-onnx (Rust)	onnx-dakeqq (Python, IO Binding)	pytorch (FP16)	Notes
Preprocess	10ms	42ms	1ms	Rust vs Python ORT overhead; PyTorch has no separate preprocess
Transformer	266ms	290ms	297ms	IO Binding vs IO Binding vs native PyTorch (sway sampling)
Decode	3ms	17ms	2ms	Vocos; PyTorch runs decode in-process
Total	280ms	350ms	299ms
per step	16.6ms	18.1ms	18.5ms	Rust IO Binding + custom ORT wins per-step
Output duration	7.97s	7.94s	7.95s	Forced equal via fixed mel frames
RTF	0.035	0.044	0.038
Steps	16 (EPSS)	16 (EPSS)	16 (sway)

What is the best way to contribute Benchmark information? #1286

Description

Checks

Question details

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions