Hard-won lessons from building browser-based TTS/STT inference with ONNX Runtime.
When multiple packages depend on different versions of onnxruntime-common, webpack may resolve to the wrong one. In our case:
@xenova/transformers@2.17.2bringsonnxruntime-common@1.14.0kokoro-js@1.2.1→@huggingface/transformers@3.8.1→onnxruntime-web@1.22.0-dev→onnxruntime-common@1.22.0-dev
The v1.14.0 Tensor class stores data as this.data with no location getter. The v1.22.0-dev Tensor class stores data as this.cpuData with a get location() returning this.dataLocation. When onnxruntime-web's session handler calls tensor.location, the old Tensor returns undefined, causing:
Error: invalid data location: undefined for input 'input_ids'
Webpack resolve alias that dynamically walks the dependency chain to find the correct version:
// next.config.ts
const require_ = createRequire(import.meta.url);
const ortWebDir = path.dirname(
require_.resolve("onnxruntime-web", {
paths: [path.dirname(require_.resolve("@huggingface/transformers", {
paths: [path.dirname(require_.resolve("kokoro-js"))],
}))],
}),
);
const ortCommonPath = path.dirname(
require_.resolve("onnxruntime-common", { paths: [ortWebDir] }),
);
// Then in webpack config:
config.resolve.alias["onnxruntime-common"] = ortCommonPath;- pnpm overrides: Would force all packages to use one version, potentially breaking
@xenova/transformerswhich expects the old API. - Hardcoded pnpm store path: Breaks on
pnpm update, different machines, or CI — the store hash changes with each version. require.resolvefrom project root: pnpm's strict isolation resolves to v1.14.0 (most commonly depended version) rather than the one we need.
- Next.js 16 uses Turbopack by default. The
--webpackflag is required for custom webpack config (resolve aliases, experiments). - Removing
--turbopackis not enough — you must explicitly pass--webpack. - Turbopack does not support webpack
resolve.alias. If switching to Turbopack, useturbopack.resolveAliasin next config.
config.experiments = { ...config.experiments, asyncWebAssembly: true };
config.output = { ...config.output, globalObject: "self" }; // For web workersONNX packages must be excluded from server-side bundling to prevent Node.js/browser API conflicts:
serverExternalPackages: [
"onnxruntime-node", "onnxruntime-web", "onnxruntime-common",
"@huggingface/transformers", "kokoro-js"
]ONNX Runtime WASM uses SharedArrayBuffer for multi-threaded execution. Browsers require these headers:
Cross-Origin-Embedder-Policy: credentialless
Cross-Origin-Opener-Policy: same-origin
Using credentialless instead of require-corp allows loading external resources (HuggingFace CDN model files) without CORP headers.
Browser-based ML inference is long-running (1-15+ seconds). Without guards:
- Double-clicking "Download" can create duplicate model instances, leaking hundreds of MB of WASM memory.
- Rapid "Generate" clicks can queue overlapping inference calls on the same ONNX session.
Fix: Use useRef guards (not state — refs are synchronous):
const generatingRef = useRef(false);
const handleGenerate = useCallback(async () => {
if (generatingRef.current) return;
generatingRef.current = true;
try { /* ... */ } finally { generatingRef.current = false; }
}, []);URL.createObjectURL()creates session-scoped URLs. They do not survive page reloads.- Always revoke previous blob URLs before creating new ones.
- Add
useEffectcleanup to revoke on unmount. - Never persist blob URLs to
localStorage— they become invalid after reload. Use an in-memory cache for same-session playback.
When you see invalid data location: undefined, check which version of onnxruntime-common is actually loaded at runtime:
const ortCommon = await import("onnxruntime-common");
const src = ortCommon.Tensor.toString();
console.log("Has 'cpuData':", src.includes("cpuData")); // true = v1.22+
console.log("Has 'dataLocation':", src.includes("dataLocation")); // true = v1.22+If both are false, you have the old v1.14.0 Tensor — check your webpack alias.
| Input Length | Generation Time | Audio Duration | RTF |
|---|---|---|---|
| ~1 word | ~1.0s | ~1.3s | ~0.8x |
| ~14 words | ~4-5s | ~6s | ~0.7-0.8x |
| ~52 words | ~14s | ~18s | ~0.76x |
RTF (Real-Time Factor) below 1.0x means faster-than-realtime generation.