Releases: born-ml/born
v0.8.0 — Pure Go WebGPU (gogpu/wgpu)
Pure Go WebGPU — No Shared Libraries, No CGO
Born's GPU backend now uses `gogpu/wgpu` v0.26.8, a pure Go WebGPU implementation. No more DLL/SO downloads: just run `go build`.
Highlights
- True single binary deployment — GPU support built into the executable
- Vulkan primary backend — stable on Windows, Linux, macOS
- Zero runtime dependencies — no wgpu-native, no shared libraries
- Validated — 105 GPU tests pass, real model training (20 epochs, 0 crashes)
What Changed
- Replaced `go-webgpu/webgpu` (Rust FFI) with `gogpu/wgpu` (pure Go)
- Fixed PipelineLayout lifetime for Vulkan SetBindGroup
- Fixed lazy ops buffer lifetime with immediate submit
- Fixed lazy chain data propagation
- Added `runtime.KeepAlive` guards for GC safety
- Added `Poll(PollWait)` in Release for clean shutdown
- Sign/Abs operators (#59 by @bennibbelink)
Breaking Changes
- `wgpu_native` shared library no longer needed (or used)
- `IsAvailable()` now verifies compute shader support (not just adapter presence)
Full changelog
See CHANGELOG.md
v0.7.16
v0.7.16 — Community Contributions, ONNX 49 Operators, Bugfixes
Third external contributor @gmohmad with 5 PRs! Plus continued work from @bennibbelink.
Added
- ONNX `LayerNormalization` operator (#47 by @gmohmad)
- `BatchMatMul` NumPy-style broadcasting: supports 2D×3D, singleton batch dims (#49 by @gmohmad)
- ONNX comparison operators: Greater, GreaterOrEqual, Less, LessOrEqual (#56 by @gmohmad)
- ONNX logical operators: Not, And, Or, Xor (#56 by @gmohmad)
- ONNX Erf operator (#56 by @gmohmad)
- Broadcasting for boolean and comparison ops in CPU backend (#56 by @gmohmad)
- `tensor.BroadcastShapesMatMul` public API
- ONNX `AttributeProto` tensor attribute parsing (#53 by @gmohmad)
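The NumPy-style broadcasting mentioned in several items above follows one rule: align shapes from the trailing dimension and let a dimension of size 1 stretch to match the other operand. The sketch below illustrates that general rule only. `broadcastShapes` is a hypothetical helper written for this example, not Born's `tensor.BroadcastShapesMatMul`, which presumably applies the rule to batch dimensions only.

```go
package main

import (
	"errors"
	"fmt"
)

// broadcastShapes returns the NumPy-style broadcast of two shapes.
// Hypothetical helper for illustration; not part of Born's API.
func broadcastShapes(a, b []int) ([]int, error) {
	n := len(a)
	if len(b) > n {
		n = len(b)
	}
	out := make([]int, n)
	for i := 0; i < n; i++ {
		// Walk both shapes right to left; a missing leading dim counts as 1.
		da, db := 1, 1
		if i < len(a) {
			da = a[len(a)-1-i]
		}
		if i < len(b) {
			db = b[len(b)-1-i]
		}
		switch {
		case da == db || db == 1:
			out[n-1-i] = da
		case da == 1:
			out[n-1-i] = db
		default:
			return nil, errors.New("shapes are not broadcast-compatible")
		}
	}
	return out, nil
}

func main() {
	shape, err := broadcastShapes([]int{4, 1, 3}, []int{5, 1})
	fmt.Println(shape, err) // [4 5 3] <nil>
}
```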
Fixed
- CPU backend: prevent inplace mutation when operands alias; `Mul(x,x)` no longer corrupts input (#55, fixes #45)
- `Squeeze` scalar handling: returns `Shape{}` (scalar) instead of `Shape{1}` (#50 by @gmohmad)
- ONNX `AttributeProto` parser: correct protobuf field numbers, non-packed encoding support (#53 by @gmohmad)
- CI: added `test` gate job for branch protection (#52)
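To make the `Squeeze` fix concrete, here is a minimal sketch of the intended behavior, assuming shapes are plain `[]int` slices. `squeeze` is a hypothetical stand-in, not Born's implementation: when every dimension has size 1, the result is the empty shape (a scalar) rather than a one-element shape.

```go
package main

import "fmt"

// squeeze drops every dimension of size 1. A shape made only of 1s
// collapses to the empty shape, which denotes a scalar.
// Hypothetical helper for illustration; not part of Born's API.
func squeeze(shape []int) []int {
	out := make([]int, 0, len(shape))
	for _, d := range shape {
		if d != 1 {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	fmt.Println(squeeze([]int{1, 3, 1, 2})) // [3 2]
	fmt.Println(len(squeeze([]int{1})))     // 0
}
```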
Refactored
- `ConvDims`/`PoolDims` parameter structs (#46 by @bennibbelink), moved to shared `internal/tensor/` package (#48)
- 14 helper functions extracted from conv2d/maxpool2d inner loops (#17); compiler-inlined, Conv2D batch path ~28% faster
- Resolved 86 gosec lint errors after golangci-lint v2.11.4 upgrade (#39)
Stats
- ONNX operators: 39 → 49
- PRs merged: 12
- Issues closed: 8 (#16, #17, #43, #44, #45, #48, #51, #54)
- Contributors: @gmohmad (5 PRs), @bennibbelink (1 PR)
Thank you! 🙏
A huge thanks to our contributors:
- @gmohmad — 5 PRs, 5 issues filed. Found real bugs (inplace aliasing, Squeeze scalar, AttributeProto parsing), added 10 new ONNX operators, and implemented broadcasting. Outstanding work!
- @bennibbelink — ConvDims/PoolDims refactoring that improved code quality across the conv2d/maxpool2d stack.
Community contributions make Born better. If you'd like to contribute, check our open issues!
Full Changelog: v0.7.15...v0.7.16
v0.7.15 — Erf Operator (Community Contribution)
Second External Contribution!
Thanks to @bennibbelink for a full vertical slice across the entire stack!
Added:
- `Erf` (error function) operator: element-wise error function
- Backend interface, CPU (`math.Erf`), WebGPU (Abramowitz & Stegun shader)
- Autodiff with correct backward pass: `2/√π · exp(-x²)`
- Mock backend, Tensor API (`tensor.Erf()`)
- Comprehensive tests: forward + backward, float32/float64, edge cases (Inf, NaN)
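The backward-pass formula can be checked against a finite difference of `math.Erf`, and a polynomial approximation shows what a shader without a built-in `erf` might compute. This is a sketch under assumptions: the release notes do not say which Abramowitz & Stegun formula the shader uses, so formula 7.1.26 (absolute error around 1.5e-7) is chosen here as a common candidate. All function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// erfGrad is the analytic derivative of erf: 2/sqrt(pi) * exp(-x^2).
func erfGrad(x float64) float64 {
	return 2 / math.Sqrt(math.Pi) * math.Exp(-x*x)
}

// numericGrad estimates the derivative of math.Erf with a central difference.
func numericGrad(x float64) float64 {
	const h = 1e-5
	return (math.Erf(x+h) - math.Erf(x-h)) / (2 * h)
}

// erfApprox evaluates Abramowitz & Stegun formula 7.1.26.
// Assumption: the actual shader may use a different formula.
func erfApprox(x float64) float64 {
	sign := 1.0
	if x < 0 {
		sign, x = -1.0, -x // erf is odd, so evaluate on |x| and restore the sign
	}
	const (
		p  = 0.3275911
		a1 = 0.254829592
		a2 = -0.284496736
		a3 = 1.421413741
		a4 = -1.453152027
		a5 = 1.061405429
	)
	t := 1 / (1 + p*x)
	poly := ((((a5*t+a4)*t+a3)*t+a2)*t + a1) * t
	return sign * (1 - poly*math.Exp(-x*x))
}

// approxError is the absolute gap between the approximation and math.Erf.
func approxError(x float64) float64 {
	return math.Abs(erfApprox(x) - math.Erf(x))
}

func main() {
	for _, x := range []float64{-1.5, 0, 0.5, 2} {
		fmt.Printf("x=%v analytic=%.6f numeric=%.6f approxErr=%.2e\n",
			x, erfGrad(x), numericGrad(x), approxError(x))
	}
}
```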
Links:
- PR: #37 by @bennibbelink
`go get github.com/born-ml/born@v0.7.15`
v0.7.14 - ONNX Equal Operator (Community Contribution)
First External Contribution!
Thanks to @jsully1720 for the first community PR!
Added:
- ONNX `Equal` operator: binary element-wise comparison returning bool tensor
- New `comparison_ops.go` category for ONNX comparison operators
- `registerComparisonOps()` wired into operator registry
ONNX operators: 38 → 39
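As a rough illustration of what an element-wise `Equal` computes, the sketch below compares two flat `float32` slices and returns a boolean mask. It assumes equal-length operands with no broadcasting, and `equal` is a hypothetical function, not the operator registered in Born's ONNX package.

```go
package main

import (
	"errors"
	"fmt"
)

// equal compares two slices element by element and returns a bool mask.
// Hypothetical helper for illustration; broadcasting is deliberately omitted.
func equal(a, b []float32) ([]bool, error) {
	if len(a) != len(b) {
		return nil, errors.New("equal: operands must have the same length")
	}
	out := make([]bool, len(a))
	for i := range a {
		out[i] = a[i] == b[i]
	}
	return out, nil
}

func main() {
	mask, err := equal([]float32{1, 2, 3}, []float32{1, 0, 3})
	fmt.Println(mask, err) // [true false true] <nil>
}
```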
Links:
- PR: #34 by @jsully1720
- Issue: #35
`go get github.com/born-ml/born@v0.7.14`
v0.7.13 - ABI Compliance Fixes
Changes
Dependencies Updated
| Package | Old | New |
|---|---|---|
| `go-webgpu/webgpu` | v0.4.0 | v0.4.1 |
| `go-webgpu/goffi` | v0.4.0 | v0.4.1 (indirect) |
Upstream Bug Fixes (ABI compliance)
- Float32 encoding: correct XMM bit patterns via `math.Float32bits`
- AMD64 Unix stack: arguments beyond 6 GP registers properly pushed to stack
- ARM64 Unix stack: arguments beyond 8 GP registers correctly spilled to stack
- AMD64 struct returns (9-16 bytes): RAX+RDX register pair properly assembled
- AMD64 sret pointer: structs > 16 bytes use caller buffer as first argument (RDI)
- ARM64 HFA spilling: Homogeneous Floating-Point Aggregate overflow follows AAPCS64
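The Float32 item is easiest to see with a small example. An FFI layer has to pass the IEEE 754 bit pattern of a `float32`, not its numeric value converted to an integer. The sketch below only illustrates that distinction: `float32Arg` is a hypothetical helper, and how goffi actually loads the XMM register is not shown in these notes.

```go
package main

import (
	"fmt"
	"math"
)

// float32Arg returns the raw IEEE 754 bits of f, widened to 64 bits so it can
// sit in a register-sized slot. Hypothetical helper for illustration.
func float32Arg(f float32) uint64 {
	return uint64(math.Float32bits(f))
}

func main() {
	f := float32(1.5)
	fmt.Printf("bit pattern:        %#x\n", float32Arg(f)) // 0x3fc00000
	fmt.Printf("numeric conversion: %#x\n", uint64(f))     // 0x1 (wrong for FFI)
}
```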
Upstream Enhancements
- `runtime.KeepAlive` prevents GC of argument pointers during FFI calls
- `ErrTooManyArguments` overflow detection for calls exceeding 15 arguments
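The `runtime.KeepAlive` guard addresses a subtle hazard: once a pointer is converted to `uintptr`, the garbage collector no longer treats it as a reference. A minimal sketch of the pattern follows, with `foreignLen` as a hypothetical stand-in for a native call; a real FFI layer would jump into C at that point.

```go
package main

import (
	"fmt"
	"runtime"
	"unsafe"
)

// foreignLen is a hypothetical stand-in for a native function that receives
// a raw address and a length.
func foreignLen(addr uintptr, n int) int {
	if addr == 0 {
		return 0
	}
	return n
}

// upload passes the address of data to the "foreign" call.
func upload(data []byte) int {
	if len(data) == 0 {
		return 0
	}
	// To the garbage collector, addr is just an integer, not a reference.
	addr := uintptr(unsafe.Pointer(&data[0]))
	n := foreignLen(addr, len(data))
	// KeepAlive guarantees data is considered reachable until this point,
	// so the buffer cannot be reclaimed while the call is in flight.
	runtime.KeepAlive(data)
	return n
}

func main() {
	fmt.Println(upload([]byte{1, 2, 3})) // 3
}
```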
Impact
Critical ABI correctness fixes for multi-platform GPU backend reliability.
Full Changelog: v0.7.12...v0.7.13
Upstream: https://github.com/go-webgpu/webgpu/releases/tag/v0.4.1
v0.7.12 - FFI Hardening & Library Loading
Changes
Dependencies Updated
| Package | Old | New |
|---|---|---|
| `go-webgpu/webgpu` | v0.3.2 | v0.4.0 |
| `go-webgpu/goffi` | v0.4.0 | v0.4.0 (unchanged) |
Upstream Improvements
- Null handle guards on 27 public FFI methods — prevents SIGSEGV on nil/released objects
- `ptrFromUintptr` helper eliminates all `go vet` unsafe.Pointer warnings
- `WGPU_NATIVE_PATH` env var for custom wgpu-native library path
- `loadLibrary` returns `(Library, error)` with proper error propagation
- Windows DLL eager loading: errors surface at init, not at first use
- Enhanced `Init()` error messages with library path and remediation suggestions
- 85 new null guard test cases upstream
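The library-loading changes combine an environment override with errors that tell the user what to do next. The sketch below shows one way such a lookup could be structured. Only the `WGPU_NATIVE_PATH` variable name comes from the notes; `resolveLibraryPath`, its signature, and the fallback file name are assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// resolveLibraryPath picks the library to load: the WGPU_NATIVE_PATH override
// wins, otherwise the fallback is used. Hypothetical helper for illustration.
// getenv is injected so the lookup can be exercised without touching the
// process environment.
func resolveLibraryPath(getenv func(string) string, fallback string) (string, error) {
	if p := getenv("WGPU_NATIVE_PATH"); p != "" {
		return p, nil
	}
	if fallback == "" {
		return "", errors.New("wgpu-native not found: set WGPU_NATIVE_PATH to the library location")
	}
	return fallback, nil
}

func main() {
	// "libwgpu_native.so" is an assumed default name for this sketch.
	path, err := resolveLibraryPath(os.Getenv, "libwgpu_native.so")
	if err != nil {
		fmt.Println("init failed:", err)
		return
	}
	fmt.Println("loading", path)
}
```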
Impact
Significantly improved safety and debuggability of GPU backend initialization.
Full Changelog: v0.7.11...v0.7.12
Upstream: https://github.com/go-webgpu/webgpu/releases/tag/v0.4.0
v0.7.11 - Crosscall2 Callback Integration
Changes
Dependencies Updated
| Package | Old | New |
|---|---|---|
| `go-webgpu/webgpu` | v0.3.1 | v0.3.2 |
| `go-webgpu/goffi` | v0.3.9 | v0.4.0 (indirect) |
Upstream Improvements
- crosscall2 integration — callbacks now work from C-library-created threads (Metal, wgpu-native)
- fakecgo trampoline register fixes synced with purego v0.10.0
Bug Fixes
- Xavier init test: added float32 rounding tolerance to prevent false failures
Impact
Improved callback reliability on macOS Metal and native WebGPU implementations.
Full Changelog: v0.7.10...v0.7.11
Upstream: https://github.com/go-webgpu/webgpu/releases/tag/v0.3.2
v0.7.10
🔧 Dependencies Update + Lint Cleanup
Update WebGPU backend to v0.3.1 with critical ARM64 callback fix. Clean up 101 stale //nolint:gosec directives.
Updated Dependencies:
- `go-webgpu/webgpu` v0.3.0 → v0.3.1
- `go-webgpu/goffi` v0.3.8 → v0.3.9 (indirect)
Upstream Fixes:
- ARM64 callback trampoline rewrite — fixes LR corruption for callbacks at index > 0
- Symbol rename to prevent linker collision with purego
Code Quality:
- Removed 101 unused `//nolint:gosec` directives (gosec linter updated, no longer flags these)
- Standardized remaining nolint comments to short format
- 0 lint issues across all platforms
Impact: Critical fix for macOS Apple Silicon and Linux ARM64 users.
Links:
- Upstream release: go-webgpu v0.3.1
- PRs: #29, #30
v0.7.9
🔧 Dependencies Update
Update WebGPU backend to v0.3.0 with new capability-querying API and typed errors.
Updated Dependencies:
- `go-webgpu/webgpu` v0.2.1 → v0.3.0
New Upstream Features Available:
- `Surface.GetCapabilities()`: query supported formats, present modes, alpha modes
- `Device.GetFeatures()` / `Device.HasFeature()`: feature enumeration
- `Device.GetLimits()`: device limits (experimental)
- Typed errors with `errors.Is()` / `errors.As()` support (`ErrValidation`, `ErrOutOfMemory`, `ErrInternal`, `ErrDeviceLost`)
- Resource leak detection via `SetDebugMode(true)` / `ReportLeaks()`
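Typed errors let callers branch on the failure kind instead of matching message strings. The sketch below shows the standard `errors.Is` pattern with a locally defined sentinel. The `ErrOutOfMemory` here is a stand-in with the same name as the upstream value, and `createBuffer` is a hypothetical function, so none of this reflects the real go-webgpu signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrOutOfMemory is a local stand-in for the upstream sentinel error.
var ErrOutOfMemory = errors.New("out of memory")

// createBuffer is a hypothetical allocation that fails above 1 MiB and wraps
// the sentinel with %w so callers can still detect it.
func createBuffer(size int) error {
	if size > 1<<20 {
		return fmt.Errorf("create buffer (%d bytes): %w", size, ErrOutOfMemory)
	}
	return nil
}

// isOOM reports whether err is, or wraps, ErrOutOfMemory.
func isOOM(err error) bool {
	return errors.Is(err, ErrOutOfMemory)
}

func main() {
	if err := createBuffer(1 << 30); isOOM(err) {
		fmt.Println("falling back to a smaller allocation:", err)
	}
}
```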
Links:
- Upstream release: go-webgpu v0.3.0
- PR: #28
v0.7.8 - GoGPU Ecosystem Integration (Phase 1)
🔧 GoGPU Ecosystem Integration (Phase 1)
Migrate WebGPU backend to unified gputypes for future dual-backend support.
Updated Dependencies
| Package | Old | New |
|---|---|---|
| `go-webgpu/webgpu` | v0.1.4 | v0.2.1 |
| `go-webgpu/goffi` | v0.3.7 | v0.3.8 |
| `gogpu/gputypes` | - | v0.2.0 (new) |
| `dlclark/regexp2` | v1.10.0 | v1.11.5 |
| `google/uuid` | v1.3.0 | v1.6.0 |
Changes
- Migrated all WebGPU types from `wgpu.*` to `gputypes.*`:
  - `BufferUsage`, `BufferUsageStorage`, `BufferUsageCopySrc`, `BufferUsageCopyDst`
  - `PowerPreferenceHighPerformance`
- Updated 10 files in `internal/backend/webgpu/`
- Fixed 3 prealloc warnings in linter
Why This Matters
Prepares codebase for Pure Go WebGPU backend (gogpu/wgpu):
- Unified type system enables future dual-backend architecture
- Build tags will allow: `go build` (Rust FFI) vs `go build -tags purego` (Pure Go)
Links
- Upstream: go-webgpu v0.2.1
- GoGPU ecosystem: github.com/gogpu
- PR: #27
Installation
```bash
go get github.com/born-ml/born@v0.7.8
```