Releases: born-ml/born

v0.8.0 — Pure Go WebGPU (gogpu/wgpu)

26 Apr 21:27

Pure Go WebGPU — No Shared Libraries, No CGO

Born's GPU backend now uses gogpu/wgpu v0.26.8 — a pure Go WebGPU implementation. No more DLL/SO downloads. Just go build.

Highlights

  • True single binary deployment — GPU support built into the executable
  • Vulkan primary backend — stable on Windows, Linux, macOS
  • Zero runtime dependencies — no wgpu-native, no shared libraries
  • Validated — 105 GPU tests pass, real model training (20 epochs, 0 crashes)

What Changed

  • Replaced go-webgpu/webgpu (Rust FFI) with gogpu/wgpu (pure Go)
  • Fixed PipelineLayout lifetime for Vulkan SetBindGroup
  • Fixed lazy ops buffer lifetime with immediate submit
  • Fixed lazy chain data propagation
  • Added runtime.KeepAlive guards for GC safety
  • Added Poll(PollWait) in Release for clean shutdown
  • Sign/Abs operators (#59 by @bennibbelink)
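The runtime.KeepAlive guards mentioned above follow a general Go pattern. Below is a minimal sketch of that pattern, not Born's actual code: nativeBuffer, newNativeBuffer, submit, and upload are hypothetical names, and the finalizer only stands in for whatever releases a real GPU resource.

```go
package main

import (
	"fmt"
	"runtime"
)

// nativeBuffer is a hypothetical wrapper that owns a GPU-side resource
// and releases it from a finalizer.
type nativeBuffer struct {
	handle uintptr // hypothetical raw handle handed to low-level calls
	data   []float32
}

// newNativeBuffer attaches a finalizer that clears the handle, standing in
// for a wrapper that frees driver memory when the Go object is collected.
func newNativeBuffer(data []float32) *nativeBuffer {
	b := &nativeBuffer{handle: 0xB0F, data: data}
	runtime.SetFinalizer(b, func(b *nativeBuffer) { b.handle = 0 })
	return b
}

// submit stands in for a low-level call that only receives the raw handle.
func submit(handle uintptr, n int) int {
	if handle == 0 {
		return -1
	}
	return n
}

// upload copies the handle out of the wrapper and then calls submit.
// After the copies, buf itself is no longer referenced, so the garbage
// collector would be free to run its finalizer while submit is still
// working. runtime.KeepAlive marks buf as reachable until that point.
func upload(data []float32) int {
	buf := newNativeBuffer(data)
	h := buf.handle
	n := len(buf.data)
	res := submit(h, n)
	runtime.KeepAlive(buf)
	return res
}

func main() {
	fmt.Println(upload([]float32{1, 2, 3}))
}
```

The important detail is placement: KeepAlive goes after the last use of the raw handle, not before it.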

Breaking Changes

  • The wgpu_native shared library is no longer needed or used
  • IsAvailable() now verifies compute shader support (not just adapter presence)
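Since IsAvailable() is now a stricter check, callers that previously assumed "adapter present means GPU usable" may want an explicit CPU fallback. The sketch below shows one way to structure that. All types and names in it (computeBackend, cpuBackend, gpuBackend, pickBackend) are hypothetical and do not reflect Born's real API; the probe is passed in as a function so any availability check can be plugged in.

```go
package main

import "fmt"

// computeBackend is a hypothetical interface used only for this sketch.
type computeBackend interface {
	Name() string
}

type cpuBackend struct{}

func (cpuBackend) Name() string { return "cpu" }

type gpuBackend struct{}

func (gpuBackend) Name() string { return "webgpu" }

// pickBackend returns the GPU backend only when the probe reports that it
// is usable, and falls back to the CPU backend otherwise.
func pickBackend(isAvailable func() bool) computeBackend {
	if isAvailable() {
		return gpuBackend{}
	}
	return cpuBackend{}
}

func main() {
	// Stand-in probe for a machine with an adapter but no compute shader
	// support, which the stricter check would now report as unavailable.
	probe := func() bool { return false }
	fmt.Println(pickBackend(probe).Name())
}
```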

Full changelog

See CHANGELOG.md

v0.7.16

10 Apr 16:02
acd131e

v0.7.16 — Community Contributions, ONNX 49 Operators, Bugfixes

Our third external contributor, @gmohmad, landed 5 PRs in this release, alongside continued work from @bennibbelink.

Added

  • ONNX LayerNormalization operator (#47 by @gmohmad)
  • BatchMatMul NumPy-style broadcasting — supports 2D×3D, singleton batch dims (#49 by @gmohmad)
  • ONNX comparison operators: Greater, GreaterOrEqual, Less, LessOrEqual (#56 by @gmohmad)
  • ONNX logical operators: Not, And, Or, Xor (#56 by @gmohmad)
  • ONNX Erf operator (#56 by @gmohmad)
  • Broadcasting for boolean and comparison ops in CPU backend (#56 by @gmohmad)
  • tensor.BroadcastShapesMatMul public API
  • ONNX AttributeProto tensor attribute parsing (#53 by @gmohmad)
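To make the BatchMatMul broadcasting rule concrete, here is a small sketch of NumPy-style shape inference for a batched matrix product. It is an illustration written for these notes, not the implementation behind tensor.BroadcastShapesMatMul, whose signature is not shown here; broadcastMatMulShape is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
)

// broadcastMatMulShape computes the output shape of a batched a @ b.
// The last two dimensions are the matrix dimensions; all leading
// dimensions are batch dimensions, aligned from the right and broadcast
// when one side is 1 or missing.
func broadcastMatMulShape(a, b []int) ([]int, error) {
	if len(a) < 2 || len(b) < 2 {
		return nil, errors.New("both operands need at least 2 dimensions")
	}
	m, k := a[len(a)-2], a[len(a)-1]
	k2, n := b[len(b)-2], b[len(b)-1]
	if k != k2 {
		return nil, fmt.Errorf("inner dimensions differ: %d vs %d", k, k2)
	}
	batchA, batchB := a[:len(a)-2], b[:len(b)-2]
	rank := len(batchA)
	if len(batchB) > rank {
		rank = len(batchB)
	}
	out := make([]int, rank, rank+2)
	for i := 0; i < rank; i++ {
		// A missing leading dimension behaves like size 1.
		da, db := 1, 1
		if j := len(batchA) - rank + i; j >= 0 {
			da = batchA[j]
		}
		if j := len(batchB) - rank + i; j >= 0 {
			db = batchB[j]
		}
		switch {
		case da == db:
			out[i] = da
		case da == 1:
			out[i] = db
		case db == 1:
			out[i] = da
		default:
			return nil, fmt.Errorf("batch dimensions %d and %d do not broadcast", da, db)
		}
	}
	return append(out, m, n), nil
}

func main() {
	fmt.Println(broadcastMatMulShape([]int{4, 5}, []int{3, 5, 6}))    // 2D x 3D
	fmt.Println(broadcastMatMulShape([]int{1, 2, 3}, []int{7, 3, 4})) // singleton batch
}
```

Under these rules a 2D left operand of shape [4 5] multiplied by a 3D right operand of shape [3 5 6] yields shape [3 4 6].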

Fixed

  • CPU backend: prevent in-place mutation when operands alias, so Mul(x,x) no longer corrupts its input (#55, fixes #45)
  • Squeeze scalar handling: returns Shape{} (scalar) instead of Shape{1} (#50 by @gmohmad)
  • ONNX AttributeProto parser: correct protobuf field numbers, non-packed encoding support (#53 by @gmohmad)
  • CI: added test gate job for branch protection (#52)
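The aliasing fix can be illustrated with a simplified model of an in-place fast path. This is a sketch under assumptions, not Born's CPU backend: mulInto, sameStorage, and the reuseA flag are hypothetical, and the alias check only compares the first element, so it would miss partially overlapping slices.

```go
package main

import "fmt"

// sameStorage reports whether two non-empty slices start at the same
// element. It is a deliberately simple alias check.
func sameStorage(a, b []float32) bool {
	return len(a) > 0 && len(b) > 0 && &a[0] == &b[0]
}

// mulInto multiplies a and b element-wise. When reuseA is true the caller
// allows the result to overwrite a, which saves an allocation. The guard
// skips that fast path when a and b share storage, because b was never
// declared disposable and must keep its values.
func mulInto(a, b []float32, reuseA bool) []float32 {
	out := a
	if !reuseA || sameStorage(a, b) {
		out = make([]float32, len(a))
	}
	for i := range a {
		out[i] = a[i] * b[i]
	}
	return out
}

func main() {
	x := []float32{1, 2, 3}
	y := mulInto(x, x, true)
	fmt.Println(x, y) // x keeps its values; y holds the squares
}
```

Without the sameStorage check, the call mulInto(x, x, true) would write the squares straight into x.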

Refactored

  • ConvDims/PoolDims parameter structs (#46 by @bennibbelink), moved to shared internal/tensor/ package (#48)
  • 14 helper functions extracted from conv2d/maxpool2d inner loops (#17) — compiler-inlined, Conv2D batch path ~28% faster
  • Resolved 86 gosec lint errors after golangci-lint v2.11.4 upgrade (#39)

Thank you! 🙏

A huge thanks to our contributors:

  • @gmohmad — 5 PRs, 5 issues filed. Found real bugs (inplace aliasing, Squeeze scalar, AttributeProto parsing), added 10 new ONNX operators, and implemented broadcasting. Outstanding work!
  • @bennibbelink — ConvDims/PoolDims refactoring that improved code quality across the conv2d/maxpool2d stack.

Community contributions make Born better. If you'd like to contribute, check our open issues!

Full Changelog: v0.7.15...v0.7.16

v0.7.15 — Erf Operator (Community Contribution)

07 Apr 14:25
745d2af

Second External Contribution!

Thanks to @bennibbelink for a full vertical slice across the entire stack!

Added:

  • Erf (error function) operator — element-wise error function
  • Implemented in the Backend interface, the CPU backend (math.Erf), and the WebGPU backend (Abramowitz & Stegun approximation shader)
  • Autodiff with the correct backward pass: d/dx erf(x) = 2/√π · exp(-x²)
  • Mock backend support and Tensor API (tensor.Erf())
  • Comprehensive tests: forward + backward, float32/float64, edge cases (Inf, NaN)
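For readers curious how a shader can approximate erf without a built-in, here is a CPU-side sketch of the widely used Abramowitz & Stegun formula 7.1.26 together with the gradient quoted above. Which A&S formula the WebGPU shader actually uses is an assumption on our part; erfApprox, erfGrad, numericGrad, and closeTo are names made up for this example.

```go
package main

import (
	"fmt"
	"math"
)

// erfApprox evaluates Abramowitz & Stegun formula 7.1.26, whose maximum
// absolute error is about 1.5e-7. It uses erf(-x) = -erf(x) for x < 0.
func erfApprox(x float64) float64 {
	sign := 1.0
	if x < 0 {
		sign = -1.0
		x = -x
	}
	const (
		p  = 0.3275911
		a1 = 0.254829592
		a2 = -0.284496736
		a3 = 1.421413741
		a4 = -1.453152027
		a5 = 1.061405429
	)
	t := 1.0 / (1.0 + p*x)
	// Horner evaluation of a1*t + a2*t^2 + ... + a5*t^5.
	poly := ((((a5*t+a4)*t+a3)*t+a2)*t + a1) * t
	return sign * (1.0 - poly*math.Exp(-x*x))
}

// erfGrad is the analytic derivative used in the backward pass:
// d/dx erf(x) = 2/sqrt(pi) * exp(-x^2).
func erfGrad(x float64) float64 {
	return 2.0 / math.Sqrt(math.Pi) * math.Exp(-x*x)
}

// numericGrad estimates the derivative of math.Erf by central differences,
// which gives an independent check of erfGrad.
func numericGrad(x float64) float64 {
	const h = 1e-5
	return (math.Erf(x+h) - math.Erf(x-h)) / (2 * h)
}

// closeTo reports whether a and b differ by at most tol.
func closeTo(a, b, tol float64) bool {
	return math.Abs(a-b) <= tol
}

func main() {
	for _, x := range []float64{-2, -0.5, 0, 0.5, 2} {
		fmt.Printf("x=%5.2f approx=%.7f exact=%.7f grad=%.7f\n",
			x, erfApprox(x), math.Erf(x), erfGrad(x))
	}
}
```

The approximation is accurate enough for float32 work, where machine precision is already near 1e-7.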

Installation:

go get github.com/born-ml/born@v0.7.15

v0.7.14 — ONNX Equal Operator (Community Contribution)

04 Mar 14:30
2d49a26

First External Contribution!

Thanks to @jsully1720 for the first community PR!

Added:

  • ONNX Equal operator — binary element-wise comparison returning bool tensor
  • New comparison_ops.go category for ONNX comparison operators
  • registerComparisonOps() wired into operator registry

ONNX operators: 38 → 39

Installation:

go get github.com/born-ml/born@v0.7.14

v0.7.13 - ABI Compliance Fixes

02 Mar 06:24
4f7a362

Changes

Dependencies Updated

| Package | Old | New |
|---|---|---|
| go-webgpu/webgpu | v0.4.0 | v0.4.1 |
| go-webgpu/goffi | v0.4.0 | v0.4.1 (indirect) |

Upstream Bug Fixes (ABI compliance)

  • Float32 encoding: correct XMM bit patterns via math.Float32bits
  • AMD64 Unix stack: arguments beyond 6 GP registers properly pushed to stack
  • ARM64 Unix stack: arguments beyond 8 GP registers correctly spilled to stack
  • AMD64 struct returns (9-16 bytes): RAX+RDX register pair properly assembled
  • AMD64 sret pointer: structs > 16 bytes use caller buffer as first argument (RDI)
  • ARM64 HFA spilling: Homogeneous Floating-Point Aggregate overflow follows AAPCS64
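The Float32 encoding item is easy to get wrong in any FFI layer, so a short illustration may help. The sketch below is not goffi's code; float32ArgBits and widenedArgBits are hypothetical helpers that contrast the raw 32-bit pattern from math.Float32bits with the mistake of widening to float64 first.

```go
package main

import (
	"fmt"
	"math"
)

// float32ArgBits returns the IEEE 754 single-precision bit pattern of f,
// zero-extended to 64 bits. A callee that reads the low 32 bits of the
// register as a float sees exactly f.
func float32ArgBits(f float32) uint64 {
	return uint64(math.Float32bits(f))
}

// widenedArgBits shows the mistake: converting to float64 first produces a
// double-precision pattern, so the low 32 bits no longer encode f.
func widenedArgBits(f float32) uint64 {
	return math.Float64bits(float64(f))
}

func main() {
	fmt.Printf("correct: %#x\n", float32ArgBits(1.0))
	fmt.Printf("widened: %#x\n", widenedArgBits(1.0))
}
```

For the value 1.0 the correct pattern is 0x3f800000, while the widened pattern is 0x3ff0000000000000, whose low 32 bits are all zero.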

Upstream Enhancements

  • runtime.KeepAlive prevents GC of argument pointers during FFI calls
  • ErrTooManyArguments overflow detection for calls exceeding 15 arguments

Impact

Critical ABI correctness fixes for multi-platform GPU backend reliability.

Full Changelog: v0.7.12...v0.7.13
Upstream: https://github.com/go-webgpu/webgpu/releases/tag/v0.4.1

v0.7.12 - FFI Hardening & Library Loading

27 Feb 14:06
24e87cf

Changes

Dependencies Updated

| Package | Old | New |
|---|---|---|
| go-webgpu/webgpu | v0.3.2 | v0.4.0 |
| go-webgpu/goffi | v0.4.0 | v0.4.0 (unchanged) |

Upstream Improvements

  • Null handle guards on 27 public FFI methods — prevents SIGSEGV on nil/released objects
  • ptrFromUintptr helper eliminates all go vet unsafe.Pointer warnings
  • WGPU_NATIVE_PATH env var for custom wgpu-native library path
  • loadLibrary returns (Library, error) with proper error propagation
  • Windows DLL eager loading — errors surface at init, not at first use
  • Enhanced Init() error messages with library path and remediation suggestions
  • 85 new null guard test cases upstream
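Two of the ideas above, an environment override for the library path and nil-handle guards, can be sketched in a few lines. This is an illustration only: resolveLibraryPath, defaultLibraryName, the buffer type, and errNilHandle are hypothetical, and the default file names are the conventional wgpu-native names as we understand them, not values taken from the upstream source.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"runtime"
)

// errNilHandle is a hypothetical sentinel returned by guarded methods.
var errNilHandle = errors.New("operation on nil or released handle")

// defaultLibraryName returns an assumed per-platform library file name.
func defaultLibraryName(goos string) string {
	switch goos {
	case "windows":
		return "wgpu_native.dll"
	case "darwin":
		return "libwgpu_native.dylib"
	default:
		return "libwgpu_native.so"
	}
}

// resolveLibraryPath prefers WGPU_NATIVE_PATH when it is set and non-empty.
// The lookup function is injected so the logic can be tested without
// touching the real environment.
func resolveLibraryPath(lookup func(string) (string, bool), goos string) string {
	if p, ok := lookup("WGPU_NATIVE_PATH"); ok && p != "" {
		return p
	}
	return defaultLibraryName(goos)
}

// buffer is a hypothetical wrapper around a native handle.
type buffer struct{ handle uintptr }

// Size returns an error instead of crashing when the receiver is nil or
// its handle has already been released.
func (b *buffer) Size() (uint64, error) {
	if b == nil || b.handle == 0 {
		return 0, errNilHandle
	}
	return 1024, nil // placeholder; a real wrapper would call into the library
}

func main() {
	fmt.Println(resolveLibraryPath(os.LookupEnv, runtime.GOOS))
	var b *buffer
	_, err := b.Size()
	fmt.Println(err)
}
```

Returning an error from the guard, rather than dereferencing a zero handle, is what turns a SIGSEGV into something a caller can handle.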

Impact

Significantly improved safety and debuggability of GPU backend initialization.

Full Changelog: v0.7.11...v0.7.12
Upstream: https://github.com/go-webgpu/webgpu/releases/tag/v0.4.0

v0.7.11 - Crosscall2 Callback Integration

27 Feb 09:58
cdb8a69

Changes

Dependencies Updated

| Package | Old | New |
|---|---|---|
| go-webgpu/webgpu | v0.3.1 | v0.3.2 |
| go-webgpu/goffi | v0.3.9 | v0.4.0 (indirect) |

Upstream Improvements

  • crosscall2 integration — callbacks now work from C-library-created threads (Metal, wgpu-native)
  • fakecgo trampoline register fixes synced with purego v0.10.0

Bug Fixes

  • Xavier init test: added float32 rounding tolerance to prevent false failures
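Why a Xavier test needs a rounding tolerance can be shown with a short sketch. It assumes the uniform (Glorot) variant with limit sqrt(6 / (fanIn + fanOut)); the release notes do not say which variant Born's test covers, and xavierLimit, xavierUniform, withinBound, and sampleWithinBound are hypothetical helpers.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// xavierLimit returns the Glorot uniform bound sqrt(6 / (fanIn + fanOut)).
func xavierLimit(fanIn, fanOut int) float64 {
	return math.Sqrt(6.0 / float64(fanIn+fanOut))
}

// xavierUniform samples fanIn*fanOut weights from [-limit, limit) in
// float64 and stores them as float32.
func xavierUniform(rng *rand.Rand, fanIn, fanOut int) []float32 {
	limit := xavierLimit(fanIn, fanOut)
	w := make([]float32, fanIn*fanOut)
	for i := range w {
		w[i] = float32((rng.Float64()*2 - 1) * limit)
	}
	return w
}

// withinBound checks every weight against the bound with a small relative
// tolerance. Converting to float32 can round a value that sits just under
// the float64 limit slightly past it, so an exact comparison may fail
// even though the sampler is correct.
func withinBound(w []float32, limit float64) bool {
	tol := limit * 1e-6 // comfortably above float32 rounding error
	for _, v := range w {
		if math.Abs(float64(v)) > limit+tol {
			return false
		}
	}
	return true
}

// sampleWithinBound draws one weight matrix with a fixed seed and checks it.
func sampleWithinBound(seed int64, fanIn, fanOut int) bool {
	rng := rand.New(rand.NewSource(seed))
	return withinBound(xavierUniform(rng, fanIn, fanOut), xavierLimit(fanIn, fanOut))
}

func main() {
	fmt.Println(xavierLimit(64, 32), sampleWithinBound(1, 64, 32))
}
```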

Impact

Improved callback reliability on macOS Metal and native WebGPU implementations.

Full Changelog: v0.7.10...v0.7.11
Upstream: https://github.com/go-webgpu/webgpu/releases/tag/v0.3.2

v0.7.10

18 Feb 12:30
f172d6c

🔧 Dependencies Update + Lint Cleanup

Update WebGPU backend to v0.3.1 with critical ARM64 callback fix. Clean up 101 stale //nolint:gosec directives.

Updated Dependencies:

  • go-webgpu/webgpu v0.3.0 → v0.3.1
  • go-webgpu/goffi v0.3.8 → v0.3.9 (indirect)

Upstream Fixes:

  • ARM64 callback trampoline rewrite — fixes LR corruption for callbacks at index > 0
  • Symbol rename to prevent linker collision with purego

Code Quality:

  • Removed 101 unused //nolint:gosec directives (gosec linter updated, no longer flags these)
  • Standardized remaining nolint comments to short format
  • 0 lint issues across all platforms

Impact: Critical fix for macOS Apple Silicon and Linux ARM64 users.

v0.7.9

09 Feb 11:10
02026b5

🔧 Dependencies Update

Update WebGPU backend to v0.3.0 with new capability-querying API and typed errors.

Updated Dependencies:

  • go-webgpu/webgpu v0.2.1 → v0.3.0

New Upstream Features Available:

  • Surface.GetCapabilities() — query supported formats, present modes, alpha modes
  • Device.GetFeatures() / Device.HasFeature() — feature enumeration
  • Device.GetLimits() — device limits (experimental)
  • Typed errors with errors.Is() / errors.As() support (ErrValidation, ErrOutOfMemory, ErrInternal, ErrDeviceLost)
  • Resource leak detection via SetDebugMode(true) / ReportLeaks()
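Typed errors pay off when callers branch on them. The sketch below shows the errors.Is pattern with local stand-ins for the sentinel names listed above; the real values live in the upstream package, and classify and createBuffer are hypothetical functions written for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the upstream sentinel errors named in these notes.
var (
	ErrValidation  = errors.New("validation error")
	ErrOutOfMemory = errors.New("out of memory")
	ErrInternal    = errors.New("internal error")
	ErrDeviceLost  = errors.New("device lost")
)

// createBuffer is a hypothetical operation that wraps a sentinel with
// context using %w, so errors.Is can still match it.
func createBuffer(size int) error {
	if size > 1<<20 {
		return fmt.Errorf("create buffer of %d bytes: %w", size, ErrOutOfMemory)
	}
	return nil
}

// classify maps an error to a suggested reaction by matching sentinels
// anywhere in the wrap chain.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrDeviceLost):
		return "recreate device"
	case errors.Is(err, ErrOutOfMemory):
		return "reduce batch size"
	case errors.Is(err, ErrValidation):
		return "fix call site"
	default:
		return "report"
	}
}

func main() {
	fmt.Println(classify(createBuffer(16)))
	fmt.Println(classify(createBuffer(1 << 21)))
}
```

Matching with errors.Is rather than comparing strings keeps the handling stable even when the wrapped message changes.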

v0.7.8 - GoGPU Ecosystem Integration (Phase 1)

29 Jan 15:31
f77fdf2

🔧 GoGPU Ecosystem Integration (Phase 1)

Migrate WebGPU backend to unified gputypes for future dual-backend support.

Updated Dependencies

| Package | Old | New |
|---|---|---|
| go-webgpu/webgpu | v0.1.4 | v0.2.1 |
| go-webgpu/goffi | v0.3.7 | v0.3.8 |
| gogpu/gputypes | - | v0.2.0 (new) |
| dlclark/regexp2 | v1.10.0 | v1.11.5 |
| google/uuid | v1.3.0 | v1.6.0 |

Changes

  • Migrated all WebGPU types from wgpu.* to gputypes.*:
    • BufferUsage, BufferUsageStorage, BufferUsageCopySrc, BufferUsageCopyDst
    • PowerPreferenceHighPerformance
  • Updated 10 files in internal/backend/webgpu/
  • Fixed 3 prealloc linter warnings

Why This Matters

Prepares codebase for Pure Go WebGPU backend (gogpu/wgpu):

  • Unified type system enables future dual-backend architecture
  • Build tags will allow: go build (Rust FFI) vs go build -tags purego (Pure Go)

Installation

```bash
go get github.com/born-ml/born@v0.7.8
```