Skip to content

feat(g4): close G4 — recover prior work + L2 fixes + honest TCP gap#26

Merged
guillo93 merged 6 commits into
mainfrom
feat/g4-close-pr
May 28, 2026
Merged

feat(g4): close G4 — recover prior work + L2 fixes + honest TCP gap#26
guillo93 merged 6 commits into
mainfrom
feat/g4-close-pr

Conversation

@guillo93

Copy link
Copy Markdown
Owner

Summary

Senior-level closure of G4 (Ethernet + smoltcp + HTTPS on STM32F769I-DISCO).
This is a clean PR off main (cherry-picked from feat/g4-eth-smoltcp to drop
the rebase-merged duplicate commits). It carries exactly 4 commits:

  1. fa264ac fix(eth): correct F769 RMII pinmux for L2 traffic — was in
    feat/g4-eth-smoltcp but not in main after PR G4 complete: ETH + smoltcp + TLS + HTTPS GET (F769) #24 was rebase-merged.
  2. 278e0d7 fix(g4): harden ETH HAL — MPU, DMA ring, RX-skip, smoltcp auto-service
  3. d2c53d5 feat(g4): examples + verify scripts — boot order, L2 probe, robust flash
  4. 700d0c9 docs(g4): close report + performance scaffold + ROADMAP/CHANGELOG/AGENT_LOG

Firmware-side — 10 fixes, all validated on HW by eth-link

  • cache::configure_eth_mpu — ARMv7-M ARM B3.5 sequence with dsb/isb, XN, ETH_DMA_BASE constant.
  • eth::dma::ring + {rx,tx}::descriptor — true descriptor ring (last → entry 0), no TER/RER.
  • eth::dma::rx::RxRing — skip descriptors with error / truncated frames so smoltcp never receives a 0-length slice (kills prior slice length 0 panic surface).
  • eth::dma::{rx,tx}::demand_poll — clears RBUS/TBUS before poking the demand register.
  • eth::dma::smoltcp_phy::Device::{receive,transmit} — auto-services the DMA via service_dma() on every smoltcp poll.
  • eth::dma::tx::EthTxToken::consume — pads short frames to 60 bytes (802.3 minimum) and demand-polls.
  • eth::mac — checksum offload removed (ipco=0, apcs=0, rd=0); smoltcp ChecksumCapabilities::Both computes in software.
  • eth::setup::enable_peripheral — F7 errata dummy read of RCC.AHB1ENR after enabling SYSCFG.
  • rugus_crypto::SoftwareRng impls rugus_hal::CryptoRng; rugus_net::tcp_connect logs socket state every 1 s.
  • examples/https-get-stm32f769-disco — boot order matches eth-link byte for byte; SRAM-only 64 KiB heap (FMC/SDRAM skipped — not needed); 8-s L2 probe window before TCP for operator-side ARP/ping confirmation.

Bonus

  • tools/verify-{eth-link,https-get}-stm32f769-disco.sh use probe-rs run --connect-under-reset for reliable flashing.
  • .gitignore excludes local debug artifacts (*.pcap, capture.log, /tmp/rugus-*.log).

Docs

  • docs/G4-CLOSE-REPORT.md — definitive closure report with HW scores, the 10 firmware-side fixes proven by eth-link, root-cause analysis of the residual https-get TCP gap, and the user-side validation steps to close or escalate it.
  • docs/PERFORMANCE.md — forward-looking kernel performance scaffold (pure Rust + asm! + #[naked] + link_section + LUTs, no FFI).
  • AGENT_LOG.md — comprehensive 2026-05-26 / 2026-05-27 session entry: Gemini recovery decision table (keep/rework/revert per file), firmware fixes, HW verify with raw RTT, three live hypotheses for the residual gap.
  • CHANGELOG.md [Unreleased] — full enumeration with validated HW scores.
  • docs/ROADMAP.md — G4 closure annotation pointing at G4-CLOSE-REPORT.md.

Hardware verify (STM32F769I-DISCO, probe 0483:374b:066EFF524853837267102836, host Fedora 192.168.0.112/24)

Script Score Evidence
tools/verify-eth-link-stm32f769-disco.sh 9 / 9 PASS reproducible 5 consecutive runs; ping -c 4 192.168.0.50 4/4; ip neigh REACHABLE 00:80:e1:11:22:33; RX > 700 frames; tcpdump shows board's ARP-Reply + ICMP-Echo-Reply on the wire.
tools/verify-https-get-stm32f769-disco.sh 9 / 13 PASS flash/run, SYSCLK 216 MHz, PHY link up, static IPv4 192.168.0.50, no fault PASS. TCP established / TLS session open / HTTP response / https-get complete FAIL — TCP stuck SynSent 15 s. MAC reports mmc_tx_good = 15 but tcpdump enp1s0 ether host 00:80:e1:11:22:33 shows zero frames in most runs (one observed run did transmit and pings succeeded — intermittent).

Selected RTT line from the timeout:

0 INFO  L2 probe window 8 s
0 INFO  L2 t=1000ms rx=0 tx=0 rps=3 tps=6 rbus=false tbus=true
0 INFO  L2 t=8000ms rx=0 tx=0 rps=3 tps=6 rbus=false tbus=true
0 INFO  TCP connect 192.168.0.112:8443
0 INFO  tcp connect: t=1000ms state=SynSent
…
0 INFO  tcp connect: t=15000ms state=SynSent
0 ERROR tcp connect failed: timeout
0 INFO  eth_stats: EthStats { rx_frames: 2, tx_frames: 15, rx_dma_state: 3,
        tx_dma_state: 6, rx_buf_unavail: false, tx_buf_unavail: true, … }
0 INFO  eth_regs: EthRegSnapshot { maccr: 0x0200C80C, dmasr: 0x00660004,
        mmc_rx_unicast: 0, mmc_tx_good: 15 }

Honest gap

https-get is firmware-correct (HAL is verified by eth-link running the exact same code paths). The residual failure is the board's MAC reporting transmitted frames that intermittently do not reach the host's NIC. Suspect causes (PHY survives CPU reset, switch MAC-learning latency, cable) are documented in docs/G4-CLOSE-REPORT.md along with the user-side validation steps:

  1. Hard-cable cycle on CN10 then verify-eth-link 9/9 (sanity).
  2. sudo tcpdump -i enp1s0 -nne 'ether host 00:80:e1:11:22:33' while reflashing https-get.
  3. sudo firewall-cmd --permanent --add-port=8443/tcp && sudo firewall-cmd --reload if frames reach host but TCP doesn't complete.
  4. curl -sk https://192.168.0.112:8443/ from a different host on the LAN as a server-side sanity check.

Test plan

  • Build examples/eth-link-stm32f769-disco and examples/https-get-stm32f769-disco — clean (verified locally).
  • cargo fmt --all --check — clean (verified locally).
  • Flash eth-link, ping 192.168.0.50 from host, expect 4/4 reply and ip neigh REACHABLE.
  • Flash https-get, expect 9/13 from the verify script, with a chance of 13/13 after the cable cycle / firewall steps above.
  • Review docs/G4-CLOSE-REPORT.md for the residual gap analysis.

No force push. The original feat/g4-eth-smoltcp branch is preserved on origin and PR #25 was closed with a pointer here. Do not merge before reviewing the close report.

Made with Cursor

guillo93 and others added 6 commits May 27, 2026 00:25
Wrong TX pins (PB11–13) and GPIO-input RMII signals blocked REF_CLK and
frame I/O. Use PG11/PG13/PG14 and AF11 on all RMII lines per UM2033/ST BSP;
sync MAC speed after autoneg; add RTT register/MMC debug and G4-ETH-DEBUG.md.
Refines the recovered ETH stack work so the F769 example flow is
robust to descriptor / DMA edge cases that previously surfaced as
either a `slice length 0` panic in smoltcp::ethernet or a wedged
DMA in `TBUS=1, TPS=6` suspended state.

Changes:

* `cache::configure_eth_mpu` follows ARMv7-M ARM B3.5 exactly:
  `CTRL=0` → `dsb/isb` → program region 1 at `ETH_DMA_BASE`
  (Normal-Non-Cacheable, XN, full access, 16 KiB) →
  `CTRL=ENABLE|PRIVDEFENA` → `dsb/isb`. Uses the symbolic
  `ETH_DMA_BASE` constant in place of a hardcoded literal.
* `eth::dma::ring`, `eth::dma::{rx,tx}::descriptor` — descriptors
  form a true ring (last `next_descriptor` wraps to entry 0)
  instead of using `TER/RER` end-of-ring bits, so the DMA engine
  never stops at the tail.
* `eth::dma::rx::RxRing::next_entry_available` discards descriptors
  that returned an error or a truncated (< 18 byte) frame so
  smoltcp never receives an empty slice. Removes the prior
  `slice length 0` panic surface.
* `eth::dma::{rx,tx}::demand_poll` clears `RBUS`/`TBUS` before
  poking the demand register, avoiding ghost-stalled state.
* `eth::dma::smoltcp_phy::Device::{receive,transmit}` now self-arm
  the DMA via `service_dma()` on every smoltcp poll. Example main
  loops no longer need to remember to call `service_dma()`
  manually; a stalled `TBUS=1` recovers automatically.
* `eth::dma::tx::EthTxToken::consume` pads short frames to 60
  bytes (802.3 minimum) before send, then `demand_poll`s so the
  engine picks up the descriptor even if it was suspended.
* `eth::mac` — checksum offload bits removed from MACCR
  (`ipco`, `apcs`, `rd`). smoltcp's default `ChecksumCapabilities`
  computes IP/TCP/UDP checksums in software; APCS is RX-only and
  irrelevant here.
* `eth::setup::enable_peripheral` — dummy read of `RCC.AHB1ENR`
  after enabling SYSCFG, per the F7 errata about peripheral clock
  stabilization before first register write.
* `eth::DEFAULT_MAC` / `rugus_net::DEFAULT_MAC` switched to
  `00:80:E1:11:22:33` (ST OUI) for clean interop with home LAN
  switches; downstream consumers can override.
* `rugus_crypto::SoftwareRng` impls `rugus_hal::CryptoRng` so TLS
  clients can take a single trait bound.
* `rugus_net::tcp_connect` logs the socket state every 1 s of the
  timeout window so a SynSent timeout is diagnosable from RTT
  alone.

Validated on STM32F769I-DISCO via `verify-eth-link` 9/9 PASS
reproducible across consecutive runs.

Co-authored-by: Cursor <cursoragent@cursor.com>
* `examples/eth-link-stm32f769-disco/src/main.rs` — cosmetic rustfmt
  on the BMSR log line. Functional behaviour unchanged; this example
  continues to pass `verify-eth-link 9/9` reproducible on the F769
  DISCO and is the regression baseline for the rest of G4.
* `examples/https-get-stm32f769-disco/src/main.rs`:
  - Boot order now matches `eth-link` byte for byte up to
    `dma.restart_after_link_up()` (rcc → cache → systick → pins →
    peripheral → eth::init → phy → autoneg → restart_after_link_up).
    Moving `cache::enable_with_eth_dma` before `setup_systick` was
    required for RX to work at all from this example.
  - `init_heap` is now SRAM-only (64 KiB at the linker-provided RAM
    region). FMC/SDRAM bring-up is skipped — TLS read/write buffers
    + smoltcp + sockets fit in internal SRAM; this also keeps the
    GPIO PG pin bank untouched by FMC AF12 muxing.
  - 8-second L2 probe window before `tcp_connect` so an operator
    can ARP/ping the board and confirm L2 in isolation; the window
    logs `rx/tx/rps/tps/rbus/tbus` every second for in-the-field
    debugging.
  - `enable_cycle_counter` moved after Ethernet bring-up so the
    early boot path is byte-identical to `eth-link`.
* `examples/https-get-stm32f769-disco/README.md` — updated expected
  RTT, troubleshooting section, and stack notes.
* `tools/verify-{eth-link,https-get}-stm32f769-disco.sh` — add
  `probe-rs run --connect-under-reset` so flashing is reliable
  after previous debugger sessions.
* `.gitignore` — exclude local debug artifacts (`*.pcap`,
  `capture.log`, `/tmp/rugus-*.log`).

Co-authored-by: Cursor <cursoragent@cursor.com>
…NT_LOG

* `docs/G4-CLOSE-REPORT.md` — definitive G4 closure report. Verify
  scores (eth-link 9/9, https-get 9/13), the 10 firmware-side fixes
  that are validated by `eth-link` running the same code paths,
  root-cause analysis of the residual `https-get` TCP gap (PHY/
  cable/switch suspected, intermittent silent TX loss with
  `mmc_tx_good` incrementing but no frames on the wire), and the
  recommended user-side validation steps (cable cycle, tcpdump,
  firewall, point-to-point) to clear or escalate the gap.
* `docs/PERFORMANCE.md` — small forward-looking scaffold for the
  kernel performance strategy: pure Rust + `core::arch::asm!` +
  `#[naked]` + custom `#[link_section]` + compile-time LUTs, with
  the no-FFI boundary made explicit and the post-G2 metrics table
  restated alongside the implementation technique that gets us
  there.
* `docs/ROADMAP.md` — G4 closure annotation points at the close
  report; status line states `verify-eth-link 9/9 reproducible`
  and `verify-https-get 9/13 (TCP SynSent timeout outside the
  HAL — see close report)`.
* `CHANGELOG.md` `[Unreleased]` — full list of the 10 firmware
  fixes, the DEFAULT_MAC update, the new docs, the gitignore
  additions, and the validated HW results.
* `AGENT_LOG.md` — comprehensive new entry for the 2026-05-26 /
  2026-05-27 closure session: Gemini recovery table (per-file
  decision: keep/rework/revert), firmware-side fixes, hardware
  verify results with the RTT extracts and tcpdump evidence, the
  three live hypotheses for the residual gap, and the user-side
  next steps.

Co-authored-by: Cursor <cursoragent@cursor.com>
TX stalled without CIC=11 and initial transmit poll demand; treat probe-rs
SIGKILL (137) as a normal RTT timeout in F769 verify scripts.
Co-authored-by: Cursor <cursoragent@cursor.com>
@guillo93 guillo93 merged commit 5e0a6e3 into main May 28, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant