F4.4: persistent fault telemetry + safe-mode #82
Merged
Post-mortem ledger that survives resets: per-task counters, cumulative total, boot count, and the last FaultReport.
- rugus-core: LOG-FREE `telemetry` module with `FaultTelemetry` (magic-sealed). `boot()` distinguishes a cold boot (RAM holds garbage, so the record is reinitialized) from a warm reset (valid magic, so the history is preserved and boot_count is incremented). `record()` counts each fault; `safe_mode()` trips on the total threshold (16) or on the per-task repeat-offender threshold (5). 6 tests.
- rugus-kernel: static `FAULT_TELEMETRY` in the `.uninit` section (not zeroed at startup → survives the reset). `telemetry_init()` validates the magic; `fault_hook` records into the telemetry BEFORE killing the task (so the record survives even if the next step resets the chip). API: boot_count/total_faults/faults_for/safe_mode/last_fault.
- F407/F769 examples: dump the telemetry state at boot (cold vs warm, plus a post-mortem of the last fault); the supervisor consults safe_mode() and STOPS respawning bad_app, degrading in a controlled way instead of entering a crash/respawn loop.
Validated: F407 cold boot → safe-mode after 5 faults, kernel stays alive without a reset. F769 demonstrates persistence: after an IWDG reset the safe-mode state SURVIVES (warm boot, `.uninit` preserved).
LIMITATION: F769 enters an IWDG reset loop in deep safe-mode (cadence drift of the windowed IWDG); to be diagnosed with reset-cause in F4.6.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
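
The cold/warm distinction and the two safe-mode thresholds described above can be illustrated with a minimal host-side sketch. It is not the rugus-core implementation: only the names `FaultTelemetry`, `boot`, `record`, `safe_mode`, `boot_count`, `total_faults` and the thresholds 16 and 5 come from the description; the magic value, the task-table size, the field layout and the `BootKind` enum are assumptions made for illustration.

```rust
// Hypothetical constants: the real magic value and table size are not given.
pub const MAGIC: u32 = 0xFA17_7E1E;
pub const MAX_TASKS: usize = 8;
// Thresholds taken from the PR description.
pub const TOTAL_THRESHOLD: u32 = 16;
pub const PER_TASK_THRESHOLD: u32 = 5;

/// Assumed return type for `boot()`, used here to report what was detected.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BootKind {
    Cold,
    Warm,
}

/// Sketch of a magic-sealed record that would live in non-zeroed RAM.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct FaultTelemetry {
    pub magic: u32,
    pub boot_count: u32,
    pub total_faults: u32,
    pub per_task: [u32; MAX_TASKS],
}

impl FaultTelemetry {
    /// Called once per boot on whatever bytes the record currently holds.
    pub fn boot(&mut self) -> BootKind {
        if self.magic == MAGIC {
            // Warm reset: the seal is intact, so keep the history.
            self.boot_count = self.boot_count.saturating_add(1);
            BootKind::Warm
        } else {
            // Cold boot: RAM holds garbage, so start a fresh record.
            *self = FaultTelemetry {
                magic: MAGIC,
                boot_count: 1,
                total_faults: 0,
                per_task: [0; MAX_TASKS],
            };
            BootKind::Cold
        }
    }

    /// Counts one fault for `task`; out-of-range ids only bump the total.
    pub fn record(&mut self, task: usize) {
        self.total_faults = self.total_faults.saturating_add(1);
        if let Some(slot) = self.per_task.get_mut(task) {
            *slot = slot.saturating_add(1);
        }
    }

    /// True once either the total or any per-task threshold is reached.
    pub fn safe_mode(&self) -> bool {
        self.total_faults >= TOTAL_THRESHOLD
            || self.per_task.iter().any(|&n| n >= PER_TASK_THRESHOLD)
    }
}
```

In this sketch a record filled with arbitrary bytes fails the magic check and is reinitialized with `boot_count` set to 1, while a second `boot()` on the same record keeps the counters and only increments `boot_count`. Saturating arithmetic is a design choice of the sketch so that a long-lived record cannot wrap back to zero.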
Summary
Post-mortem ledger that survives resets: per-task fault counters, cumulative total, boot count, and the last `FaultReport`. After N faults the system enters safe-mode and the supervisor stops respawning, degrading in a controlled way instead of entering a crash/respawn loop.
- rugus-core: `telemetry` module with `FaultTelemetry` sealed by a magic value. `boot()` distinguishes a cold boot (RAM holds garbage → reinitializes) from a warm reset (valid magic → preserves the history, increments `boot_count`). `record()` counts faults; `safe_mode()` trips on the total threshold (16) or on a repeat-offender task (5). 6 unit tests.
- rugus-kernel: static `FAULT_TELEMETRY` in the `.uninit` section (cortex-m-rt does not zero it → it survives the reset). `telemetry_init()` validates the magic; `fault_hook` records BEFORE killing the task. Public API: `boot_count`/`total_faults`/`faults_for`/`safe_mode`/`last_fault`.
- F407/F769 examples: the supervisor consults `safe_mode()` and stops respawning bad_app.
Validation on board (RTT)
- F407: cold boot, boot_count=1; bad_app faults, respawns #1 to #4, and after 5 faults → SAFE-MODE: bad_app is NOT respawned, and the kernel stays alive without a reset.
- F769: after an IWDG reset, `safe_mode` SURVIVES (warm boot, `.uninit` preserved).
Known limitation
F769 enters an IWDG reset loop in deep safe-mode (cadence drift of the windowed IWDG from F4.3). To be diagnosed with reset-cause in F4.6.
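
The F407 sequence reported in the validation (four respawns, then safe-mode on the fifth fault) can be reproduced with a small host-side simulation. This is a sketch under stated assumptions, not the example supervisor from the PR: the `SafeModeSource` trait, the `Counter` stand-in, the `Event` log and `supervise_bad_app` are hypothetical names; only the record-before-kill ordering and the per-task threshold of 5 come from the description.

```rust
/// Hypothetical minimal view of the telemetry API that a supervisor needs.
pub trait SafeModeSource {
    fn record(&mut self, task: usize);
    fn safe_mode(&self) -> bool;
}

/// Stand-in for the real telemetry: one counter, per-task threshold of 5.
#[derive(Default)]
pub struct Counter {
    faults: u32,
}

impl SafeModeSource for Counter {
    fn record(&mut self, _task: usize) {
        self.faults += 1;
    }
    fn safe_mode(&self) -> bool {
        self.faults >= 5
    }
}

/// What the simulated supervisor observed, in order.
#[derive(Debug, PartialEq, Eq)]
pub enum Event {
    Fault(u32),
    Respawn(u32),
    SafeMode,
}

/// Runs a task that faults every time it is started, respawning it
/// until the telemetry reports safe-mode or `max_steps` is exhausted.
pub fn supervise_bad_app<T: SafeModeSource>(
    telemetry: &mut T,
    task: usize,
    max_steps: u32,
) -> Vec<Event> {
    let mut log = Vec::new();
    let mut respawns = 0;
    for n in 1..=max_steps {
        // Same order as the described fault_hook: record first, then act.
        telemetry.record(task);
        log.push(Event::Fault(n));
        if telemetry.safe_mode() {
            // Degrade in a controlled way: leave the task dead.
            log.push(Event::SafeMode);
            break;
        }
        respawns += 1;
        log.push(Event::Respawn(respawns));
    }
    log
}
```

Calling `supervise_bad_app(&mut Counter::default(), 0, 100)` in this sketch yields faults 1 to 4 each followed by a respawn, then fault 5 followed by `SafeMode` with no further respawn.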
Test plan
- `cargo test -p rugus-core` (6 telemetry) + host-tests (29)
- `cargo doc --workspace`
🤖 Generated with Claude Code