Skip to content

feat(ntx-builder): deactivate accounts which crash repeatedly#1712

Open
SantiagoPittella wants to merge 2 commits intosantiagopittella-ntx-builder-actor-deactivationfrom
santiagopittella-ntx-builder-account-blacklisting
Open

feat(ntx-builder): deactivate accounts which crash repeatedly#1712
SantiagoPittella wants to merge 2 commits intosantiagopittella-ntx-builder-actor-deactivationfrom
santiagopittella-ntx-builder-account-blacklisting

Conversation

@SantiagoPittella
Copy link
Collaborator

Implements account blacklisting for the NTX Builder (4th task from #1694).

Track crash counts per account in the coordinator. When an actor shuts down due to a DbError, its crash count is incremented. Once it reaches a configurable threshold, the account is blacklisted and spawn_actor skips it.

Only DbError shutdowns count as crashes because other shutdown reasons (Cancelled, IdleTimeout, SemaphoreFailed) are either intentional or system-wide and not indicative of a per-account bug.

tracing::warn!(
%account_id,
crash_count = count,
"Account blacklisted due to repeated crashes, skipping actor spawn"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: should we make this more explicit? tracing::error!, since it signifies a bug in our impl or add "BUG" deliberately to the message?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should already be logging the crash itself, so this is probably okay at warn since it will repeat often for each account.

return;
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: should we first check for running actors and terminate them?

assert_eq!(inactive_targets[0], inactive_id);
}

// BLACKLIST TESTS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we might want to find a different term to "blacklist", i.e. repeated_failure_block_list to avoid any confusion / fud

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer something much shorter -- we're already overly verbose and stuttery throughout the codebase imo. Maybe deactivated_accounts

Copy link
Collaborator

@Mirko-von-Leipzig Mirko-von-Leipzig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just naming nits :) I guess blacklist was the wrong term to use -- let's try deactivated instead 🙃

Comment on lines +141 to +144
/// Maximum number of crashes before an account actor is blacklisted.
///
/// Once an actor for a given account exceeds this crash count, no new actor will be
/// spawned for that account.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Maximum number of crashes before an account actor is blacklisted.
///
/// Once an actor for a given account exceeds this crash count, no new actor will be
/// spawned for that account.
/// Maximum number of crashes before an account deactivated.
///
/// Once this limit is reached, no new transactions will be created for this account.

Comment on lines +145 to +146
#[arg(long = "ntx-builder.max-actor-crashes", default_value_t = 10, value_name = "NUM")]
pub max_actor_crashes: usize,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would avoid mentioning actor in the public docs -- its more an implementation detail.

Suggested change
#[arg(long = "ntx-builder.max-actor-crashes", default_value_t = 10, value_name = "NUM")]
pub max_actor_crashes: usize,
#[arg(long = "ntx-builder.max-account-crashes", default_value_t = 10, value_name = "NUM")]
pub max_account_crashes: usize,

tracing::warn!(
%account_id,
crash_count = count,
"Account blacklisted due to repeated crashes, skipping actor spawn"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should already be logging the crash itself, so this is probably okay at warn since it will repeat often for each account.

if let Some(&count) = self.crash_counts.get(&account_id) {
if count >= self.max_actor_crashes {
tracing::warn!(
%account_id,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
%account_id,
account.id = %account_id,

- Replaced NTX Builder's in-memory state management with SQLite-backed persistence; account states, notes, and transaction effects are now stored in the database and inflight state is purged on startup ([#1662](https://github.com/0xMiden/miden-node/pull/1662)).
- [BREAKING] Reworked `miden-remote-prover`, removing the `worker`/`proxy` distinction and simplifying to a `worker` with a request queue ([#1688](https://github.com/0xMiden/miden-node/pull/1688)).
- NTX Builder actors now deactivate after being idle for a configurable idle timeout (`--ntx-builder.idle-timeout`, default 5 min) and are re-activated when new notes target their account ([#1705](https://github.com/0xMiden/miden-node/pull/1705)).
- NTX Builder now blacklists network accounts whose actors crash repeatedly (configurable via `--ntx-builder.max-actor-crashes`, default 10) ([#1712](https://github.com/0xMiden/miden-node/pull/1712)).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- NTX Builder now blacklists network accounts whose actors crash repeatedly (configurable via `--ntx-builder.max-actor-crashes`, default 10) ([#1712](https://github.com/0xMiden/miden-node/pull/1712)).
- NTX Builder now deactivates network accounts which crash repeatedly (configurable via `--ntx-builder.max-actor-crashes`, default 10) ([#1712](https://github.com/0xMiden/miden-node/pull/1712)).

assert_eq!(inactive_targets[0], inactive_id);
}

// BLACKLIST TESTS
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer something much shorter -- we're already overly verbose and stuttery throughout the codebase imo. Maybe deactivated_accounts

///
/// The URLs are fake and actors spawned with this context will fail on their first gRPC call,
/// but this is sufficient for testing coordinator logic (registry, blacklisting, etc.).
fn test_actor_context(db: &Db) -> AccountActorContext {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe have this be a method on AccountActorContext::test?

async fn test_coordinator() -> (Coordinator, tempfile::TempDir) {
let (db, dir) = Db::test_setup().await;
(Coordinator::new(4, db), dir)
(Coordinator::new(4, 10, db), dir)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be a method on Coordinator

let count = self.crash_counts.entry(account_id).or_insert(0);
*count += 1;
tracing::error!(
%account_id,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
%account_id,
account.id = %account_id,

@Mirko-von-Leipzig Mirko-von-Leipzig changed the title feat(ntx-builder): blacklist accounts whose actors crash repeatedly feat(ntx-builder): deactivate accounts which crash repeatedly Mar 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants