Use separate dispatchers for MemoryQueue, QueueManager, and Akka heartbeat #5549
style95 wants to merge 2 commits into apache:master
Conversation
  }
  cluster {
+   use-dispatcher = "dispatchers.heartbeat-dispatcher"
I assigned a separate dispatcher for the akka-cluster heartbeat.
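For context, this setting sits under the top-level akka block of the scheduler configuration. A minimal sketch (assuming the file nests cluster under akka as usual; only the structure around the added line, nothing OpenWhisk-specific):

```hocon
akka {
  cluster {
    # Run the cluster failure-detector heartbeat on its own dispatcher so it is
    # never queued behind MemoryQueue/QueueManager work on the default dispatcher.
    use-dispatcher = "dispatchers.heartbeat-dispatcher"
  }
}
```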
  }
  metric {
-   tick-interval = 1 second
+   tick-interval = 10 second
This is one of the main changes. According to my analysis, there appears to be a leak around Kamon metrics.
The longer a scheduler runs, the more MetricSnapshot instances accumulate.
Below is the heap dump of a scheduler that ran into thread starvation.
There are 97M instances of scala.collection.immutable.$colon$colon and 96M instances of kamon.metric.Instrument$Snapshot.
The reference dominator of scala.collection.immutable.$colon$colon is mostly MetricSnapshot, and kamon.metric.Instrument$Snapshot is in turn mostly referenced by scala.collection.immutable.$colon$colon, which again leads back to MetricSnapshot.
All components other than MemoryQueue already emit metrics at 10-second intervals, so I updated the metric emission interval of MemoryQueue to 10 seconds as well.
Since all metrics are now emitted every 10 seconds, there is no need for a smaller tick interval such as 1 second: a 1-second tick would create a snapshot every second even though the metrics themselves stay unchanged for 10 seconds, because nothing is emitted in the middle of the 10-second interval.
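As a sketch, the resulting setting (assuming it sits under the kamon block, as the existing tick-interval does) is simply:

```hocon
kamon {
  metric {
    # Snapshots are only consumed every 10 seconds now, so ticking more often
    # would just allocate snapshots for values that have not changed.
    tick-interval = 10 seconds
  }
}
```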
  private[queue] var initialized = false

-   private val logScheduler: Cancellable = context.system.scheduler.scheduleWithFixedDelay(0.seconds, 1.seconds) { () =>
+   private val logScheduler: Cancellable = context.system.scheduler.scheduleWithFixedDelay(0.seconds, 10.seconds) { () =>
This was emitting 5 metrics every second. With 400 queues running, that is around 2,000 metrics per second. Given that each memory queue also spawns multiple sub-actors, and that this was combined with a CachedThreadPool, which creates an unbounded number of threads on demand, it caused thread starvation.
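To illustrate the shape of the fix, here is a minimal, self-contained sketch; the actor name, the dispatcher id, and the emitted metrics are placeholders, not the actual OpenWhisk code. The emission runs every 10 seconds and on a dedicated dispatcher rather than the default one:

```scala
import akka.actor.{Actor, Cancellable}
import scala.concurrent.ExecutionContext
import scala.concurrent.duration._

class MetricEmittingActor extends Actor {
  // Run the periodic emission on a dedicated dispatcher so bursts of queue
  // metrics cannot starve the default dispatcher (the dispatcher id is illustrative
  // and must be defined in the ActorSystem configuration).
  private implicit val ec: ExecutionContext =
    context.system.dispatchers.lookup("dispatchers.memory-queue-dispatcher")

  // Emit metrics every 10 seconds instead of every second.
  private val logScheduler: Cancellable =
    context.system.scheduler.scheduleWithFixedDelay(0.seconds, 10.seconds) { () =>
      // the per-queue gauge metrics would be emitted here
      ()
    }

  override def postStop(): Unit = logScheduler.cancel()

  override def receive: Receive = Actor.emptyBehavior
}
```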
@@ -0,0 +1,37 @@
+ dispatchers {
I introduced separate dispatchers to guarantee the performance of critical components and to minimize the performance impact they have on each other.
I used the fork-join-executor since their jobs are mostly CPU-bound work.
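The new file is truncated in the diff above. A typical Akka fork-join dispatcher definition looks roughly like the following (a sketch only: the heartbeat-dispatcher name is referenced elsewhere in this PR, while the sizing values and the other dispatcher names are illustrative):

```hocon
dispatchers {
  # One of the new dispatchers; the same pattern would be repeated for the
  # MemoryQueue and QueueManager dispatchers.
  heartbeat-dispatcher {
    type = Dispatcher
    executor = "fork-join-executor"
    fork-join-executor {
      parallelism-min = 2
      parallelism-factor = 2.0
      parallelism-max = 32
    }
    # Messages an actor processes before yielding its thread back to the pool.
    throughput = 5
  }
}
```

A fork-join pool keeps a bounded number of worker threads sized from the available cores, which fits CPU-bound work better than an unbounded cached thread pool.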
dgrove-oss left a comment
LGTM. Thanks for the detailed comments explaining the PR.
LGTM, do we still want to merge this?
This is to make the system more stable. There are generally many queues running in a scheduler, and each queue spawns multiple actors, so the total number of actors can be huge.
There are also some actors that are critical to keeping the system healthy, such as the akka-cluster heartbeat.
This change isolates the performance impact of the memory queues and guarantees that the akka heartbeat is not starved; a sketch of the mechanism follows.
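For illustration, this is how an actor is pinned to a dedicated dispatcher in Akka; the actor class and dispatcher id below are illustrative, not the actual OpenWhisk names, and the dispatcher must exist in the ActorSystem configuration (as in the dispatchers block added by this PR):

```scala
import akka.actor.{Actor, ActorSystem, Props}

class QueueLikeActor extends Actor {
  override def receive: Receive = {
    case _ => // queue work would be handled here
  }
}

object DispatcherAssignmentExample extends App {
  val system = ActorSystem("scheduler")
  // Every message of this actor is processed on its own dispatcher, so heavy
  // queue work never competes with the cluster heartbeat for threads.
  val memoryQueue = system.actorOf(
    Props(new QueueLikeActor).withDispatcher("dispatchers.memory-queue-dispatcher"),
    "memoryQueue")
}
```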
Description
Related issue and scope
My changes affect the following components
Types of changes
Checklist: