Use separate dispatchers for MemoryQueue, QueueManager, and Akka heartbeat #5549
style95 wants to merge 2 commits into apache:master
Conversation
  }
  cluster {
+   use-dispatcher = "dispatchers.heartbeat-dispatcher"
I assigned a separate dispatcher for the akka-cluster heartbeat.
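For context, this setting sits under the top-level akka block of the scheduler configuration. A minimal sketch (assuming the file nests cluster under akka as usual; only the structure around the added line, nothing OpenWhisk-specific):

```hocon
akka {
  cluster {
    # Run the cluster failure-detector heartbeat on its own dispatcher so it is
    # never queued behind MemoryQueue/QueueManager work on the default dispatcher.
    use-dispatcher = "dispatchers.heartbeat-dispatcher"
  }
}
```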
  }
  metric {
-   tick-interval = 1 second
+   tick-interval = 10 second
This is one of the main changes. According to my analysis, there appears to be a leak around Kamon metrics.
The longer a scheduler runs, the more MetricSnapshot instances accumulate.
Below is the heap dump of a scheduler that ran into thread starvation.
There are 97M instances of scala.collection.immutable.$colon$colon and 96M instances of kamon.metric.Instrument$Snapshot.
The reference dominator of scala.collection.immutable.$colon$colon is mostly MetricSnapshot, and kamon.metric.Instrument$Snapshot is in turn mostly referenced by scala.collection.immutable.$colon$colon, which again leads back to MetricSnapshot.
All components other than MemoryQueue already emit metrics at 10-second intervals, so I updated the metric emission interval of MemoryQueue to 10 seconds as well.
Since all metrics are now emitted every 10 seconds, there is no need for a smaller tick interval such as 1 second: a 1-second tick would create a snapshot every second even though the metrics themselves stay unchanged for 10 seconds, because nothing is emitted in the middle of the 10-second interval.
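As a sketch, the resulting setting (assuming it sits under the kamon block, as the existing tick-interval does) is simply:

```hocon
kamon {
  metric {
    # Snapshots are only consumed every 10 seconds now, so ticking more often
    # would just allocate snapshots for values that have not changed.
    tick-interval = 10 seconds
  }
}
```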
  private[queue] var initialized = false

-   private val logScheduler: Cancellable = context.system.scheduler.scheduleWithFixedDelay(0.seconds, 1.seconds) { () =>
+   private val logScheduler: Cancellable = context.system.scheduler.scheduleWithFixedDelay(0.seconds, 10.seconds) { () =>
This was emitting 5 metrics every second. With 400 queues running, that is around 2,000 metrics per second. Given that each memory queue also spawns multiple sub-actors, and that this was combined with a CachedThreadPool, which creates an unbounded number of threads on demand, it caused thread starvation.
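To illustrate the shape of the fix, here is a minimal, self-contained sketch; the actor name, the dispatcher id, and the emitted metrics are placeholders, not the actual OpenWhisk code. The emission runs every 10 seconds and on a dedicated dispatcher rather than the default one:

```scala
import akka.actor.{Actor, Cancellable}
import scala.concurrent.ExecutionContext
import scala.concurrent.duration._

class MetricEmittingActor extends Actor {
  // Run the periodic emission on a dedicated dispatcher so bursts of queue
  // metrics cannot starve the default dispatcher (the dispatcher id is illustrative
  // and must be defined in the ActorSystem configuration).
  private implicit val ec: ExecutionContext =
    context.system.dispatchers.lookup("dispatchers.memory-queue-dispatcher")

  // Emit metrics every 10 seconds instead of every second.
  private val logScheduler: Cancellable =
    context.system.scheduler.scheduleWithFixedDelay(0.seconds, 10.seconds) { () =>
      // the per-queue gauge metrics would be emitted here
      ()
    }

  override def postStop(): Unit = logScheduler.cancel()

  override def receive: Receive = Actor.emptyBehavior
}
```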
@@ -0,0 +1,37 @@
+ dispatchers {
I introduced separate dispatchers to guarantee the performance of critical components and to minimize the performance impact they have on each other.
I used the fork-join-executor since their jobs are mostly CPU-bound work.
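The new file is truncated in the diff above. A typical Akka fork-join dispatcher definition looks roughly like the following (a sketch only: the heartbeat-dispatcher name is referenced elsewhere in this PR, while the sizing values and the other dispatcher names are illustrative):

```hocon
dispatchers {
  # One of the new dispatchers; the same pattern would be repeated for the
  # MemoryQueue and QueueManager dispatchers.
  heartbeat-dispatcher {
    type = Dispatcher
    executor = "fork-join-executor"
    fork-join-executor {
      parallelism-min = 2
      parallelism-factor = 2.0
      parallelism-max = 32
    }
    # Messages an actor processes before yielding its thread back to the pool.
    throughput = 5
  }
}
```

A fork-join pool keeps a bounded number of worker threads sized from the available cores, which fits CPU-bound work better than an unbounded cached thread pool.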
dgrove-oss left a comment
LGTM. Thanks for the detailed comments explaining the PR.
LGTM, do we still want to merge this?
This is to make the system more stable. There are generally many queues running in a scheduler, and each queue spawns multiple actors, so the total number of actors can be huge.
There are also some actors that are critical to keeping the system healthy, such as the akka-cluster heartbeat.
This change isolates the performance impact of the memory queues and guarantees that the akka heartbeat is not starved; a sketch of the mechanism follows.
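For illustration, this is how an actor is pinned to a dedicated dispatcher in Akka; the actor class and dispatcher id below are illustrative, not the actual OpenWhisk names, and the dispatcher must exist in the ActorSystem configuration (as in the dispatchers block added by this PR):

```scala
import akka.actor.{Actor, ActorSystem, Props}

class QueueLikeActor extends Actor {
  override def receive: Receive = {
    case _ => // queue work would be handled here
  }
}

object DispatcherAssignmentExample extends App {
  val system = ActorSystem("scheduler")
  // Every message of this actor is processed on its own dispatcher, so heavy
  // queue work never competes with the cluster heartbeat for threads.
  val memoryQueue = system.actorOf(
    Props(new QueueLikeActor).withDispatcher("dispatchers.memory-queue-dispatcher"),
    "memoryQueue")
}
```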
Description
Related issue and scope
My changes affect the following components
Types of changes
Checklist: