add defensive code for dealing with loadbalancer concurrency tracking #4951
tysonnorris wants to merge 4 commits into apache:master from
```scala
{
  case Failure(e) =>
    logging.error(this, s"Failed to process message for topic $topic : $e (stack trace included)")
    e.printStackTrace()
```
Printing the stack trace this way bypasses the logger's formatting. Is that intended?
Not intentional. Can you point me to an example of better formatting for a stack trace?
Thanks!
Isn't it already part of the log line above anyway? I guess the broader question is: Why is the stack trace even needed here?
Now it is; I forgot to remove the `e.printStackTrace()` when I changed the log line to include the stack trace. 😔 Fixing that...
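For the formatting question above, one common approach is to render the stack trace into a `String` and pass it through the logger in a single call, instead of writing it to stderr with `e.printStackTrace()`. The sketch below is illustrative only: `logError` is a hypothetical stand-in for the project's logger, and `process` is a hypothetical handler that fails on an empty payload. Neither is the PR's actual code.

```scala
import java.io.{PrintWriter, StringWriter}
import scala.util.{Failure, Success, Try}

object StackTraceLogging {
  // Render the full stack trace into a String instead of writing it to stderr.
  def stackTraceOf(e: Throwable): String = {
    val sw = new StringWriter()
    val pw = new PrintWriter(sw)
    e.printStackTrace(pw)
    pw.flush()
    sw.toString
  }

  // Hypothetical stand-in for the project's logger; here it just prints to stdout.
  def logError(msg: String): Unit = println(s"[ERROR] $msg")

  // Hypothetical handler that fails on empty input. On failure, one log call
  // carries both the exception summary and its stack trace, so both go through
  // the logger's formatting. The logged line is returned only so it can be checked.
  def process(topic: String, bytes: Array[Byte]): Option[String] =
    Try(require(bytes.nonEmpty, "empty message")) match {
      case Success(_) => None
      case Failure(e) =>
        val msg = s"Failed to process message for topic $topic: $e\n${stackTraceOf(e)}"
        logError(msg)
        Some(msg)
    }
}
```

If the real logger accepts a `Throwable` argument directly, passing the exception that way is usually preferable, since the logging backend then controls how the trace is rendered.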
markusthoemmes left a comment
Good one; direct map access is probably always bad 😅
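To illustrate why direct map access is risky here: in Scala, `map(key)` throws `NoSuchElementException` when the key is absent, which can happen if a tracking map is reset or an entry is removed concurrently. The following is a minimal sketch under assumptions: `inFlight` is a hypothetical per-action counter map, and `decrementUnsafe`/`decrementSafe` are hypothetical helpers, not code from this PR.

```scala
import scala.collection.concurrent.TrieMap

object SafeMapAccess {
  // Hypothetical per-action in-flight counters, keyed by action name.
  val inFlight = TrieMap.empty[String, Int]

  // Direct access: inFlight(action) throws NoSuchElementException if the
  // entry is missing, e.g. because the map was reset or the key removed.
  def decrementUnsafe(action: String): Unit =
    inFlight(action) = inFlight(action) - 1

  // Defensive access: look the key up with get and report a missing entry
  // instead of throwing. replace(k, old, new) is a compare-and-set, so losing
  // a race with a concurrent update simply retries.
  @annotation.tailrec
  def decrementSafe(action: String): Boolean =
    inFlight.get(action) match {
      case None => false // caller can log a warning and carry on
      case Some(n) =>
        if (inFlight.replace(action, n, n - 1)) true
        else decrementSafe(action)
    }
}
```

After a reset such as `inFlight.clear()`, `decrementSafe` returns `false` where `decrementUnsafe` would throw, so the caller can log the inconsistency and keep processing.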
common/scala/src/main/scala/org/apache/openwhisk/common/NestedSemaphore.scala
… events cause maps to reset
…Semaphore.scala Co-authored-by: Markus Thömmes <markusthoemmes@me.com>
06a4c6f to 61ac430
```diff
 if (logHandoff) logging.debug(this, s"processing $topic[$partition][$offset] ($occupancy/$handlerCapacity)")
-handler(bytes)
+handler(bytes).andThen {
+  {
```
Are the inner braces needed?
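On the inner-braces question: `Future.andThen` takes a `PartialFunction[Try[T], U]`, so the `case` clauses can sit directly inside a single pair of braces. A minimal sketch, assuming a hypothetical `handler` that fails on an empty payload and a hypothetical `failures` queue standing in for the logger:

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}
import java.util.concurrent.ConcurrentLinkedQueue

object AndThenLogging {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical stand-in for a logger: records failure messages.
  val failures = new ConcurrentLinkedQueue[String]()

  // Hypothetical handler: fails for an empty payload.
  def handler(bytes: Array[Byte]): Future[Unit] =
    Future(require(bytes.nonEmpty, "empty payload"))

  // One pair of braces holding the case clauses is enough; no inner braces.
  // The callback runs before the returned Future completes, and the original
  // result (success or failure) is passed through unchanged.
  def process(topic: String, bytes: Array[Byte]): Future[Unit] =
    handler(bytes).andThen {
      case Failure(e) => failures.add(s"Failed to process message for topic $topic: $e")
      case Success(_) => ()
    }

  // Convenience for the example: wait for processing to finish.
  def processAndWait(topic: String, bytes: Array[Byte]): Try[Unit] =
    Try(Await.result(process(topic, bytes), 5.seconds))
}
```

Because `andThen` passes the original result through, the failure is logged and still surfaces to whatever awaits the returned future.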
Defensive code for concurrency tracking in the sharding load balancer.
Description
We saw cases where controller ack processing was incomplete for some activations; the causes turned out to be:
This PR does 2 things:
This was not caught in tests because reproducing the symptom requires multi-controller cluster state changes combined with concurrency.