mitigate child processes with 100% CPU and/or zombies#76
Open
mmitch wants to merge 2 commits intozigdon:masterfrom
Open
mitigate child processes with 100% CPU and/or zombies#76mmitch wants to merge 2 commits intozigdon:masterfrom
mmitch wants to merge 2 commits intozigdon:masterfrom
Conversation
While this does not fix the cause for child processes looping endlessly in read()/EGAIN, it will detect these loops and kill the runaway processes automatically. It also prevents zombies after killing these child processes (which happened before when you killed them manually). - Fix the assumption that an unresponsive child process has died. Until now these processes were just 'forgotten', but now it is checked whether the process is unresponsive but still alive (e.g. in an endless loop) and if alive, the process is killed before it is 'forgotten'. - Fix the reaping of dead child processes. Until now, only a single waitpid() call was issued, which only repead one process, even if there were multiple processes waiting to be reaped. Now a loop is used and all potential zombies should be reaped properly.
Switch to proposed 'variant 3': Irssi::pidwait_add() already does a waitpid() for any child. Manual calls to waitpid() and Irssi::pidwait_remove() should not be necessary, so remove them altogether. (I've grep(1)ed the irssi scripts directory on my Debian stable and none of the scripts (excepit twirssi :-) that calls pidwait_add() calls either pidwait_remove() or waitpid(), so this should work.)
|
FWIW -- very looking forward to see the fix. @mmitch if you could share variant 3 patch I would be glad to try it out as well |
Contributor
Author
|
@yarikoptic with commit mmitch@7e976ff variant 3 is active |
|
I'm also affected by this bug. I'm trying the variant 3 by @mmitch now to see it that solves it for me. |
|
Reporting back: 16 runaway processes killed in the last 24 hours. |
|
Is twirssi now stable without this PR or is there something wrong with this? I have been running mmitch/twirssi@7e976ff since may 2015 without problems. Now to get longer tweets I should update script but is it still creating runaway processes? |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This should work around the 100% CPU and/or zombie issues (for details see commit log).
It does not fix the root cause of the endless loop in the child processes, but it kill's them, so they don't live very long.
This is by no means good code, but a works-for-me with some ugly hacks.
Should probably be refined or tested by more people than just me before a merge :-)
(is there an experimental or -dev branch?)