mitigate child processes with 100% CPU and/or zombies #76

Open: mmitch wants to merge 2 commits into zigdon:master from mmitch:fix-runaway-childs
mmitch commented Apr 30, 2014

This should work around the 100% CPU and/or zombie issues (for details see commit log).
It does not fix the root cause of the endless loop in the child processes, but it kills them, so they don't live very long.

This is by no means good code, but a works-for-me with some ugly hacks.
Should probably be refined or tested by more people than just me before a merge :-)
(is there an experimental or -dev branch?)

While this does not fix the cause for child processes looping
endlessly in read()/EAGAIN, it will detect these loops and kill the
runaway processes automatically.  It also prevents zombies after
killing these child processes (which happened before when you killed
them manually).

- Fix the assumption that an unresponsive child process has died.
  Until now these processes were just 'forgotten', but now it is
  checked whether the process is unresponsive but still alive (e.g. in
  an endless loop) and if alive, the process is killed before it is
  'forgotten'.

- Fix the reaping of dead child processes.  Until now, only a single
  waitpid() call was issued, which only reaped one process, even if
  there were multiple processes waiting to be reaped.  Now a loop is
  used and all potential zombies should be reaped properly.
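
The two fixes in the list above can be sketched roughly as follows. This is only an illustration in Python, not the Perl code from the commits: the function names `is_alive`, `kill_if_alive` and `reap_all` are made up for this example, and the choice of SIGKILL is an assumption (the commit message does not say which signal is sent).

```python
import os
import signal


def is_alive(pid):
    """Return True if a process with this pid still exists.

    Signal 0 sends nothing; it only checks whether the pid can be signalled.
    Note that a zombie that has not been reaped yet still counts as existing.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def kill_if_alive(pid):
    """Kill an unresponsive child before 'forgetting' it.

    Returns True if a signal was sent, False if the process was already gone.
    SIGKILL is an assumption made for this sketch.
    """
    if not is_alive(pid):
        return False
    os.kill(pid, signal.SIGKILL)
    return True


def reap_all():
    """Reap every child that has already exited, not just the first one."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left
        if pid == 0:
            break  # children exist, but none of them has exited yet
        reaped.append(pid)
    return reaped
```

The loop in `reap_all` is the point of the second fix: a single non-blocking `waitpid()` call collects at most one child, so any further dead children would remain zombies until the next call.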

Switch to proposed 'variant 3':
Irssi::pidwait_add() already does a waitpid() for any child.
Manual calls to waitpid() and Irssi::pidwait_remove() should not be
necessary, so remove them altogether.

(I've grep(1)ed the irssi scripts directory on my Debian stable and
 none of the scripts (except twirssi :-) that call pidwait_add()
 calls either pidwait_remove() or waitpid(), so this should work.)
@yarikoptic

FWIW, I'm very much looking forward to seeing the fix. @mmitch, if you could share the variant 3 patch, I would be glad to try it out as well.

mmitch commented May 4, 2014

@yarikoptic: variant 3 is active as of commit mmitch@7e976ff.
It's working fine for me so far: 8 runaway processes successfully killed in the last 36 hours.

@sandebert

I'm also affected by this bug. I'm trying variant 3 by @mmitch now to see if that solves it for me.

@sandebert

Reporting back: 16 runaway processes killed in the last 24 hours.

jikuja commented Jul 19, 2018

Is twirssi now stable without this PR, or is there something wrong with this PR? I have been running mmitch/twirssi@7e976ff since May 2015 without problems.

To get longer tweets I should update the script, but does it still create runaway processes?
