Skip to content

Conversation

@CodyCBakerPhD
Copy link
Contributor

@CodyCBakerPhD CodyCBakerPhD commented Oct 17, 2025

Fixes #82 [edit asmacdo]

I tried running the test suite on Mac and encountered many failures. This is because for some reason, the output for small values seems to get mapped to null [edit: they were actually not being captured at all from differences in the OS-dependent ps command]

There was also lots of terminal spam about ps not being called correctly so I fixed that as well. Did not seem to actually alter the runner of duct however, just wasn't nice to look at [edit: this was the source of the issue]

Some other minor comments have been left on lines below

@CodyCBakerPhD CodyCBakerPhD self-assigned this Oct 17, 2025
@mergify
Copy link

mergify bot commented Oct 17, 2025

🧪 CI Insights

Here's what we observed from your CI run for 1c3f3ee.

🟢 All jobs passed!

But CI Insights is watching 👀

@CodyCBakerPhD
Copy link
Contributor Author

This is because for some reason, the output for small values seems to get mapped to null

Of course, this seems to be because it is not working, will continue working on this

@CodyCBakerPhD
Copy link
Contributor Author

🟢 All jobs passed!

lol

@yarikoptic
Copy link
Member

this is the issue it would close and which would have some more details:

@CodyCBakerPhD CodyCBakerPhD force-pushed the fix_mac branch 3 times, most recently from 5941e46 to 76c47bc Compare October 24, 2025 03:16
restore original linux version for tests

adjust CI for apt-get

remove redundant assignment

add system display just for sanity

move sanity printout to test and remove args keyword

remove extra summary check

feat: debugged core logic

fix: skip a couple linux-specific tests

fix: fix skipif

feat: final touches
@CodyCBakerPhD CodyCBakerPhD marked this pull request as ready for review October 24, 2025 03:41
@CodyCBakerPhD
Copy link
Contributor Author

This is now passing on all GitHub Mac runners (and my device)

The failing CI seems to be particular environment setup issue: https://github.com/con/duct/actions/runs/18768853617/job/53549456968?pr=325 @asmacdo any ideas? I will mention I skip that step in the workflow on all Mac runners since they run into the same exact issue (assuming something about tox setup is tuned specifically to linux)

@CodyCBakerPhD CodyCBakerPhD requested a review from asmacdo October 24, 2025 03:45
@asmacdo
Copy link
Member

asmacdo commented Oct 27, 2025

This is now passing on all GitHub Mac runners (and my device)

The failing CI seems to be particular environment setup issue: https://github.com/con/duct/actions/runs/18768853617/job/53549456968?pr=325 @asmacdo any ideas? I will mention I skip that step in the workflow on all Mac runners since they run into the same exact issue (assuming something about tox setup is tuned specifically to linux)

@CodyCBakerPhD that failure is not related to this PR, and is also failing in the daily tests starting last week.
#330

@asmacdo asmacdo added the semver-minor Increment the minor version when merged label Oct 27, 2025
@asmacdo
Copy link
Member

asmacdo commented Oct 29, 2025

@CodyCBakerPhD they are in the captured logs.

 ERROR    test_e2e:test_e2e.py:48 Test parameters: mode=subshell, num_children=2
ERROR    test_e2e:test_e2e.py:49 Expected count: 3, Actual count: 0
ERROR    test_e2e:test_e2e.py:50 Total samples collected: 1
ERROR    test_e2e:test_e2e.py:55 Sample 0: timestamp=2025-10-29T15:06:04.183884+00:00, total_processes=1
ERROR    test_e2e:test_e2e.py:69   No sleep processes found in sample 0
ERROR    test_e2e:test_e2e.py:70   All processes in sample 0:
ERROR    test_e2e:test_e2e.py:72     PID 69735: cmd='/bin/bash /Users/runner/work/duct/duct/test/data/spawn_children.sh subshell 2 0.3'
ERROR    test_e2e:test_e2e.py:74 All unique sleep PIDs found: []

Its surprising that with a sample coming in every 0.01 seconds (aggregated every 0.1 seconds) that we only got 1 aggregated sample and it contained none of the child processes. I'll try bumping the time way up to see if its just a race condition or something else.

@CodyCBakerPhD
Copy link
Contributor Author

they are in the captured logs.

I see some at https://github.com/con/duct/actions/runs/18912288068/job/53986067528?pr=325#step:6:404 but only for 3.14 (which I have honestly given up on for now since it is the most problematic)

I was talking about the other parts of the testing matrix

@CodyCBakerPhD
Copy link
Contributor Author

And for the rest of the matrix,

But anyway the biggest issue ATM seems to be that Mac loves to occassionally return -2 for signal killed; would it be undesireable to just add a bit of logic to the signal handler for Mac to map that to zero?

@asmacdo
Copy link
Member

asmacdo commented Oct 29, 2025

For the tmp debugging it only logs when the assertion will fail which is why it didnt show up elsewhere

The time bump proves its just a race condition. The remaining question is why is it so slow, is it due to something specific to the runner or is duct just super slow on mac?

We should try to bring that test duration number down a bit, but at least we now know its working!

For Signal exits, we probably need to dig into that more. If the exit code isnt 0 that is a real problem I think. The expected behavior is that duct intercepts the signal and passes to the child process. When the child is killed, duct exits normally with 0. So -2 -> 0 seems wrong to me. Do you happen to know if signal processing is handled differently on macs?

@asmacdo
Copy link
Member

asmacdo commented Oct 29, 2025

It could be that the signal tests are also sensitive to race conditions. If the process startup is extremely slow, its possible that the signal handler doesn't get registered in time, which would explain negative error codes. I still cant explain the error code of 1 though for the signal test.

We are going to pause on this for the moment until the combine-cli PR is merged. After rebase, @CodyCBakerPhD and I agreed to drop macos-intel part of the matrix to get this in-- though duct is probably working on those systems. After adding config files, we can add a separate config for the mac intel envs which would run slower tests, and hopefully avoid the race conditions.

@asmacdo asmacdo marked this pull request as draft October 29, 2025 18:07
@yarikoptic
Copy link
Member

the output of that ps which lead to an empty sample is at https://www.oneukrainian.com/tmp/ps-mac-output.txt
. The question is -- what should in general happen if no sample was collected there -- exception like now (sounds right if we need to fix smth) or just skip?

@CodyCBakerPhD
Copy link
Contributor Author

I added a minimal retry mechanism for empty samples that can no doubt be improved in the future

Before I disabled intel-based tests, they seemed to be failing a bit more reliably on the 'sanity green' check as well as the sigint returns (either 1 or -2 instead of 0, usually for parameter cases [0] and [3.14])

NOTE: I finally saw it occur in ubuntu as well (https://github.com/con/duct/actions/runs/19088640558/job/54534324996?pr=325#step:6:441) perhaps it is simply a random occurence in the test suite in all platforms with ubuntu having much lower rates of occurence than macos-intel

I propose we deal with that in the follow-up (#335), either acknowledging that to be an acceptable behavior on that architecture and adjusting the tests correspondingly or someone attempting to dig further into the 'issue' than I can with my devices

For now I am proceeding under the guide that only M-series 'latest' macs are guaranteed to work as intended (all macos-latest have been stable for the last few runs)

@CodyCBakerPhD
Copy link
Contributor Author

CodyCBakerPhD commented Nov 5, 2025

(forgive the constant small pushing BTW - it is the easiest way I have of triggering retries to guarantee randomness is 'gone' since I don't have permissions on the CI)

@CodyCBakerPhD CodyCBakerPhD marked this pull request as ready for review November 5, 2025 02:21
@CodyCBakerPhD CodyCBakerPhD requested a review from asmacdo November 5, 2025 02:21
@CodyCBakerPhD CodyCBakerPhD changed the title Added Mac support Added Mac (M-series) support Nov 5, 2025
@CodyCBakerPhD
Copy link
Contributor Author

lol TBH probably just easier for me at this point to open a new PR - will do that sometime this weekend

@CodyCBakerPhD CodyCBakerPhD marked this pull request as draft December 20, 2025 04:00
@CodyCBakerPhD
Copy link
Contributor Author

Re-started in #351

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

semver-minor Increment the minor version when merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Portability issue: ps on macOS does not have -s option

3 participants