Make pymb_get_registry() thread safe#4
Conversation
Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
| pymb_registry* registry = (struct pymb_registry*) PyCapsule_GetPointer( | ||
| existing_capsule, "pymetabind_registry"); | ||
| Py_DECREF(existing_capsule); | ||
| // WARNING: The returned pointer is only valid while the interpreter |
There was a problem hiding this comment.
This is a bigger problem btw, it also highlights why maybe having some special place for this in CPython may be a good idea because is not exposed to the user messing with it
There was a problem hiding this comment.
That seems like a potential reason to remove the capsule destructor and instead have a use-count in the registry. The registry would be freed once all registered frameworks have unregistered themselves. Then removing the registry from the interpreter state dict could produce a split-brain situation (new frameworks registering after that create a new registry), but not a use-after-free. That's totally fine IMO.
(I think the issue here is not specific to free-threading. Even on a GIL build, someone could meddle with the interpreter state dict later.)
There was a problem hiding this comment.
Hello! Pablo told me to look at this project.
I'm Petr, and I prefer to solve dangling pointer issues by strong references :)
Why not make users hold the reference -- pair pymb_get_registry with a pymb_drop_registry, and keep a reference to the capsule in struct pymb_registry?
I agree that a split-brain situation is OK after messing with the interpreter state dict.
Perhaps also keep a strong ref to the registry in the framework -- between pymb_add_framework and pymb_remove_framework?
(And for user convenience, if pymb_add_framework gets a NULL registry argument, have it call pymb_get_registry and remember to drop the ref when done. Then the registry argument would only be there for “dependency injection” when if you want a “split-brain situation” -- i.e. in tests.)
Speaking of tests, how do you test pymetabind?
There was a problem hiding this comment.
(I'm fishing for quick answers here; happy to write the code & test it further if this makes sense in general)
There was a problem hiding this comment.
Hi Petr, sorry for the delay in responding - individual PR comments are easy to lose track of!
The implementation has evolved somewhat since this PR was posted, and it's now the case that the registry will be deallocated once both the capsule holding it is deallocated and it has no frameworks left. I think that's substantially equivalent to your proposed explicit refcount; as implemented, the refcount is implied by the length of the frameworks list in the registry object. We no longer expect frameworks to keep a pointer to the registry themselves; instead it is filled in by pymb_add_framework() in the new registry field of pymb_framework.
And for user convenience, if pymb_add_framework gets a NULL registry argument, have it call pymb_get_registry and remember to drop the ref when done. Then the registry argument would only be there for “dependency injection” when if you want a “split-brain situation” -- i.e. in tests.
That would be a nice way to close what is currently a very minor thread-safety hole: one thread executes pymb_add_framework(pymb_get_registry(), framework); and another deletes the registry capsule (by meddling with the interpreter state dict) in between pymb_get_registry() and pymb_add_framework(). The fix would be to make pymb_get_registry() take a strong reference to the registry capsule object and expect the caller to decref it when finished.
Speaking of tests, how do you test pymetabind?
Currently it is only tested by tests in the frameworks that use it (pybind11 and nanobind both have PRs posted, and these PRs contain tests; neither PR has been merged yet but both projects' maintainers are excited about the feature and I expect that consensus will be reached eventually). I plan to have some independent tests also, but getting the basic functionality working has been more important so far.
oremanj
left a comment
There was a problem hiding this comment.
Thanks, I hadn't considered this aspect of doing cleanup via the capsule destructor. I don't think we necessarily need to be robust to others doing targeted meddling with the interpreter state dict, especially since it's not exposed to regular Python code, but if we can get that robustness without too much trouble then it's probably worthwhile.
| pymb_registry* registry = (struct pymb_registry*) PyCapsule_GetPointer( | ||
| existing_capsule, "pymetabind_registry"); | ||
| Py_DECREF(existing_capsule); | ||
| // WARNING: The returned pointer is only valid while the interpreter |
There was a problem hiding this comment.
That seems like a potential reason to remove the capsule destructor and instead have a use-count in the registry. The registry would be freed once all registered frameworks have unregistered themselves. Then removing the registry from the interpreter state dict could produce a split-brain situation (new frameworks registering after that create a new registry), but not a use-after-free. That's totally fine IMO.
(I think the issue here is not specific to free-threading. Even on a GIL build, someone could meddle with the interpreter state dict later.)
|
|
||
| if (status == -1) { | ||
| // Error occurred | ||
| Py_DECREF(new_capsule); |
There was a problem hiding this comment.
The first thing you do on every branch is decref the new_capsule, so maybe move that out of the if?
| return existing_registry; | ||
| } else { | ||
| // status == 0: We successfully inserted our capsule | ||
| registry->capsule = new_capsule; |
There was a problem hiding this comment.
We should probably do this before inserting into the dictionary so that the registry is fully usable as soon as anyone else can see it.
| return registry; | ||
| } | ||
| #else | ||
| // For older Python versions, use PyDict_SetDefault |
There was a problem hiding this comment.
Older versions can't be free-threaded, and we haven't done anything that could yield the GIL, so I think this can be simplified to just SetItem.
|
GH doesn't allow inline comments on non-changed lines apparently, but can be removed from the doc comment for pymb_get_registry(). |
|
fae63ac moves away from having each framework hold a reference to the registry capsule; frameworks can be destroyed in Py_AtExit handlers which is too late to destroy a pyobject. If you rebase on that, I think it will eliminate the need for the 'WARNING:' in this PR. See also #5 which has some fairly extensive changes to the binding removal logic. Left as a PR to make the changes easier to see; will land it soon. |
I was preparing some discussion about this for next week and so #3 so this is a draft to address it (this issue may motivate the fact that keeping this out of CPython may be a bit more complex as users can mess with the interpreter dictionary).
It turned out to be more tricky and messy than I expected. The new version uses the 3.13+ _Ref APIs for ensuring that free-threaded builds are safe, and falls back to the older
PyDict_SetDefaultpath otherwise. One caveat is that there are some inherit races: if someone removes the capsule from the interpreter dict manually, we can still end up with dangling state (this isn’t new, but it’s worth noting).Marking this as a draft for now since I’ll need to do another pass with a self-review to ensure I am not missing anything subtle. Feel free to close it if you have a better approach in mind or if you don't have the bandwidth to review ATM.
Tell me if you would like to go ahead and we can discuss the approach :)