Skip to content

Make pymb_get_registry() thread safe#4

Draft
pablogsal wants to merge 1 commit into
hudson-trading:mainfrom
pablogsal:thread
Draft

Make pymb_get_registry() thread safe#4
pablogsal wants to merge 1 commit into
hudson-trading:mainfrom
pablogsal:thread

Conversation

@pablogsal

@pablogsal pablogsal commented Sep 11, 2025

Copy link
Copy Markdown
Member

I was preparing some discussion about this for next week and so #3 so this is a draft to address it (this issue may motivate the fact that keeping this out of CPython may be a bit more complex as users can mess with the interpreter dictionary).

It turned out to be more tricky and messy than I expected. The new version uses the 3.13+ _Ref APIs for ensuring that free-threaded builds are safe, and falls back to the older PyDict_SetDefault path otherwise. One caveat is that there are some inherit races: if someone removes the capsule from the interpreter dict manually, we can still end up with dangling state (this isn’t new, but it’s worth noting).

Marking this as a draft for now since I’ll need to do another pass with a self-review to ensure I am not missing anything subtle. Feel free to close it if you have a better approach in mind or if you don't have the bandwidth to review ATM.

Tell me if you would like to go ahead and we can discuss the approach :)

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
Comment thread pymetabind.h
Comment thread pymetabind.h
pymb_registry* registry = (struct pymb_registry*) PyCapsule_GetPointer(
existing_capsule, "pymetabind_registry");
Py_DECREF(existing_capsule);
// WARNING: The returned pointer is only valid while the interpreter

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bigger problem btw, it also highlights why maybe having some special place for this in CPython may be a good idea because is not exposed to the user messing with it

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a potential reason to remove the capsule destructor and instead have a use-count in the registry. The registry would be freed once all registered frameworks have unregistered themselves. Then removing the registry from the interpreter state dict could produce a split-brain situation (new frameworks registering after that create a new registry), but not a use-after-free. That's totally fine IMO.

(I think the issue here is not specific to free-threading. Even on a GIL build, someone could meddle with the interpreter state dict later.)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello! Pablo told me to look at this project.
I'm Petr, and I prefer to solve dangling pointer issues by strong references :)

Why not make users hold the reference -- pair pymb_get_registry with a pymb_drop_registry, and keep a reference to the capsule in struct pymb_registry?

I agree that a split-brain situation is OK after messing with the interpreter state dict.

Perhaps also keep a strong ref to the registry in the framework -- between pymb_add_framework and pymb_remove_framework?
(And for user convenience, if pymb_add_framework gets a NULL registry argument, have it call pymb_get_registry and remember to drop the ref when done. Then the registry argument would only be there for “dependency injection” when if you want a “split-brain situation” -- i.e. in tests.)

Speaking of tests, how do you test pymetabind?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm fishing for quick answers here; happy to write the code & test it further if this makes sense in general)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Petr, sorry for the delay in responding - individual PR comments are easy to lose track of!

The implementation has evolved somewhat since this PR was posted, and it's now the case that the registry will be deallocated once both the capsule holding it is deallocated and it has no frameworks left. I think that's substantially equivalent to your proposed explicit refcount; as implemented, the refcount is implied by the length of the frameworks list in the registry object. We no longer expect frameworks to keep a pointer to the registry themselves; instead it is filled in by pymb_add_framework() in the new registry field of pymb_framework.

And for user convenience, if pymb_add_framework gets a NULL registry argument, have it call pymb_get_registry and remember to drop the ref when done. Then the registry argument would only be there for “dependency injection” when if you want a “split-brain situation” -- i.e. in tests.

That would be a nice way to close what is currently a very minor thread-safety hole: one thread executes pymb_add_framework(pymb_get_registry(), framework); and another deletes the registry capsule (by meddling with the interpreter state dict) in between pymb_get_registry() and pymb_add_framework(). The fix would be to make pymb_get_registry() take a strong reference to the registry capsule object and expect the caller to decref it when finished.

Speaking of tests, how do you test pymetabind?

Currently it is only tested by tests in the frameworks that use it (pybind11 and nanobind both have PRs posted, and these PRs contain tests; neither PR has been merged yet but both projects' maintainers are excited about the feature and I expect that consensus will be reached eventually). I plan to have some independent tests also, but getting the basic functionality working has been more important so far.

@oremanj oremanj left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I hadn't considered this aspect of doing cleanup via the capsule destructor. I don't think we necessarily need to be robust to others doing targeted meddling with the interpreter state dict, especially since it's not exposed to regular Python code, but if we can get that robustness without too much trouble then it's probably worthwhile.

Comment thread pymetabind.h
pymb_registry* registry = (struct pymb_registry*) PyCapsule_GetPointer(
existing_capsule, "pymetabind_registry");
Py_DECREF(existing_capsule);
// WARNING: The returned pointer is only valid while the interpreter

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a potential reason to remove the capsule destructor and instead have a use-count in the registry. The registry would be freed once all registered frameworks have unregistered themselves. Then removing the registry from the interpreter state dict could produce a split-brain situation (new frameworks registering after that create a new registry), but not a use-after-free. That's totally fine IMO.

(I think the issue here is not specific to free-threading. Even on a GIL build, someone could meddle with the interpreter state dict later.)

Comment thread pymetabind.h

if (status == -1) {
// Error occurred
Py_DECREF(new_capsule);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first thing you do on every branch is decref the new_capsule, so maybe move that out of the if?

Comment thread pymetabind.h
return existing_registry;
} else {
// status == 0: We successfully inserted our capsule
registry->capsule = new_capsule;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably do this before inserting into the dictionary so that the registry is fully usable as soon as anyone else can see it.

Comment thread pymetabind.h
return registry;
}
#else
// For older Python versions, use PyDict_SetDefault

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Older versions can't be free-threaded, and we haven't done anything that could yield the GIL, so I think this can be simplified to just SetItem.

@oremanj

oremanj commented Sep 11, 2025

Copy link
Copy Markdown
Collaborator

GH doesn't allow inline comments on non-changed lines apparently, but

 * This must be called from a module initialization function so that the
 * import lock can provide mutual exclusion.

can be removed from the doc comment for pymb_get_registry().

@oremanj

oremanj commented Sep 12, 2025

Copy link
Copy Markdown
Collaborator

fae63ac moves away from having each framework hold a reference to the registry capsule; frameworks can be destroyed in Py_AtExit handlers which is too late to destroy a pyobject.

If you rebase on that, I think it will eliminate the need for the 'WARNING:' in this PR.

See also #5 which has some fairly extensive changes to the binding removal logic. Left as a PR to make the changes easier to see; will land it soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants