gh-144981: Make PyUnstable_Code_SetExtra/GetExtra thread-safe#144980
gh-144981: Make PyUnstable_Code_SetExtra/GetExtra thread-safe#144980yoney wants to merge 3 commits intopython:mainfrom
Conversation
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
|
I benchmarked both the base and new implementations on GIL builds. Since Note: Benchmarks were run on an x86_64 (Skylake) system. Base (GIL) New Implementation (GIL) |
|
How are these used in practice? Can we leave For example, something like:
Would that work? |
|
Oh, the stop the world on resize would be in |
|
After the refactoring benchmark:
Note: Benchmarks were run on an x86_64 (Skylake) system. Base (GIL) New Implementation (GIL) New Implementation (FT) |
For Cinderx, this is used to count the number of calls before making a JIT decision. We implemented a workaround here: facebookincubator/cinderx@26f3e2b |
|
I have made the requested changes; please review again. |
|
Thanks for making the requested changes! @DinoV: please review the changes made to this pull request. |
|
aarch64 benchmark run:
Base (GIL) New Implementation (GIL) New Implementation (FT) |
I think stop the world on growth would probably be too expensive. If it was on every initial allocation it would certainly be way too expensive. But even if it was on an actual resize it might be really painful if someone added a new index as there'd be a burst of stop the world events as all the code objects got updated (as usually consumers are going to be attaching to a lot of objects). But I do think we could make the callers perform whatever synchronization they need to protect their own indexes - e.g. maybe they need to construct a (cheap and empty) value to be filled in and tolerate throwing one of those away and then do a get to resolve the race. But I also think the minimal synchronization in the latest version is fine. |
PyUnstable_Code_GetExtraandPyUnstable_Code_SetExtrahad no thread-safety. Concurrent calls could cause data races.GetExtrais performance-sensitive for use cases like JIT. It uses a lock-free acquire load ofco_extra, and acquire loads of individual slots.SetExtrauses a per-object critical section. When the slot already exists, it updates in place with a release store. When the buffer needs to grow, the free-threaded build allocates a new buffer, copies existing slots, publishes with a release store, and reclaims the old buffer via_PyMem_FreeDelayed(QSBR). The GIL build simply uses realloc.PyUnstable_Eval_RequestCodeExtraIndexis now protected byinterp->code_state.mutex.RequestCodeExtraIndex,SetExtra, andGetExtra.The new tests exercise the concurrent paths and a free-threaded TSAN build without the fix crashes/emits warnings.
cc: @DinoV @colesbury