gh-144569: Avoid creating temporary objects in BINARY_SLICE for list, tuple, and unicode#144590
gh-144569: Avoid creating temporary objects in BINARY_SLICE for list, tuple, and unicode#144590cocolato wants to merge 15 commits intopython:mainfrom
BINARY_SLICE for list, tuple, and unicode#144590Conversation
This comment was marked as outdated.
This comment was marked as outdated.
|
You forgot to add the new recorder to BINARY_SLICE macro op |
Fidget-Spinner
left a comment
There was a problem hiding this comment.
lgtm just one comment and add news please
|
Actually I've changed my mind. Can you please open a PR to convert BINARY_SLICE to the POP_TOP form please? Then after we merge that PR we can update this one |
Fidget-Spinner
left a comment
There was a problem hiding this comment.
Let's leave the DECREF_INPUTS alone for now and try to remove it in a future PR.
The code has repeated instances that should be factored out to a micro-op
Misc/NEWS.d/next/Core_and_Builtins/2026-02-08-13-14-00.gh-issue-144569.pjlJVe.rst
Outdated
Show resolved
Hide resolved
|
This is pretty close, I'm going to push some changes in, then wait a day to see if Mark has any objections tomorrow. |
|
Thanks for your guidance! |
1. We cannot exit after unpacking indices, as the stack contains tagged ints now which may lead to a crash. We must insert a guard for the type before unpacking indices. 2. It is possible for an indice to not fit in a tagged int, in which case we must deopt. 3. Recorded values do not always mean that that's the actual type we will see at runtime. So we must guard on that as well.
|
@cocolato please check out the latest commit and the extended commit message for the bugs fixed. Thanks! |
|
Results using hyperfine: So we made it >10% faster. Nice! |
LGTM! Thanks again for the fix, I learned a lot from it.
This is my first time learning about this tool, and it looks great. In the past, when I needed to run benchmarks locally, |
This comment was marked as resolved.
This comment was marked as resolved.
markshannon
left a comment
There was a problem hiding this comment.
Thanks for doing this.
This looks sound, but there are some inefficiencies.
Beware the size of uops. They will be converted to machine code, so keeping code size down is important for performance.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
BINARY_SLICE for list, tuple, and unicodeBINARY_SLICE for list, tuple, and unicode
|
Thank you for your review and corrections! I have made the requested changes; please review again. |
|
Thanks for making the requested changes! @markshannon: please review the changes made to this pull request. |
|
Sorry for the delay in getting back to this. I don't think we want I think we should spilt this into two PRs: the first with just the interpreter changes (bytecodes.c), leaving the JIT optimizations for another PR. To make it worth splitting the operation into uops we need to be able to optimize the uops effectively. That would mean handling the index adjustments in the optimizer as much as possible and leaving only minimal checking to the runtime. For example, take Which looks bulky, but each of the uops is only a few machine instructions ( (In the above So, let's keep it simple for the first PR, and just add special cases for lists, tuples and strings in |
| else { | ||
| res_o = PyObject_GetItem(PyStackRef_AsPyObjectBorrow(container), slice); | ||
| Py_DECREF(slice); | ||
| PyObject *slice = PySlice_New(start_o, stop_o, NULL); |
There was a problem hiding this comment.
To reduce redundant code, we've modified this to call PySlice_New to create the slice. This incurs a slight performance overhead; if needed, I can revert to the original implementation.
|
@markshannon Thanks for review! I have reimplemented the fast path for |
Optimize
BINARY_SLICEforlist,tuple, andunicodeobject with int/None indices.This is the first step for
BINARY_SLICEoptimize. We will implement more jit optimizations after this PR.