Auto-sync is disabled for ready-for-review pull requests in this repository. Workflows must be run manually.
/ok to test
    elif isinstance(ptr, (_driver["CUdeviceptr"])):
        self._cptr = <void*><void_ptr>int(ptr)
Yep -- it is now redundant with calling int(ptr).
Sorry, I mean it's handled by this line, with the implicit conversion to int:
return <void *><void_ptr>ptr
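For illustration, a minimal standalone sketch of why the extra int() call is redundant (the void_ptr typedef name is an assumption, not taken from the repository):

    ctypedef unsigned long long void_ptr   # assumed: an address-sized integer alias

    cdef void * _as_void_ptr(ptr) except? NULL:
        # Cython converts the Python object to the C integer type itself
        # (via its integer-conversion protocol), so wrapping the argument
        # in int(ptr) before the cast does the same work twice.
        return <void *><void_ptr>ptr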
    cdef void * _helper_input_void_ptr(ptr, _HelperInputVoidPtrStruct *buffer)

    cdef inline void * _helper_input_void_ptr_free(_HelperInputVoidPtrStruct *helper):
        if helper[0]._pybuffer.buf != NULL:
Q: Should we check first if helper is NULL?
We could, but since this internal code is only ever called from generated code, I think it's safe to skip it. Since the helper is stack-allocated, there is no malloc failure to check for.
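To make the point concrete, a hedged sketch of the calling pattern (names and signatures are assumed, not the generated code itself): the struct lives in the caller's frame, so the pointer passed to the free helper can never be NULL and only the acquired buffer needs checking.

    from cpython.buffer cimport PyBuffer_Release

    cdef struct _HelperInputVoidPtrStruct:
        Py_buffer _pybuffer

    cdef inline void _helper_input_void_ptr_free(_HelperInputVoidPtrStruct *helper):
        # `helper` always points at stack memory in the caller, so a NULL
        # check on the pointer itself would never fire; only release the
        # buffer if one was actually acquired.
        if helper[0]._pybuffer.buf != NULL:
            PyBuffer_Release(&helper[0]._pybuffer)

    cdef void _example_caller(obj):
        cdef _HelperInputVoidPtrStruct helper   # stack allocation: no malloc, nothing to fail
        helper._pybuffer.buf = NULL             # marks "no buffer acquired yet"
        # ... convert `obj` and use the resulting void * ...
        _helper_input_void_ptr_free(&helper)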
        self._cptr = NULL
    elif isinstance(ptr, (int)):
        # Easy run, user gave us an already configured void** address
        try:
It seems we can avoid code duplication by replacing the try-except block with a call to the new helper, like this?

    self._cptr = _helper_input_void_ptr(ptr, <_HelperInputVoidPtrStruct*><PyObject*>self)

(I'm not so sure about the self casting; I think it's correct because they share the same layout.)
_HelperInputVoidPtr is a PyObject *, but _HelperInputVoidPtrStruct is not, so they actually do have quite different layouts. The struct, since allocated on the stack, doesn't need a PyObject header for reference counting and type checking etc., which is partly why the performance hack works. I think I could probably reduce this duplication another way, however, by making _HelperInputVoidPtrStruct a member of _HelperInputVoidPtr and then passing a reference to that.
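A hedged sketch of what that restructuring could look like (the member names and the simplified conversion are assumptions, not the actual patch):

    from cpython.buffer cimport PyBuffer_Release

    cdef struct _HelperInputVoidPtrStruct:
        Py_buffer _pybuffer

    cdef void * _helper_input_void_ptr(ptr, _HelperInputVoidPtrStruct *helper) except? NULL:
        # Simplified: the real helper also handles None, CUdeviceptr and
        # buffer-providing objects (filling helper[0]._pybuffer).
        return <void *><unsigned long long>ptr

    cdef class _HelperInputVoidPtr:
        cdef _HelperInputVoidPtrStruct _struct   # embedded member, not a separate allocation
        cdef void *_cptr

        def __cinit__(self, ptr):
            self._struct._pybuffer.buf = NULL
            # Pass a reference to the embedded struct so the class path and
            # the stack path share one conversion routine.
            self._cptr = _helper_input_void_ptr(ptr, &self._struct)

        def __dealloc__(self):
            if self._struct._pybuffer.buf != NULL:
                PyBuffer_Release(&self._struct._pybuffer)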
I've updated this to reduce the code duplication.
/ok to test

/ok to test 467f108
Backport failed for 12.9.x. Please cherry-pick the changes locally and resolve any conflicts:

    git fetch origin 12.9.x
    git worktree add -d .worktree/backport-1616-to-12.9.x origin/12.9.x
    cd .worktree/backport-1616-to-12.9.x
    git switch --create backport-1616-to-12.9.x
    git cherry-pick -x d40517a26045d3763ba41e627e3340b9bb392874
We currently accept an int, CUdeviceptr, or a buffer-providing object as convertible to a void *. This is currently handled with a class _HelperInputVoidPtr, which mainly exists to manage the lifetime when the input exposes a buffer. This object (like all PyObjects) is allocated on the heap and gets freed implicitly by Cython at the end of the function. Since it only exists to manage lifetimes when the input exposes a buffer, we pay this heap-allocation penalty even in the common case where the input is a simple integer.
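For context, a minimal sketch of that heap-allocated pattern (member names and the simplified type checks are assumptions; CUdeviceptr handling is omitted):

    from cpython.buffer cimport PyObject_CheckBuffer, PyObject_GetBuffer, PyBuffer_Release, PyBUF_SIMPLE

    cdef class _HelperInputVoidPtr:
        cdef void *_cptr
        cdef Py_buffer _pybuffer
        cdef bint _have_buffer

        def __cinit__(self, ptr):
            self._have_buffer = False
            if ptr is None:
                self._cptr = NULL
            elif isinstance(ptr, int):
                # Even this trivial path pays for the heap allocation of `self`.
                self._cptr = <void *><unsigned long long>ptr
            elif PyObject_CheckBuffer(ptr):
                PyObject_GetBuffer(ptr, &self._pybuffer, PyBUF_SIMPLE)
                self._cptr = self._pybuffer.buf
                self._have_buffer = True
            else:
                raise TypeError("expected an int, CUdeviceptr or buffer-providing object")

        def __dealloc__(self):
            # The reason the class exists: release the buffer when the
            # helper object goes away.
            if self._have_buffer:
                PyBuffer_Release(&self._pybuffer)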
This changes the code to statically allocate the Py_buffer on the stack, and so is faster for reasons similar to #1545. This means we are trading some stack space (88 bytes) for speed, but given that CUDA Python API calls can't recursively call themselves, I'm not concerned. This improves the overhead time in the benchmark in #659 from 2.97 us/call to 2.67 us/call.
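A sketch of the stack-based approach under the same assumptions: the Py_buffer sits inside a plain C struct in the caller's frame, so the common integer path involves no heap allocation at all.

    from cpython.buffer cimport PyObject_CheckBuffer, PyObject_GetBuffer, PyBUF_SIMPLE

    cdef struct _HelperInputVoidPtrStruct:
        Py_buffer _pybuffer          # this is the stack space being traded for speed

    cdef void * _helper_input_void_ptr(ptr, _HelperInputVoidPtrStruct *helper) except? NULL:
        if ptr is None:
            return NULL
        if isinstance(ptr, int):
            # Common case: a plain integer address, no allocation of any kind.
            return <void *><unsigned long long>ptr
        if PyObject_CheckBuffer(ptr):
            # Buffer-providing object: acquire into the caller's stack struct;
            # the matching free helper releases it at the end of the call.
            PyObject_GetBuffer(ptr, &helper[0]._pybuffer, PyBUF_SIMPLE)
            return helper[0]._pybuffer.buf
        raise TypeError("expected an int, CUdeviceptr or buffer-providing object")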
The old _HelperInputVoidPtr class stays around here because it is still useful when the input is a list of void *-convertible things and we can't statically determine how much space to allocate.
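For lists, something along these lines is presumably still needed (a hedged sketch reusing the _HelperInputVoidPtr class sketched above; the function name is made up): the element count is only known at run time, so the void ** array and the per-element helpers can't be placed on the stack ahead of time.

    from libc.stdlib cimport malloc, free   # the caller frees the returned array

    cdef void ** _list_to_void_ptrs(objs, list keepalive) except NULL:
        cdef size_t n = len(objs)
        cdef size_t i
        cdef _HelperInputVoidPtr helper
        if n == 0:
            raise ValueError("expected a non-empty list")
        cdef void **ptrs = <void **>malloc(n * sizeof(void *))
        if ptrs == NULL:
            raise MemoryError()
        for i in range(n):
            # Keep each helper alive so any acquired buffers stay valid
            # until the caller is done with the pointer array.
            helper = _HelperInputVoidPtr(objs[i])
            keepalive.append(helper)
            ptrs[i] = helper._cptr
        return ptrs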