feat(pathfinder): add CTK root canary probe for non-standard-path libs #1595

cpcloud wants to merge 9 commits into NVIDIA:main from
Conversation
Auto-sync is disabled for ready-for-review pull requests in this repository. Workflows must be run manually.
/ok to test

1 similar comment

/ok to test
Libraries like nvvm, whose shared object lives in a subdirectory (/nvvm/lib64/) that is not on the system linker path, cannot be found via a bare dlopen on system CTK installs without CUDA_HOME.

Add a "canary probe" search step: when the direct system search fails, system-load a well-known CTK lib that IS on the linker path (cudart), derive the CTK installation root from its resolved path, and look for the target lib relative to that root via the existing anchor-point logic. The mechanism is generic: any future lib with a non-standard path just needs an entry in _find_lib_dir_using_anchor_point.

The canary probe is intentionally placed after CUDA_HOME in the search cascade to preserve backward compatibility: users who have CUDA_HOME set expect it to be authoritative, and existing code relying on that ordering should not silently change behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>
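The root-derivation step described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the name `derive_ctk_root_linux` and the two recognized layouts (`targets/<arch>/lib` and `lib64`) are assumptions based on common CTK installs.

```python
import posixpath


def derive_ctk_root_linux(resolved_cudart_path):
    """Hypothetical sketch: derive the CTK installation root from the
    absolute path the loader resolved for libcudart."""
    lib_dir = posixpath.dirname(resolved_cudart_path)
    parts = lib_dir.split("/")
    # Layout 1: <root>/targets/<arch>/lib/libcudart.so.*
    if len(parts) >= 3 and parts[-3] == "targets" and parts[-1] == "lib":
        return "/".join(parts[:-3])
    # Layout 2: <root>/lib64/libcudart.so.*
    if parts and parts[-1] in ("lib64", "lib"):
        return "/".join(parts[:-1])
    return None  # unrecognized layout: fall through to the not-found error
```

With the root in hand, the existing anchor-point logic can then probe `<root>/nvvm/lib64/` for the target library.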
Force-pushed from e4066be to 44c0abd
/ok to test
Co-authored-by: Cursor <cursoragent@cursor.com>
/ok to test

/ok to test
```python
def test_derive_ctk_root_windows_ctk13():
    path = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\bin\x64\cudart64_13.dll"
```
This works cross-platform due to the explicit use of ntpath in _derive_ctk_root_windows. Given that the code wouldn't look much different with a platform-specific version, it seems useful to keep these tests runnable everywhere instead of skipping a bunch of them based on platform.
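To illustrate why explicit `ntpath` makes the test above runnable on any OS, here is a hedged sketch of what such a helper might look like. The name `derive_ctk_root_windows` and the `vX.Y` version-directory heuristic are assumptions for illustration, not the PR's actual implementation.

```python
import ntpath
import re


def derive_ctk_root_windows(resolved_cudart_path):
    """Hypothetical sketch: walk up from the resolved cudart DLL path until
    a version directory like 'v13.0' is found; that directory is the CTK root."""
    d = ntpath.dirname(resolved_cudart_path)
    while d:
        head, tail = ntpath.split(d)
        if re.fullmatch(r"v\d+(\.\d+)*", tail):
            return d
        if head == d:  # reached the drive root without a match
            return None
        d = head
    return None
```

Because `ntpath` always applies Windows path semantics, this sketch (and tests against it) behaves identically on Linux and Windows.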
Pull request overview
This PR adds a CTK root canary probe feature to the pathfinder library to resolve libraries that live in non-standard subdirectories (like libnvvm.so under $CTK_ROOT/nvvm/lib64/). The canary probe discovers the CUDA Toolkit installation root by loading a well-known library (cudart) that IS on the system linker path, deriving the CTK root from its resolved path, and then searching for the target library relative to that root.
Changes:
- Adds canary probe mechanism as a last-resort fallback after CUDA_HOME in the library search cascade
- Introduces CTK root derivation functions for Linux and Windows that extract installation paths from resolved library paths
- Provides comprehensive test coverage (21 tests) for all edge cases and search order behavior
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| cuda_pathfinder/tests/test_ctk_root_discovery.py | Comprehensive test suite covering CTK root derivation, the canary probe mechanism, and search order priority |
| cuda_pathfinder/cuda/pathfinder/_dynamic_libs/load_nvidia_dynamic_lib.py | Implements the canary probe function and integrates it into the library loading cascade after CUDA_HOME |
| cuda_pathfinder/cuda/pathfinder/_dynamic_libs/find_nvidia_dynamic_lib.py | Adds CTK root derivation functions and a try_via_ctk_root method to leverage the existing anchor-point search logic |
Tests that create fake CTK directory layouts were hardcoded to Linux paths (lib64/, libnvvm.so) and failed on Windows, where the code expects Windows layouts (bin/, nvvm64.dll). Extract platform-aware helpers (_create_nvvm_in_ctk, _create_cudart_in_ctk, _fake_canary_path) that create the right layout and filenames based on IS_WINDOWS.

Co-authored-by: Cursor <cursoragent@cursor.com>
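A minimal sketch of what such a platform-aware test helper could look like. The function name, the exact filenames, and the `IS_WINDOWS` definition below are illustrative assumptions, not the PR's code.

```python
import os

# Assumption: the tests gate layout choices on a flag like this.
IS_WINDOWS = os.name == "nt"


def fake_canary_path(ctk_root):
    """Hypothetical helper: where a fake cudart would live inside a
    fabricated CTK layout, chosen per platform."""
    if IS_WINDOWS:
        # Windows CTK layout: <root>\bin\x64\cudart64_*.dll
        return os.path.join(ctk_root, "bin", "x64", "cudart64_13.dll")
    # Linux CTK layout: <root>/lib64/libcudart.so.*
    return os.path.join(ctk_root, "lib64", "libcudart.so.13")
```

The same pattern extends to helpers that create the nvvm subtree (`nvvm/bin` vs `nvvm/lib64`), so one test body exercises the right layout on each OS.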
/ok to test

/ok to test
The rel_paths for nvvm use forward slashes (e.g. "nvvm/bin"), which os.path.join on Windows doesn't normalize, producing mixed-separator paths like "...\nvvm/bin\nvvm64.dll". Apply os.path.normpath to the returned directory so all separators are consistent.

Co-authored-by: Cursor <cursoragent@cursor.com>
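The mixed-separator behavior and the fix can be demonstrated on any platform with `ntpath` (the Windows flavor of `os.path`); the drive and directory names below are made up for the demo:

```python
import ntpath

# Joining a forward-slash rel_path on Windows leaves the slash untouched,
# so the result mixes separators:
mixed = ntpath.join(r"C:\CUDA\v13.0", "nvvm/bin", "nvvm64.dll")
assert mixed == "C:\\CUDA\\v13.0\\nvvm/bin\\nvvm64.dll"

# normpath collapses everything to backslashes:
assert ntpath.normpath(mixed) == r"C:\CUDA\v13.0\nvvm\bin\nvvm64.dll"
```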
/ok to test
rwgk left a comment
This approach is very similar to what I had back in May 2025 while working on PR #604, but at the time @leofang was strongly opposed to it, and I backed it out.
I still believe the usefulness/pitfall factor is very high for this approach. Leo, what's your opinion now?
If Leo is supportive, I believe it'll be best to import the anchor library (cudart) in a subprocess, to not introduce potentially surprising side-effects in the current process. The original code for that was another point of contention back in May 2025 (it was using subprocess), but in the meantime I addressed those concerns and the current implementation has gone through several rounds of extensive testing (QA) without any modifications for many months. We could easily move it to cuda/pathfinder/_utils.
If there's opposition to this approach (which I believe we all discussed on the same call), and we still care about solving the problem (I think we do), then I'd kindly request a counter-PR implementing an alternative proposal. Otherwise, we're going to get bogged down in review waiting for the perfect solution. AFAICT, there is no perfect way to solve this problem: it's just picking the least worst option.
The subprocess approach is not my favorite, but if we're prioritizing isolation, I don't see a less complex option for getting the canary search path.
One argument in favor of this approach is that the search is a last-ditch effort after everything else has failed, and it's backward compatible. You might argue that canary searches are a form of system search, but I actually kept the existing priority in order to reduce the foot-bazooka of this whole thing by keeping existing installations behaving exactly the same.
I think the isolation is important. If we use caching of the

I agree with everything else you wrote above.
Resolve CTK canary absolute paths in a spawned Python process so probing cudart does not mutate loader state in the caller process, while preserving the nvvm discovery fallback order. Keep JSON as the child-to-parent wire format because it cleanly represents both path and no-result states and avoids fragile stdout/path parsing across platforms.

Co-authored-by: Cursor <cursoragent@cursor.com>
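A sketch of the child-to-parent handshake under these assumptions. The payload key and the child body below are illustrative: the child here is a stub that always reports "not found", whereas the real child would system-load cudart and print its resolved path.

```python
import json
import subprocess
import sys

# Stub of the child-side probe. A real child would attempt the system load
# of cudart and report its resolved absolute path instead of None.
CHILD_CODE = """
import json
print(json.dumps({"abs_path": None}))
"""


def run_canary_probe():
    # Spawn a fresh interpreter so loading the canary cannot mutate the
    # caller's loader state.
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True, text=True, check=True,
    )
    payload = json.loads(result.stdout)
    return payload["abs_path"]  # str on success, None when nothing was found
```

JSON makes both outcomes unambiguous: a path is a string, "not found" is `null`, and no platform-specific stdout parsing is needed.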
Make canary subprocess path extraction explicitly typed and validated so mypy does not treat platform-specific loader results as Any, while keeping probe behavior unchanged. Keep import ordering aligned with Ruff so pre-commit is green.

Co-authored-by: Cursor <cursoragent@cursor.com>
/ok to test
I added the subprocess isolation.
```python
    libname,
]
try:
    result = subprocess.run(  # noqa: S603
```
While working on the initial version of pathfinder, using subprocess was frowned upon, for general security reasons I think, so I implemented cuda_pathfinder/tests/spawned_process_runner.py. I was hoping you'd use that here.
The rest looks good to me, although I still need to spend some time carefully looking at the tests.
spawned_process_runner.py came from here (from before we had Cursor):
https://chatgpt.com/share/681914ce-f274-8008-9e9f-4538716b4ed7
There I started with:
I need to run Python code in a separate process, but I cannot use subprocess.run because that uses fork, which is incompatible with my requirements.
So I think spawned_process_runner.py will give us better isolation (and predictability), it's not just the secops concern.
I'm not sure what you mean. Can you make a patch suggestion instead of the indirection of a link to code that I then have to bring into the PR?
The goal is isolation, why doesn't subprocess run achieve that goal? The chat history is quite long, so if you want to summarize that here that'd be nice.
Also, I feel like we don't need the isolation in this PR, I will happily follow up with whatever approach makes you comfortable. Let's not get hung up on this detail. It's important, but I don't think it affects the overall design and approach.
```python
# Canary probe: if the direct system search and CUDA_HOME both
# failed (e.g. nvvm isn't on the linker path and CUDA_HOME is
# unset), try to discover the CTK root by loading a well-known CTK
# lib in a subprocess, then look for the target lib relative to
# that root.
abs_path = _try_ctk_root_canary(finder)
if abs_path is not None:
    found_via = "system-ctk-root"
else:
    finder.raise_not_found_error()
```
Sorry, I have been trapped in other fires and was unable to provide feedback in a timely manner 😢
Loading cudart in a subprocess is safer than loading it in the main process. That said, my recollection from the Feb-05 meeting was that we'd use other anchor points such as nvJitLink (basically, #1038). Does this mean we changed our mind a bit and decided to use cudart instead?
nvjitlink is fine to start with as well. I'm honestly not sure why one would be preferable over another. That said, as long as we're not choosing something super niche, it doesn't seem like it's worth spending too much time on and can be changed in a follow-up.
Thanks, Phillip. I do not have any objection.
> my recollection from the Feb-05 meeting was that we'd use other anchor points such as nvJitLink (basically, #1038). Does this mean we changed our mind a bit and decided to use cudart instead?
The main idea behind 1038 is that the pivot library has two roles:
1. provide an anchor point to find other libraries from (nvvm is the only case that needs it, I think), and
2. more importantly, limit the scope of future searches (via an object that remembers the pivot library).
This PR doesn't have the concept of a search scope, it's essentially only a trick to find nvvm independently, like any other independently found library.
I was a bit surprised when Phillip sent this PR, but I think it's useful, because I believe that we'll have the independent and scoped searches side by side indefinitely. This PR will make 1038 less important, mainly for safety / internal consistency, and to guide users to a consistent setup via helpful error messages.
Re cudart vs nvjitlink:
For the scoped search, the choice of the pivot library is up to the user.
For the independent search (in an isolated process), cudart is the better choice, because it's much smaller than nvjitlink:
740 vs 96964 KiB
smc120-0009.ipp2a2.colossus.nvidia.com:/usr/local/cuda-13.1/lib64 $ (for so in *.so; do ls -s $(realpath $so); done) | sort -n
32 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libOpenCL.so.1.0.0
40 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libnvtx3interop.so.1.1.0
48 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libcufile_rdma.so.1.16.1
400 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libcuobjclient.so.1.0.0
724 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libnvblas.so.13.2.1.1
740 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libcudart.so.13.1.80
972 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libcufftw.so.12.1.0.78
1636 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libnppc.so.13.0.3.3
1640 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libnppisu.so.13.0.3.3
2424 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libnvfatbin.so.13.1.115
3380 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libcufile.so.1.16.1
4236 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libnppitc.so.13.0.3.3
4284 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libnvrtc-builtins.so.13.1.115
5784 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libnvjpeg.so.13.0.3.75
6592 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libnppim.so.13.0.3.3
6836 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libnppicc.so.13.0.3.3
8456 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libnppidei.so.13.0.3.3
10060 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libnpps.so.13.0.3.3
13112 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libnppial.so.13.0.3.3
25848 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libnppig.so.13.0.3.3
26332 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libnppist.so.13.0.3.3
52924 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libcublas.so.13.2.1.1
58192 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libnppif.so.13.0.3.3
96964 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libnvJitLink.so.13.1.115
101472 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libcusolverMg.so.12.0.9.81
111500 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libnvrtc.so.13.1.115
129928 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libcurand.so.10.4.1.81
139108 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libcusolver.so.12.0.9.81
165772 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libcusparse.so.12.7.3.1
292796 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libcufft.so.12.1.0.78
490832 /usr/local/cuda-13.1/targets/x86_64-linux/lib/libcublasLt.so.13.2.1.1
Problem

`libnvvm.so` lives under `$CTK_ROOT/nvvm/lib64/` (or `nvvm/bin` on Windows), which is not on the default loader path. On bare system CTK installs, `dlopen("libnvvm.so.4")` can fail when `CUDA_HOME`/`CUDA_PATH` is unset even though nvvm is installed.

Solution

Keep the CTK-root canary strategy, but run the canary load in a subprocess:
- The child is spawned as `python -m cuda.pathfinder._dynamic_libs.canary_probe_subprocess cudart`
- The child calls `load_with_system_search("cudart")` and prints a JSON payload with the resolved absolute path (or `null`)

This avoids polluting loader state in the caller process while preserving the existing fallback behavior.

Why JSON for the child-parent payload?
- It cleanly represents both the path and the no-result (`null`) states and avoids fragile stdout/path parsing across platforms.

Search order

`site-packages -> conda -> already-loaded -> system search -> CUDA_HOME -> subprocess canary probe`

The canary still runs only after `CUDA_HOME` to keep existing precedence.

Tests

- Updated `tests/test_ctk_root_discovery.py` to mock subprocess canary resolution
- Verified `load_with_system_search()` is not called during canary probing
- Verified `CUDA_HOME` still wins over the canary
- `pixi run test-pathfinder` (129 passed, 1 skipped)

Made with Cursor
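The precedence described above can be sketched as a simple first-hit-wins cascade. The strategy names and the `find_lib` helper are illustrative, not the pathfinder's actual API:

```python
def find_lib(libname, strategies):
    """Try each (name, strategy) pair in priority order; the first hit wins."""
    for name, strategy in strategies:
        path = strategy(libname)
        if path is not None:
            return path, name  # e.g. found_via = "system-ctk-root"
    return None, None


# Illustrative cascade mirroring the documented order; every earlier
# strategy "fails" here so the last-resort canary probe is reached.
strategies = [
    ("site-packages", lambda lib: None),
    ("conda", lambda lib: None),
    ("already-loaded", lambda lib: None),
    ("system-search", lambda lib: None),
    ("cuda-home", lambda lib: None),
    ("system-ctk-root", lambda lib: "/usr/local/cuda/nvvm/lib64/libnvvm.so.4"),
]
```

Because the canary entry sits last, any install where `CUDA_HOME` (or an earlier strategy) resolves the library behaves exactly as before.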
tests/test_ctk_root_discovery.pyto mock subprocess canary resolutionload_with_system_search()is not called during canary probingCUDA_HOMEstill wins over canarypixi run test-pathfinder(129 passed, 1 skipped)Made with Cursor