
GH-49272: [C++][CI] Fix intermittent segfault in arrow-json-test on M…#49297

Open: vanshaj2023 wants to merge 3 commits into apache:main from vanshaj2023:fix-mingw-json-test-segfault


vanshaj2023 commented Feb 16, 2026

Rationale for this change

The arrow-json-test intermittently segfaults on AMD64 Windows MinGW CI (both CLANG64 and MINGW64 environments), causing spurious CI failures. The crash occurs 0.03 to 0.07 seconds into test execution, during the first parallel test (ReaderTest.MultipleChunksParallel). See #49272.

The root cause is MinGW's __emutls implementation for C++ thread_local, which has known race conditions during thread creation. When ThreadPool::LaunchWorkersUnlocked creates a new worker thread, the thread immediately writes to the thread_local ThreadPool* current_thread_pool_ variable. If __emutls hasn't finished initializing TLS for the new thread, this dereferences an invalid pointer, causing a segfault.
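The access pattern described above can be sketched in isolation. This is an illustration only, not Arrow's code: `Pool`, `current_pool`, and `CountWorkersThatSeeOwnPool` are hypothetical stand-ins for ThreadPool, current_thread_pool_, and the worker launch path. Each new thread writes its thread_local slot as its very first action, which is the point where an emulated TLS implementation would have to have the slot ready.

```cpp
#include <thread>
#include <vector>

// Hypothetical stand-in for arrow::internal::ThreadPool (not Arrow's type).
struct Pool {
  int id;
};

// Plays the role of current_thread_pool_: one pointer slot per thread.
thread_local Pool* current_pool = nullptr;

// Launches `n` workers. Each worker writes the thread-local slot as its first
// action and reads it back. Returns how many workers saw their own write.
int CountWorkersThatSeeOwnPool(Pool* pool, int n) {
  std::vector<int> seen(n, 0);
  std::vector<std::thread> workers;
  for (int i = 0; i < n; ++i) {
    workers.emplace_back([pool, &seen, i] {
      current_pool = pool;  // first TLS access on a brand-new thread
      seen[i] = (current_pool == pool) ? 1 : 0;
    });
  }
  for (auto& t : workers) t.join();

  int count = 0;
  for (int s : seen) count += s;
  return count;
}
```

With a correct TLS implementation, calling `CountWorkersThatSeeOwnPool(&pool, 8)` returns 8 and the calling thread's own slot is left untouched.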

What changes are included in this PR?

  1. Replace thread_local with native Win32 TLS API on MinGW (thread_pool.cc): Uses TlsAlloc/TlsGetValue/TlsSetValue instead of thread_local on MinGW to bypass the buggy __emutls emulation. Non-MinGW platforms are unchanged.

  2. Strengthen atomic memory ordering in ThreadedTaskGroup (task_group.cc): Changes nremaining_ operations from memory_order_acquire/memory_order_release to memory_order_acq_rel as defensive hardening. This is zero-cost on x86, where atomic read-modify-write operations are already fully ordered.

  3. Add SEH crash handler for MinGW test builds (reader_test.cc): Logs the exception code and address on crash, providing better debugging info if segfaults recur.

  4. Add MultipleChunksParallelStress test (reader_test.cc): Runs parallel JSON reading 20 times to help expose intermittent threading races.
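Item 1 can be sketched as a pair of accessors that hide the storage choice. This is a minimal sketch under stated assumptions, not the code in thread_pool.cc: the names `Pool`, `PoolTlsIndex`, `GetCurrentPool`, `SetCurrentPool`, and `SetAndReadOnNewThread` are hypothetical, the `__MINGW32__` check is one plausible way to detect MinGW, and error handling for TlsAlloc (which returns TLS_OUT_OF_INDEXES on failure) is omitted for brevity.

```cpp
#include <thread>

#if defined(__MINGW32__)
#include <windows.h>
#endif

// Hypothetical stand-in for the thread pool type.
struct Pool {
  int id;
};

#if defined(__MINGW32__)
// MinGW branch: native Win32 TLS. The index is allocated once, on first use.
// A real implementation should check for TLS_OUT_OF_INDEXES.
static DWORD PoolTlsIndex() {
  static const DWORD index = TlsAlloc();
  return index;
}

Pool* GetCurrentPool() {
  // A slot that was never set on this thread reads back as null.
  return static_cast<Pool*>(TlsGetValue(PoolTlsIndex()));
}

void SetCurrentPool(Pool* pool) { TlsSetValue(PoolTlsIndex(), pool); }
#else
// Every other platform keeps plain thread_local.
static thread_local Pool* current_pool = nullptr;

Pool* GetCurrentPool() { return current_pool; }

void SetCurrentPool(Pool* pool) { current_pool = pool; }
#endif

// Sets the slot on a fresh thread and returns what that thread read back.
Pool* SetAndReadOnNewThread(Pool* pool) {
  Pool* observed = nullptr;
  std::thread worker([&] {
    SetCurrentPool(pool);
    observed = GetCurrentPool();
  });
  worker.join();
  return observed;
}
```

Callers only ever go through `GetCurrentPool` and `SetCurrentPool`, so the platform switch stays confined to one place and a value set on one thread is never visible from another.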

Are these changes tested?

Yes. A new stress test (ReaderTest.MultipleChunksParallelStress) is added that repeatedly exercises the parallel JSON reading path. The existing ReaderTest.MultipleChunksParallel and AsyncStreamingReaderTest.AsyncReentrancy tests also cover the affected code paths.
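The shape of such a stress test can be sketched without GoogleTest or Arrow. The following is an assumption-laden illustration, not the test added in reader_test.cc: `RunParallelRound` and `CountPassingRounds` are hypothetical names, the "work" is a trivial write per task, and the completion counter uses an acq_rel read-modify-write in the spirit of the nremaining_ change described above.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Runs `num_tasks` trivial tasks, one per thread. Each task publishes a result
// and then decrements a shared counter with acq_rel ordering. Returns true if
// every result arrived and exactly one task observed the final decrement.
bool RunParallelRound(int num_tasks) {
  std::atomic<int> remaining{num_tasks};
  std::atomic<int> finishers{0};
  std::vector<int> results(num_tasks, 0);
  std::vector<std::thread> threads;

  for (int i = 0; i < num_tasks; ++i) {
    threads.emplace_back([&, i] {
      results[i] = i + 1;  // the task's "work"
      // fetch_sub returns the previous value, so 1 means "I was last".
      if (remaining.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        finishers.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }
  for (auto& t : threads) t.join();

  long sum = 0;
  for (int r : results) sum += r;
  const long expected = static_cast<long>(num_tasks) * (num_tasks + 1) / 2;
  return remaining.load() == 0 && finishers.load() == 1 && sum == expected;
}

// Repeats the round to give an intermittent race more chances to show up.
int CountPassingRounds(int rounds, int num_tasks) {
  int passed = 0;
  for (int r = 0; r < rounds; ++r) {
    if (RunParallelRound(num_tasks)) ++passed;
  }
  return passed;
}
```

Calling `CountPassingRounds(20, 4)` mirrors the 20 repetitions mentioned in the PR; any result below 20 would indicate a lost update or a missed completion.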

Are there any user-facing changes?

No.

This PR contains a "Critical Fix". The changes fix a bug that causes a crash (segfault) in arrow-json-test on MinGW Windows CI due to a race condition in MinGW's __emutls thread-local storage emulation during thread pool worker creation.
