GH-49272: [C++][CI] Fix intermittent segfault in arrow-json-test on M…#49297
Open
vanshaj2023 wants to merge 3 commits intoapache:mainfrom
Open
GH-49272: [C++][CI] Fix intermittent segfault in arrow-json-test on M…#49297vanshaj2023 wants to merge 3 commits intoapache:mainfrom
vanshaj2023 wants to merge 3 commits intoapache:mainfrom
Conversation
…-json-test-segfault
…-json-test-segfault
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Rationale for this change
The
arrow-json-testintermittently segfaults on AMD64 Windows MinGW CI (both CLANG64 and MINGW64 environments), causing false CI failures. The crash occurs at 0.03-0.07 seconds into test execution during the first parallel test (ReaderTest.MultipleChunksParallel). See #49272.The root cause is MinGW's
__emutlsimplementation for C++thread_local, which has known race conditions during thread creation. WhenThreadPool::LaunchWorkersUnlockedcreates a new worker thread, the thread immediately writes to thethread_local ThreadPool* current_thread_pool_variable. If__emutlshasn't finished initializing TLS for the new thread, this dereferences an invalid pointer, causing a segfault.What changes are included in this PR?
Replace
thread_localwith native Win32 TLS API on MinGW (thread_pool.cc): UsesTlsAlloc/TlsGetValue/TlsSetValueinstead ofthread_localon MinGW to bypass the buggy__emutlsemulation. Non-MinGW platforms are unchanged.Strengthen atomic memory ordering in
ThreadedTaskGroup(task_group.cc): Changednremaining_operations frommemory_order_acquire/memory_order_releasetomemory_order_acq_relas defensive hardening. This is zero-cost on x86.Add SEH crash handler for MinGW test builds (
reader_test.cc): Logs the exception code and address on crash, providing better debugging info if segfaults recur.Add
MultipleChunksParallelStresstest (reader_test.cc): Runs parallel JSON reading 20 times to help expose intermittent threading races.Are these changes tested?
Yes. A new stress test (
ReaderTest.MultipleChunksParallelStress) is added that repeatedly exercises the parallel JSON reading path. The existingReaderTest.MultipleChunksParallelandAsyncStreamingReaderTest.AsyncReentrancytests also cover the affected code paths.Are there any user-facing changes?
No.
This PR contains a "Critical Fix". The changes fix a bug that causes a crash (segfault) in
arrow-json-teston MinGW Windows CI due to a race condition in MinGW's__emutlsthread-local storage emulation during thread pool worker creation.