Use the cached task-local SYCL queue for oneMKL FFT plans #586
Merged
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff             @@
##             main     #586      +/-   ##
==========================================
- Coverage   80.92%   80.90%   -0.02%
==========================================
  Files          48       48
  Lines        3234     3232       -2
==========================================
- Hits         2617     2615       -2
  Misses        617      617
`_create_descriptor` built fresh `syclDevice`/`syclContext`/`syclQueue` objects for every FFT plan. Once those wrappers become garbage, their finalizers (`syclQueueDestroy` etc.) tear down SYCL runtime state belonging to the underlying Level Zero queue, which is still in use; this corrupts later DFT commits and crashes the process at exit. Use the cached task-local `sycl_queue(global_queue(...))` accessor that every other oneMKL wrapper already uses, so the plan shares the managed queue lifetime instead of owning a throwaway one.
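To make the failure mode concrete, here is a minimal, hardware-free Julia sketch of the bug, assuming a simplified model rather than oneAPI.jl itself. `NativeQueue`, `QueueWrapper`, and `Plan` are hypothetical stand-ins for the Level Zero queue, the per-plan `syclQueue` wrapper, and an FFT plan; setting `valid = false` stands in for what `syclQueueDestroy` does to shared runtime state. `finalize` is called explicitly so the effect is deterministic instead of depending on GC timing.

```julia
# Hypothetical model of the bug; these types are NOT part of oneAPI.jl.

# Stand-in for the long-lived native (Level Zero) queue that all plans use.
mutable struct NativeQueue
    valid::Bool
end

# Stand-in for a per-plan SYCL queue wrapper. Its finalizer invalidates the
# shared native queue, modelling syclQueueDestroy tearing down runtime state.
mutable struct QueueWrapper
    native::NativeQueue
    function QueueWrapper(native::NativeQueue)
        w = new(native)
        finalizer(w) do x
            x.native.valid = false
        end
        return w
    end
end

# Stand-in for an FFT plan: it keeps using the native queue after creation.
struct Plan
    native::NativeQueue
end

shared = NativeQueue(true)
throwaway = QueueWrapper(shared)  # old code path: a fresh wrapper per plan
plan = Plan(shared)               # the plan outlives the wrapper
finalize(throwaway)               # what the GC does once the wrapper is garbage
println(plan.native.valid)        # prints false: the plan's queue state is gone
```

The plan itself never touched the wrapper after creation, yet its queue is invalidated; that is why sharing one managed queue, rather than owning a disposable one, removes the crash.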
Branch updated from be0cf3c to 3719b89.
Your PR requires formatting changes to meet the project's style guidelines. Suggested changes:
diff --git a/test/fft.jl b/test/fft.jl
index d441946..21cbc5b 100644
--- a/test/fft.jl
+++ b/test/fft.jl
@@ -80,26 +80,28 @@ end
end
end
-@testset "shared queue lifetime across plans" begin
- # Plans must share the single cached task-local SYCL queue rather than each owning a
- # throwaway one (whose finalizer would tear down shared SYCL/oneMKL state). Assert the
- # shared handle deterministically, independent of whether a stale queue would crash.
- cached_handle = Base.unsafe_convert(oneAPI.oneMKL.syclQueue_t,
- oneAPI.sycl_queue(oneAPI.global_queue(oneAPI.context(), oneAPI.device())))
+ @testset "shared queue lifetime across plans" begin
+ # Plans must share the single cached task-local SYCL queue rather than each owning a
+ # throwaway one (whose finalizer would tear down shared SYCL/oneMKL state). Assert the
+ # shared handle deterministically, independent of whether a stale queue would crash.
+ cached_handle = Base.unsafe_convert(
+ oneAPI.oneMKL.syclQueue_t,
+ oneAPI.sycl_queue(oneAPI.global_queue(oneAPI.context(), oneAPI.device()))
+ )
- dX1 = gpu(rand(ComplexF32, 8))
- p1 = AbstractFFTs.plan_fft(dX1)
- @test p1.queue == cached_handle
- dY1 = p1 * dX1
- p1i = AbstractFFTs.plan_ifft(dX1)
- p1i * dY1
+ dX1 = gpu(rand(ComplexF32, 8))
+ p1 = AbstractFFTs.plan_fft(dX1)
+ @test p1.queue == cached_handle
+ dY1 = p1 * dX1
+ p1i = AbstractFFTs.plan_ifft(dX1)
+ p1i * dY1
- GC.gc(true) # run finalizers of any throwaway per-plan SYCL wrappers
+ GC.gc(true) # run finalizers of any throwaway per-plan SYCL wrappers
- X2 = rand(ComplexF32, 8, 32)
- dX2 = gpu(X2)
- p2 = AbstractFFTs.plan_fft(dX2)
- @test p2.queue == cached_handle
- cmp(p2 * dX2, fft(X2))
-end
+ X2 = rand(ComplexF32, 8, 32)
+ dX2 = gpu(X2)
+ p2 = AbstractFFTs.plan_fft(dX2)
+ @test p2.queue == cached_handle
+ cmp(p2 * dX2, fft(X2))
+ end
end |
Add deterministic assertions to the queue-lifetime testset: every plan's stored queue handle must equal the cached task-local SYCL queue. This check fails on the old throwaway-per-plan code regardless of hardware, unlike the GC round-trip check, which crashes only on PVC-class teardown.
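The idea behind the fix and its identity check can be sketched without a GPU. This is a simplified model, not oneAPI.jl's `global_queue`/`sycl_queue` implementation: the hypothetical `cached_queue` keeps one `FakeQueue` per (context, device) key in the current task's `task_local_storage()`, so every `FakePlan` ends up holding the identical object. The `===` assertions play the role of `@test p1.queue == cached_handle` in the testset.

```julia
# Hypothetical sketch of a task-local cached accessor; not oneAPI.jl's implementation.

# Stand-in for a SYCL queue wrapper; `handle` plays the role of the raw pointer.
mutable struct FakeQueue
    handle::Int
end

const NEXT_HANDLE = Ref(0)  # counts how many queues were ever constructed

# Return the queue cached for `key` in the current task, creating it only once.
function cached_queue(key)
    cache = get!(task_local_storage(), :fake_queue_cache) do
        Dict{Any,FakeQueue}()
    end
    return get!(cache, key) do
        NEXT_HANDLE[] += 1
        FakeQueue(NEXT_HANDLE[])
    end
end

# Stand-in for an FFT plan that stores the queue it was created with.
struct FakePlan
    queue::FakeQueue
end

make_plan(ctx, dev) = FakePlan(cached_queue((ctx, dev)))

p1 = make_plan(:ctx, :dev)
p2 = make_plan(:ctx, :dev)
@assert p1.queue === p2.queue                    # both plans share one queue object
@assert p1.queue === cached_queue((:ctx, :dev))  # and it is the cached one
println(NEXT_HANDLE[])                           # prints 1: built exactly once
```

Because the assertion compares identities rather than waiting for a finalizer to misbehave, it fails on a throwaway-per-plan design on any machine, which is the property the commit message relies on.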
Summary
`_create_descriptor` built fresh `syclDevice`/`syclContext`/`syclQueue` objects for every FFT plan. Once those wrappers become garbage, their finalizers (`syclQueueDestroy` etc.) tear down SYCL runtime state for the still-in-use underlying Level Zero queue, corrupting later DFT commits and crashing at process exit.
This switches `_create_descriptor` to the cached task-local `sycl_queue(global_queue(...))` accessor that every other oneMKL wrapper (BLAS/LAPACK/sparse) already uses, so the plan shares the managed queue lifetime instead of owning a throwaway one.
Notes
Touches only the queue construction in `_create_descriptor`. The `GC.@preserve` rooting of `lengths`/`strides` already present on `main` (#582) is left untouched.