Reduce reliance on ObjC blocks #845

Merged
maleadt merged 2 commits into main from tb/noblock
Jun 19, 2026

@maleadt maleadt commented Jun 19, 2026

ObjC blocks are executed on Metal's stack, which may have a smaller stack size. That could explain the crashes seen in #822 (comment) when compilation is triggered on that stack. Instead, rely on a deferred mechanism that triggers when synchronizing. Ideally this also reduces launch overhead, because we no longer need to allocate a block.
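
To illustrate the general idea of a deferred mechanism, here is a minimal Julia sketch. It is not the code from this PR: `DEFERRED`, `DEFERRED_LOCK`, `defer!`, and `synchronize_sketch` are hypothetical names, and the actual wait on the GPU is left out. The point is only that queued actions run on the caller's regular stack at synchronization time, rather than inside a callback.

```julia
# Hypothetical illustration; none of these names come from Metal.jl.

# Actions that would otherwise have been attached as completion blocks.
const DEFERRED = Function[]
const DEFERRED_LOCK = ReentrantLock()

# Queue `f` to run at the next synchronization point instead of in a callback.
function defer!(f::Function)
    lock(DEFERRED_LOCK) do
        push!(DEFERRED, f)
    end
    return nothing
end

# Stand-in for a synchronize call. A real implementation would first wait
# for outstanding GPU work; that part is elided here.
function synchronize_sketch()
    # Take a snapshot of the queue under the lock, then clear it.
    actions = lock(DEFERRED_LOCK) do
        drained = copy(DEFERRED)
        empty!(DEFERRED)
        drained
    end
    # Run the actions on the calling task, in the order they were queued.
    for f in actions
        f()
    end
    return length(actions)
end

# Usage: nothing runs until synchronization.
events = String[]
defer!(() -> push!(events, "first"))
defer!(() -> push!(events, "second"))
println(isempty(events))        # no action has run yet
println(synchronize_sketch())   # number of actions that were drained
println(events)
```

In this sketch the trade-off is that deferred work only happens when someone synchronizes, so anything that must run promptly after a kernel finishes would still need another mechanism.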

@maleadt maleadt changed the title Remove reliance on ObjC blocks Reduce reliance on ObjC blocks Jun 19, 2026

codecov Bot commented Jun 19, 2026

Codecov Report

❌ Patch coverage is 94.73684% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.27%. Comparing base (623b6a9) to head (898b1fe).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/state.jl | 92.85% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #845      +/-   ##
==========================================
+ Coverage   84.25%   84.27%   +0.01%     
==========================================
  Files          68       68              
  Lines        4110     4140      +30     
==========================================
+ Hits         3463     3489      +26     
- Misses        647      651       +4     

@github-actions github-actions Bot left a comment

Metal Benchmarks

Details

| Benchmark suite | Current: 898b1fe | Previous: 623b6a9 | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 830270.5 ns | 860291 ns | 0.97 |
| array/accumulate/Float32/dims=1 | 968833 ns | 962875 ns | 1.01 |
| array/accumulate/Float32/dims=1L | 9986604 ns | 9867583 ns | 1.01 |
| array/accumulate/Float32/dims=2 | 1239042 ns | 1271313 ns | 0.97 |
| array/accumulate/Float32/dims=2L | 6967375 ns | 6945417 ns | 1.00 |
| array/accumulate/Int64/1d | 925708 ns | 955625 ns | 0.97 |
| array/accumulate/Int64/dims=1 | 1118750 ns | 1074479 ns | 1.04 |
| array/accumulate/Int64/dims=1L | 11600812.5 ns | 11749458 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 1420000 ns | 1436813 ns | 0.99 |
| array/accumulate/Int64/dims=2L | 9570417 ns | 9559125 ns | 1.00 |
| array/broadcast | 361333 ns | 356666.5 ns | 1.01 |
| array/construct | 3292 ns | 3208 ns | 1.03 |
| array/permutedims/2d | 544167 ns | 615417 ns | 0.88 |
| array/permutedims/3d | 1063375 ns | 1087833 ns | 0.98 |
| array/permutedims/4d | 1735500 ns | 1937500 ns | 0.90 |
| array/private/copy | 384854 ns | 437209 ns | 0.88 |
| array/private/copyto!/cpu_to_gpu | 377792 ns | 380750 ns | 0.99 |
| array/private/copyto!/gpu_to_cpu | 374667 ns | 357625 ns | 1.05 |
| array/private/copyto!/gpu_to_gpu | 354667 ns | 349500 ns | 1.01 |
| array/private/iteration/findall/bool | 1056583 ns | 1054667 ns | 1.00 |
| array/private/iteration/findall/int | 1178833 ns | 1189125 ns | 0.99 |
| array/private/iteration/findfirst/bool | 1418312 ns | 1319958 ns | 1.07 |
| array/private/iteration/findfirst/int | 1426375 ns | 1501292 ns | 0.95 |
| array/private/iteration/findmin/1d | 1545541 ns | 1545624.5 ns | 1.00 |
| array/private/iteration/findmin/2d | 1306125 ns | 1302458 ns | 1.00 |
| array/private/iteration/logical | 1575834 ns | 1610583.5 ns | 0.98 |
| array/private/iteration/scalar | 2495458 ns | 2355875 ns | 1.06 |
| array/random/rand/Float32 | 562729 ns | 555458 ns | 1.01 |
| array/random/rand/Int64 | 640500 ns | 672541.5 ns | 0.95 |
| array/random/rand!/Float32 | 522250 ns | 517208 ns | 1.01 |
| array/random/rand!/Int64 | 491250 ns | 483750 ns | 1.02 |
| array/random/randn/Float32 | 483083 ns | 567437.5 ns | 0.85 |
| array/random/randn!/Float32 | 476166 ns | 473583 ns | 1.01 |
| array/reductions/mapreduce/Float32/1d | 718708 ns | 716833 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 473959 ns | 473333.5 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 754854 ns | 742916 ns | 1.02 |
| array/reductions/mapreduce/Float32/dims=2 | 485916 ns | 485000 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 1313417 ns | 1314167 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 968875 ns | 924792 ns | 1.05 |
| array/reductions/mapreduce/Int64/dims=1 | 779729.5 ns | 777479.5 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 1310792 ns | 1359917 ns | 0.96 |
| array/reductions/mapreduce/Int64/dims=2 | 964770.5 ns | 956292 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=2L | 2194250 ns | 2194583 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 711250 ns | 721042 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1 | 473542 ns | 471500 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 725208 ns | 729625 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2 | 483270.5 ns | 482458 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 1292417 ns | 1300937.5 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 911958 ns | 943395.5 ns | 0.97 |
| array/reductions/reduce/Int64/dims=1 | 765750 ns | 802312.5 ns | 0.95 |
| array/reductions/reduce/Int64/dims=1L | 1294333 ns | 1361771 ns | 0.95 |
| array/reductions/reduce/Int64/dims=2 | 925791 ns | 963959 ns | 0.96 |
| array/reductions/reduce/Int64/dims=2L | 2164834 ns | 2175250 ns | 1.00 |
| array/shared/copy | 234396 ns | 236417 ns | 0.99 |
| array/shared/copyto!/cpu_to_gpu | 40750 ns | 39750 ns | 1.03 |
| array/shared/copyto!/gpu_to_cpu | 38625 ns | 41042 ns | 0.94 |
| array/shared/copyto!/gpu_to_gpu | 39625 ns | 41541 ns | 0.95 |
| array/shared/iteration/findall/bool | 1044792 ns | 1064750 ns | 0.98 |
| array/shared/iteration/findall/int | 1181584 ns | 1190167 ns | 0.99 |
| array/shared/iteration/findfirst/bool | 1136000 ns | 1131792 ns | 1.00 |
| array/shared/iteration/findfirst/int | 1236750 ns | 1225396 ns | 1.01 |
| array/shared/iteration/findmin/1d | 1316042 ns | 1323833 ns | 0.99 |
| array/shared/iteration/findmin/2d | 1314333 ns | 1316916.5 ns | 1.00 |
| array/shared/iteration/logical | 1401291.5 ns | 1459145.5 ns | 0.96 |
| array/shared/iteration/scalar | 4904.714285714285 ns | 4738.142857142857 ns | 1.04 |
| integration/byval/reference | 1132416 ns | 1138291 ns | 0.99 |
| integration/byval/slices=1 | 1134833 ns | 1137584 ns | 1.00 |
| integration/byval/slices=2 | 2045250 ns | 2055292 ns | 1.00 |
| integration/byval/slices=3 | 7496291 ns | 8384542 ns | 0.89 |
| integration/metaldevrt | 436875 ns | 441542 ns | 0.99 |
| kernel/indexing | 351895.5 ns | 326042 ns | 1.08 |
| kernel/indexing_checked | 529458.5 ns | 521625 ns | 1.02 |
| kernel/launch | 4458 ns | 4833 ns | 0.92 |
| kernel/rand | 526458 ns | 515750 ns | 1.02 |
| latency/import | 1646796458 ns | 1649667041 ns | 1.00 |
| latency/precompile | 36022232729 ns | 35841764917 ns | 1.01 |
| latency/ttfp | 1975584333 ns | 1977516916 ns | 1.00 |
| metal/synchronization/context | 620.9943502824859 ns | 606.8121546961326 ns | 1.02 |
| metal/synchronization/stream | 416.04 ns | 402.7761194029851 ns | 1.03 |

This comment was automatically generated by workflow using github-action-benchmark.

@maleadt maleadt merged commit 5df5edc into main Jun 19, 2026
19 checks passed
@maleadt maleadt deleted the tb/noblock branch June 19, 2026 19:16