Default to SharedStorage by christiangnrd · Pull Request #822 · JuliaGPU/Metal.jl

christiangnrd · 2026-06-09T16:51:40Z

No description provided.

github-actions

Metal Benchmarks


Benchmark suite	Current: `97ae406`	Previous: `a637d21`	Ratio
`array/accumulate/Float32/1d`	`825208` ns	`810709` ns	`1.02`
`array/accumulate/Float32/dims=1`	`1036396` ns	`1044104.5` ns	`0.99`
`array/accumulate/Float32/dims=1L`	`10566833` ns	`10212459` ns	`1.03`
`array/accumulate/Float32/dims=2`	`1316958` ns	`1321041` ns	`1.00`
`array/accumulate/Float32/dims=2L`	`5597250` ns	`5482166.5` ns	`1.02`
`array/accumulate/Int64/1d`	`1005833` ns	`971167` ns	`1.04`
`array/accumulate/Int64/dims=1`	`1159375` ns	`1195417` ns	`0.97`
`array/accumulate/Int64/dims=1L`	`12571166.5` ns	`12106667` ns	`1.04`
`array/accumulate/Int64/dims=2`	`1523833` ns	`1504958` ns	`1.01`
`array/accumulate/Int64/dims=2L`	`9164583` ns	`9371250` ns	`0.98`
`array/broadcast`	`348395.5` ns	`386166.5` ns	`0.90`
`array/construct`	`5666` ns	`5666` ns	`1`
`array/permutedims/2d`	`662458.5` ns	`629125` ns	`1.05`
`array/permutedims/3d`	`1138458` ns	`1127041.5` ns	`1.01`
`array/permutedims/4d`	`1511375` ns	`2002334` ns	`0.75`
`array/private/copy`	`424000` ns	`449291.5` ns	`0.94`
`array/private/copyto!/cpu_to_gpu`	`365125` ns	`364729` ns	`1.00`
`array/private/copyto!/gpu_to_cpu`	`348333` ns	`382334` ns	`0.91`
`array/private/copyto!/gpu_to_gpu`	`325333` ns	`358667` ns	`0.91`
`array/private/iteration/findall/bool`	`1163250` ns	`1093437.5` ns	`1.06`
`array/private/iteration/findall/int`	`1319812.5` ns	`1266208` ns	`1.04`
`array/private/iteration/findfirst/bool`	`1404750` ns	`1488416` ns	`0.94`
`array/private/iteration/findfirst/int`	`1469500` ns	`1458833` ns	`1.01`
`array/private/iteration/findmin/1d`	`1580333.5` ns	`1602833` ns	`0.99`
`array/private/iteration/findmin/2d`	`1302291.5` ns	`1331208.5` ns	`0.98`
`array/private/iteration/logical`	`1912958` ns	`1774646` ns	`1.08`
`array/private/iteration/scalar`	`1853208` ns	`2913459` ns	`0.64`
`array/random/rand/Float32`	`643792` ns	`659333` ns	`0.98`
`array/random/rand/Int64`	`692375` ns	`734333` ns	`0.94`
`array/random/rand!/Float32`	`545875` ns	`601375` ns	`0.91`
`array/random/rand!/Int64`	`498167` ns	`524458` ns	`0.95`
`array/random/randn/Float32`	`599333` ns	`621083.5` ns	`0.96`
`array/random/randn!/Float32`	`501250` ns	`550583` ns	`0.91`
`array/reductions/mapreduce/Float32/1d`	`338666` ns	`506042` ns	`0.67`
`array/reductions/mapreduce/Float32/dims=1`	`481916` ns	`523958` ns	`0.92`
`array/reductions/mapreduce/Float32/dims=1L`	`745708` ns	`767916.5` ns	`0.97`
`array/reductions/mapreduce/Float32/dims=2`	`486167` ns	`535625` ns	`0.91`
`array/reductions/mapreduce/Float32/dims=2L`	`1120333` ns	`1366000` ns	`0.82`
`array/reductions/mapreduce/Int64/1d`	`615334` ns	`979500` ns	`0.63`
`array/reductions/mapreduce/Int64/dims=1`	`781708` ns	`854459` ns	`0.91`
`array/reductions/mapreduce/Int64/dims=1L`	`1305208` ns	`1441520.5` ns	`0.91`
`array/reductions/mapreduce/Int64/dims=2`	`959854` ns	`975313` ns	`0.98`
`array/reductions/mapreduce/Int64/dims=2L`	`2242042` ns	`2238375` ns	`1.00`
`array/reductions/reduce/Float32/1d`	`336750` ns	`502209` ns	`0.67`
`array/reductions/reduce/Float32/dims=1`	`476584` ns	`524750` ns	`0.91`
`array/reductions/reduce/Float32/dims=1L`	`744896` ns	`782270.5` ns	`0.95`
`array/reductions/reduce/Float32/dims=2`	`482666` ns	`477083` ns	`1.01`
`array/reductions/reduce/Float32/dims=2L`	`1121833` ns	`1357459` ns	`0.83`
`array/reductions/reduce/Int64/1d`	`626687.5` ns	`965959` ns	`0.65`
`array/reductions/reduce/Int64/dims=1`	`793729` ns	`814250` ns	`0.97`
`array/reductions/reduce/Int64/dims=1L`	`1307729.5` ns	`1609583` ns	`0.81`
`array/reductions/reduce/Int64/dims=2`	`966000` ns	`972875` ns	`0.99`
`array/reductions/reduce/Int64/dims=2L`	`2225292` ns	`2215667` ns	`1.00`
`array/shared/copy`	`188333` ns	`244354` ns	`0.77`
`array/shared/copyto!/cpu_to_gpu`	`40500` ns	`39917` ns	`1.01`
`array/shared/copyto!/gpu_to_cpu`	`40125` ns	`40500` ns	`0.99`
`array/shared/copyto!/gpu_to_gpu`	`40958` ns	`40834` ns	`1.00`
`array/shared/iteration/findall/bool`	`1173625` ns	`1102958` ns	`1.06`
`array/shared/iteration/findall/int`	`1322708` ns	`1300875` ns	`1.02`
`array/shared/iteration/findfirst/bool`	`1122375` ns	`1196875` ns	`0.94`
`array/shared/iteration/findfirst/int`	`1206187.5` ns	`1225417` ns	`0.98`
`array/shared/iteration/findmin/1d`	`1342354` ns	`1347667` ns	`1.00`
`array/shared/iteration/findmin/2d`	`1303041.5` ns	`1337708` ns	`0.97`
`array/shared/iteration/logical`	`1770375` ns	`1627500` ns	`1.09`
`array/shared/iteration/scalar`	`6158.4` ns	`5777.833333333333` ns	`1.07`
`integration/byval/reference`	`1182417` ns	`1169916` ns	`1.01`
`integration/byval/slices=1`	`1184166` ns	`1170583` ns	`1.01`
`integration/byval/slices=2`	`2128666` ns	`2096042` ns	`1.02`
`integration/byval/slices=3`	`18881854` ns	`8003333` ns	`2.36`
`integration/metaldevrt`	`474042` ns	`502500` ns	`0.94`
`kernel/indexing`	`352625` ns	`379084` ns	`0.93`
`kernel/indexing_checked`	`510875` ns	`563583` ns	`0.91`
`kernel/launch`	`13458` ns	`13375` ns	`1.01`
`kernel/rand`	`528500` ns	`576083.5` ns	`0.92`
`latency/import`	`1521048500` ns	`1500519500` ns	`1.01`
`latency/precompile`	`32325091520.5` ns	`32065662458` ns	`1.01`
`latency/ttfp`	`1836443584` ns	`1812250688` ns	`1.01`
`metal/synchronization/context`	`717.8137931034482` ns	`685.126582278481` ns	`1.05`
`metal/synchronization/stream`	`480.015306122449` ns	`459.81218274111677` ns	`1.04`

This comment was automatically generated by workflow using github-action-benchmark.
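The Ratio column is Current divided by Previous, rounded to two decimals, so values below 1 are speedups and values above 1 are slowdowns. The sketch below recomputes a few rows from the table. The helper name `bench_ratio` and the 1.2 / 0.8 flagging thresholds are hypothetical choices for illustration, not settings of github-action-benchmark.

```julia
# Ratio = Current / Previous, rounded to two decimals as in the table above.
bench_ratio(current, previous) = round(current / previous; digits = 2)

# A few rows copied from the table (times in ns).
rows = [
    ("integration/byval/slices=3",          18881854, 8003333),
    ("array/private/iteration/scalar",       1853208, 2913459),
    ("array/reductions/mapreduce/Int64/1d",   615334,  979500),
]

for (name, cur, prev) in rows
    r = bench_ratio(cur, prev)
    # Hypothetical thresholds, only to make the direction of the change explicit.
    tag = r > 1.2 ? "slower" : r < 0.8 ? "faster" : "unchanged"
    println(rpad(name, 40), r, "  ", tag)
end
```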

codecov · 2026-06-09T22:02:55Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.36%. Comparing base (a637d21) to head (97ae406).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #822      +/-   ##
==========================================
- Coverage   83.42%   83.36%   -0.06%     
==========================================
  Files          67       67              
  Lines        3669     3668       -1     
==========================================
- Hits         3061     3058       -3     
- Misses        608      610       +2


christiangnrd · 2026-06-18T12:58:45Z

Could the GitHub Actions failure be related to the compilation failures?

maleadt · 2026-06-19T08:42:41Z

I'm not sure how. There's a known issue with ObjC errors leaking outside of their retain/release scope and crashing during error reporting, but that's read-only and shouldn't cause a crash in LLVM.

maleadt · 2026-06-19T09:14:09Z

I guess the remaining question is semantics. Do we want to allow scalar iteration on shared memory so that it can be used with CPU code? It'll never be as fast as Array, but if we port the dirty memory flag tracking from CUDA.jl we can get it down to a couple of ns (as opposed to ~1 ns for an Array getindex or so). If we want to add back scalar iteration checking, that adds another couple of ns for the TLS check on every access.
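A minimal sketch of the dirty-flag idea, assuming a host-visible buffer and a flag that is set whenever GPU work that may write to it is enqueued: scalar reads and writes only wait for the GPU when the flag is set, so a clean buffer costs one Bool check on top of a plain array access. `SharedBuf`, `gpu_fill!` and `synchronize!` are hypothetical stand-ins; they do not reflect CUDA.jl's or Metal.jl's actual internals.

```julia
# Hypothetical stand-in for a shared-storage GPU array (not a real Metal.jl type).
mutable struct SharedBuf{T} <: AbstractVector{T}
    data::Vector{T}   # host-visible memory
    dirty::Bool       # true if GPU work may still be writing to `data`
    syncs::Int        # how many times we had to wait for the GPU
end

SharedBuf(data::Vector{T}) where {T} = SharedBuf{T}(data, false, 0)

Base.size(b::SharedBuf) = size(b.data)

# Stand-in for waiting on all outstanding GPU work that touches the buffer.
function synchronize!(b::SharedBuf)
    b.syncs += 1
    b.dirty = false
    return b
end

# Stand-in for enqueuing a kernel that writes to the buffer asynchronously.
function gpu_fill!(b::SharedBuf, x)
    fill!(b.data, x)  # on a real GPU this would complete later
    b.dirty = true
    return b
end

# Scalar access only pays for synchronization when the buffer is dirty.
function Base.getindex(b::SharedBuf, i::Int)
    b.dirty && synchronize!(b)
    return b.data[i]
end

function Base.setindex!(b::SharedBuf, v, i::Int)
    b.dirty && synchronize!(b)
    b.data[i] = v
    return b
end
```

After `gpu_fill!`, the first scalar access synchronizes once; every later access on the now-clean buffer skips the wait.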

christiangnrd · 2026-06-19T12:46:03Z

i.e. unsafe_wrap(Array, ... wouldn't be needed?
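For context, Base's `unsafe_wrap(Array, ptr, dims)` builds a CPU `Array` on top of existing memory without copying. The sketch below uses an ordinary `Vector` as a stand-in for host-visible shared memory; how Metal.jl obtains the pointer of a shared `MtlArray` is not shown and is outside this example.

```julia
backing = zeros(Float32, 4)  # stand-in for host-visible shared GPU memory

# Keep `backing` alive while a raw pointer into it is in use.
GC.@preserve backing begin
    ptr = pointer(backing)
    wrapped = unsafe_wrap(Array, ptr, length(backing))  # no copy; own = false
    wrapped[2] = 1.5f0  # writes go straight to the backing memory
end
```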

maleadt · 2026-06-19T13:04:33Z

Right, that's the current design of CUDA.jl:

Precompiling CUDA finished.
  12 dependencies successfully precompiled in 43 seconds. 90 already precompiled.

julia> CUDA.allowscalar(false)

julia> a = cu([1])
1-element CuArray{Int64, 1, CUDACore.DeviceMemory}:
 1

julia> a[]
ERROR: Scalar indexing is disallowed.

julia> b = cu([1]; unified=true)
1-element CuArray{Int64, 1, CUDACore.UnifiedMemory}:
 1

julia> b[]
1

I'm not entirely convinced this is the best option though. It makes sense, but people often use allowscalar for detecting GPU code. We could tell them they have to use private memory for that, but it still makes allowscalar(false) a lie.
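A minimal sketch of the CUDA.jl-style semantics shown in the REPL session: the scalar-indexing check dispatches on the storage mode, so only private arrays consult the task-local "scalar allowed" flag, while shared arrays always allow it. This also shows why `allowscalar(false)` stops catching CPU-side access once shared becomes the default. All names (`GPUVec`, `PrivateMode`, `SharedMode`, `with_scalar`) are hypothetical and are not GPUArrays' actual `allowscalar` machinery.

```julia
abstract type StorageMode end
struct PrivateMode <: StorageMode end
struct SharedMode <: StorageMode end

# Hypothetical array type parameterized by its storage mode.
struct GPUVec{T,S<:StorageMode} <: AbstractVector{T}
    data::Vector{T}
end
GPUVec(data::Vector{T}, ::Type{S}) where {T,S<:StorageMode} = GPUVec{T,S}(data)

Base.size(v::GPUVec) = size(v.data)

# Task-local flag: this lookup is the per-access TLS cost mentioned above.
scalar_allowed() = get(task_local_storage(), :scalar_allowed, true)::Bool

# Run `f` with scalar indexing allowed or disallowed; the old value is restored.
with_scalar(f, allowed::Bool) = task_local_storage(f, :scalar_allowed, allowed)

# Private memory is not host-accessible, so it honors the flag;
# shared memory is host-visible and bypasses it.
check_scalar(::Type{PrivateMode}) = scalar_allowed() || error("Scalar indexing is disallowed.")
check_scalar(::Type{SharedMode}) = nothing

function Base.getindex(v::GPUVec{T,S}, i::Int) where {T,S}
    check_scalar(S)
    return v.data[i]
end
```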

christiangnrd · 2026-06-19T13:10:27Z

This is also how shared storage currently works

christiangnrd · 2026-06-19T13:13:38Z

But without the dirty flag, so there might be some latent race conditions with shared MtlArrays

maleadt · 2026-06-19T16:13:13Z

> This is also how shared storage currently works

Right, but it never was the default. It would break users doing allowscalar(false) to detect GPU functionality being executed on the CPU.

And we definitely need the dirty flag to improve performance here. But that can be follow-up work.

christiangnrd · 2026-06-19T16:32:11Z

What if we add a @warn the first time they call allowscalar, if the default storage mode is shared?

maleadt · 2026-06-19T16:45:37Z

I guess that could work, but the interface is not extensible like that right now.

EDIT: I guess we could implement Metal.allowscalar, have it warn, and then call GPUArrays.allowscalar.
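A minimal sketch of that wrapper, under stated assumptions: a module-local `allowscalar` that warns once, on the first `allowscalar(false)` while the default storage is shared (warning only on `false` is a choice made here, since that is the call whose meaning changes), and then delegates. `inner_allowscalar` stands in for `GPUArrays.allowscalar`, and `DEFAULT_STORAGE`, `SCALAR_ALLOWED` and `WARNED` are hypothetical names, not Metal.jl internals.

```julia
module MetalSketch

# Local stand-ins for the storage modes; not Metal.jl's own types.
abstract type StorageMode end
struct PrivateStorage <: StorageMode end
struct SharedStorage <: StorageMode end

const DEFAULT_STORAGE = Ref{Type{<:StorageMode}}(SharedStorage)
const SCALAR_ALLOWED = Ref(true)  # stand-in for the GPUArrays-level setting
const WARNED = Ref(false)         # ensures the warning is emitted only once

# Stand-in for GPUArrays.allowscalar(::Bool).
inner_allowscalar(x::Bool) = (SCALAR_ALLOWED[] = x; nothing)

function allowscalar(x::Bool)
    if !x && DEFAULT_STORAGE[] === SharedStorage && !WARNED[]
        WARNED[] = true
        @warn "allowscalar(false) does not catch scalar indexing of shared-storage arrays, which are now the default."
    end
    return inner_allowscalar(x)
end

end # module
```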

christiangnrd mentioned this pull request Jun 9, 2026

Default MtlArray storage to SharedStorage #820

Closed

github-actions Bot reviewed Jun 9, 2026


christiangnrd force-pushed the shared branch from 062a895 to 7f83fb0 on June 9, 2026 18:44

christiangnrd commented Jun 9, 2026


Review comment on src/array.jl (outdated)

christiangnrd force-pushed the shared branch 2 times, most recently from db53574 to fdf7a32 on June 16, 2026 22:49

christiangnrd added 6 commits June 18, 2026 09:36

Default to SharedStorage

bfde5c4

More

713d307

CI

9769fe0

Fix KA

162fc27

Apply suggestion from @christiangnrd

29ecf38

Fix

97ae406

christiangnrd force-pushed the shared branch from fdf7a32 to 97ae406 on June 18, 2026 12:36

maleadt mentioned this pull request Jun 19, 2026

Reduce reliance on ObjC blocks #845

Merged

christiangnrd wants to merge 6 commits into main from shared


2 participants
