Default MtlArray storage to SharedStorage by KaanKesginLW · Pull Request #820 · JuliaGPU/Metal.jl

KaanKesginLW · 2026-06-09T07:03:03Z

Summary

Change the default MtlArray storage mode from PrivateStorage to SharedStorage. On unified-memory (Apple Silicon) GPUs — essentially every supported Mac — SharedStorage is zero-copy between CPU and GPU and matches Apple's guidance. Discrete GPUs keep working (and get a one-time notice steering them to PrivateStorage).

using Metal
a = MtlArray(rand(Float32, 1024, 1024))   # SharedStorage by default now
w = unsafe_wrap(Array, a)                  # zero-copy CPU view — no Array() copy

This supersedes #717 with a clean, minimal diff (+31 / −10 across 4 files, vs #717's +655 / −606).

Motivation

Apple's guidance. For unified memory, the Metal Best Practices Guide states "the Shared mode is usually the correct choice." The previous PrivateStorage default came from discrete-GPU guidance.
All supported Macs are Apple Silicon (unified memory), so this is the right default for ~every user.
Zero-copy CPU access. SharedStorage enables unsafe_wrap(Array, x), avoiding the allocate-and-copy of Array().
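Metal.jl's `unsafe_wrap(Array, a)` needs an Apple GPU to run. The sketch below therefore shows only the zero-copy versus copy semantics behind it. It uses Base Julia's pointer-based `unsafe_wrap` on an ordinary CPU matrix as a stand-in for a SharedStorage buffer. `alias_demo` is a hypothetical helper and is not part of Metal.jl.

```julia
# Stand-in sketch: a plain CPU Matrix plays the role of a SharedStorage buffer
# (assumption: no Metal device is needed to show the aliasing semantics).
function alias_demo()
    src = zeros(Float32, 4, 4)
    snapshot = copy(src)  # allocate-and-copy, analogous to Array(a)
    GC.@preserve src begin
        # Zero-copy: wrap src's existing memory; no new data buffer is allocated.
        wrapped = unsafe_wrap(Array, pointer(src), size(src))
        wrapped[1, 1] = 42f0  # this write lands directly in src's memory
    end
    # src sees the write; the independent copy does not.
    return src[1, 1], snapshot[1, 1]
end

src_val, snap_val = alias_demo()
```

A write through the wrapper is visible in the source buffer, and the copy is unaffected. On real hardware, any GPU work that writes the buffer must finish before the CPU reads through such a view. The CPU-only stand-in has no such concurrency.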

Benchmarks

No performance regression on an M2 Max; SharedStorage performs on par with PrivateStorage:

size	op	Shared	Private
512²	broadcast	0.057 ms	0.056 ms
512²	matmul	0.286 ms	0.291 ms
1024²	broadcast	0.135 ms	0.132 ms
1024²	matmul	0.664 ms	0.788 ms

This is consistent with the original 67-benchmark sweep: 78% of cases were ties, copyto!, fill!, and MPS matmul were identical, and every case at ≥ 512 MB was a tie.

Faster CPU access with zero-copy unsafe_wrap(Array, x) instead of Array()
(M2 Max, best of 5; both paths call sum() to force a full read):

size	`Array()` + `sum()`	`unsafe_wrap` + `sum()`	speedup
512 MB	17.1 ms	9.0 ms	1.9×
1 GB	37.5 ms	18.2 ms	2.1×
2 GB	249.7 ms	36.4 ms	6.9×
4 GB	539.2 ms	72.5 ms	7.4×

(The jump at ≥ 2 GB is Array()'s GPU→CPU blit switching to chunked copies;
unsafe_wrap's cost per GB stays flat because it never copies.)
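The speedup column is the ratio of the two timings. The sketch below recomputes it from the table's numbers and also normalizes by size. It takes the size labels at face value as 0.5, 1, 2, and 4 GB. This makes the "jump" and "flat" observations concrete. The variable names are illustrative only.

```julia
# Timings (ms) copied from the table above: M2 Max, best of 5.
size_gb  = [0.5, 1.0, 2.0, 4.0]          # 512 MB, 1 GB, 2 GB, 4 GB
array_ms = [17.1, 37.5, 249.7, 539.2]    # Array() copy, then sum()
wrap_ms  = [9.0, 18.2, 36.4, 72.5]       # unsafe_wrap view, then sum()

# Speedup column: copy-path time divided by zero-copy-path time.
speedup = round.(array_ms ./ wrap_ms; digits = 1)

# Cost per GB: roughly constant for unsafe_wrap, but it jumps for Array() at 2 GB.
array_ms_per_gb = array_ms ./ size_gb
wrap_ms_per_gb  = wrap_ms ./ size_gb
```

The zero-copy path costs about 18 ms per GB at every size. The Array() path goes from about 37.5 ms per GB at 1 GB to about 125 ms per GB at 2 GB.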

Implementation

The default_storage preference now defaults to "shared". Every constructor (mtl, zeros, fill, similar, …) already resolves through DefaultStorageMode, so they follow automatically.
The default stays a compile-time const: type-stable, with no device access during precompilation. (It is not an MTLDevice(…).hasUnifiedMemory call baked into a precompiled constant.)
On the rare non-unified-memory (discrete) GPU, __init__ emits a one-time notice pointing the user to set_preferences!(Metal, "default_storage" => "private"). SharedStorage still works there; this just flags the faster option.
Also fixes a latent bug in the old error message ($default_storage → $str).
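The sketch below shows one way a preference string can be resolved once, at load time, into a storage-mode type bound to a `const`. Its error message interpolates the offending value (`$str`). The storage types are hypothetical stand-ins rather than Metal.jl's own. The hard-coded string replaces what would, by assumption, be read with Preferences.jl's `@load_preference` inside the package.

```julia
# Hypothetical stand-ins for Metal.jl's storage-mode types.
abstract type StorageMode end
struct PrivateStorage <: StorageMode end
struct SharedStorage  <: StorageMode end

# Resolve the preference string; unknown values raise an error that
# interpolates the string that was actually read (`$str`).
function parse_storage(str::AbstractString)
    str == "private" && return PrivateStorage
    str == "shared"  && return SharedStorage
    error("invalid default_storage preference \"$str\"; expected \"private\" or \"shared\"")
end

# Inside a package this would be read at load time, e.g. (assumption)
#     @load_preference("default_storage", "shared")
# It is hard-coded here so the sketch runs on its own.
const default_storage_str = "shared"

# A const binding to a type: known to the compiler, no device query needed.
const DefaultStorageMode = parse_storage(default_storage_str)
```

Because the value is fixed when the module is loaded or precompiled, a changed preference would only take effect after restarting Julia. This is generally how Preferences.jl-backed compile-time settings behave.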

Notes

Supersedes #717, closed as "this will probably eventually happen but it'll be easier to do it in a separate PR than to rebase and clean up this one." This is that clean PR: no formatting churn, no test rewrite, just the default plus the discrete-GPU notice, docs, and a focused test.

Related: #717

On unified-memory (Apple Silicon) GPUs SharedStorage is zero-copy and matches Apple's guidance; it is also valid on discrete GPUs. Change the default_storage preference default from "private" to "shared".

- src/array.jl: DefaultStorageMode defaults to "shared" (was "private"); fix the error message interpolation ($str). All constructors (mtl, zeros, fill, ...) already use DefaultStorageMode, so they follow automatically. Docstrings updated.
- src/initialization.jl: on the rare non-unified-memory (discrete) GPU, warn once and point the user at the "private" preference.
- docs + test/array.jl: document and test the new default.

Supersedes the closed JuliaGPU#717 with a minimal, clean diff: no formatting churn, no test rewrite, and the default is decided at the preference level rather than via a device call inside a precompile-time const.
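The `src/initialization.jl` change can be sketched as a warn-once check. In this sketch `has_unified_memory` is a plain argument standing in for the runtime device query. `maybe_storage_notice` and `storage_notice_shown` are hypothetical names. The code illustrates the described behavior and is not Metal.jl's actual `__init__` code.

```julia
# Hypothetical stand-in for the __init__ check (not Metal.jl's actual code):
# `has_unified_memory` would come from querying the device at runtime.
const storage_notice_shown = Ref(false)

function maybe_storage_notice(has_unified_memory::Bool, storage::AbstractString)
    # Only discrete GPUs on the shared default get the hint, and only once.
    if has_unified_memory || storage != "shared" || storage_notice_shown[]
        return false
    end
    storage_notice_shown[] = true
    msg = "This GPU lacks unified memory; PrivateStorage may be faster. " *
          "Opt in with set_preferences!(Metal, \"default_storage\" => \"private\") and restart Julia."
    @info msg
    return true
end
```

Keeping this check in a runtime function, rather than in the const default, means the device is only consulted when the package is loaded, never during precompilation.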

codecov · 2026-06-09T08:08:20Z

Codecov Report

❌ Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 81.30%. Comparing base (bfb9ba3) to head (12e9d2f).

Files with missing lines	Patch %	Lines
src/initialization.jl	50.00%	1 Missing ⚠️


@@            Coverage Diff             @@
##             main     #820      +/-   ##
==========================================
- Coverage   81.43%   81.30%   -0.14%     
==========================================
  Files          66       66              
  Lines        3318     3321       +3     
==========================================
- Hits         2702     2700       -2     
- Misses        616      621       +5


github-actions

Metal Benchmarks


Benchmark suite	Current: `12e9d2f`	Previous: `bfb9ba3`	Ratio
`array/accumulate/Float32/1d`	`818917` ns	`813334` ns	`1.01`
`array/accumulate/Float32/dims=1`	`1004917` ns	`982750` ns	`1.02`
`array/accumulate/Float32/dims=1L`	`10042458` ns	`10003208` ns	`1.00`
`array/accumulate/Float32/dims=2`	`1307166` ns	`1259354` ns	`1.04`
`array/accumulate/Float32/dims=2L`	`6166208.5` ns	`5629958` ns	`1.10`
`array/accumulate/Int64/1d`	`975125` ns	`950125` ns	`1.03`
`array/accumulate/Int64/dims=1`	`1107645.5` ns	`1125625.5` ns	`0.98`
`array/accumulate/Int64/dims=1L`	`11925167` ns	`12141625` ns	`0.98`
`array/accumulate/Int64/dims=2`	`1478124.5` ns	`1476416` ns	`1.00`
`array/accumulate/Int64/dims=2L`	`9481646` ns	`9438083` ns	`1.00`
`array/broadcast`	`376750` ns	`374625` ns	`1.01`
`array/construct`	`6125` ns	`5666` ns	`1.08`
`array/permutedims/2d`	`641312` ns	`630750` ns	`1.02`
`array/permutedims/3d`	`1129708` ns	`1117000` ns	`1.01`
`array/permutedims/4d`	`1992854` ns	`1994209` ns	`1.00`
`array/private/copy`	`440687.5` ns	`412292` ns	`1.07`
`array/private/copyto!/cpu_to_gpu`	`371417` ns	`368583` ns	`1.01`
`array/private/copyto!/gpu_to_cpu`	`368792` ns	`358916` ns	`1.03`
`array/private/copyto!/gpu_to_gpu`	`342917` ns	`342666` ns	`1.00`
`array/private/iteration/findall/bool`	`1076500` ns	`1073250` ns	`1.00`
`array/private/iteration/findall/int`	`1255209` ns	`1252000` ns	`1.00`
`array/private/iteration/findfirst/bool`	`1468708` ns	`1458437` ns	`1.01`
`array/private/iteration/findfirst/int`	`1508291.5` ns	`1487958` ns	`1.01`
`array/private/iteration/findmin/1d`	`1608750` ns	`1592041` ns	`1.01`
`array/private/iteration/findmin/2d`	`1310792` ns	`1315875` ns	`1.00`
`array/private/iteration/logical`	`1754354` ns	`1743542` ns	`1.01`
`array/private/iteration/scalar`	`2683833` ns	`2638375.5` ns	`1.02`
`array/random/rand/Float32`	`584458` ns	`634917` ns	`0.92`
`array/random/rand/Int64`	`697833` ns	`669834` ns	`1.04`
`array/random/rand!/Float32`	`587667` ns	`580958` ns	`1.01`
`array/random/rand!/Int64`	`508250` ns	`509000` ns	`1.00`
`array/random/randn/Float32`	`597416.5` ns	`597958` ns	`1.00`
`array/random/randn!/Float32`	`535542` ns	`531209` ns	`1.01`
`array/reductions/mapreduce/Float32/1d`	`342500` ns	`750833` ns	`0.46`
`array/reductions/mapreduce/Float32/dims=1`	`514250` ns	`499041.5` ns	`1.03`
`array/reductions/mapreduce/Float32/dims=1L`	`916395.5` ns	`780791` ns	`1.17`
`array/reductions/mapreduce/Float32/dims=2`	`511042` ns	`502750` ns	`1.02`
`array/reductions/mapreduce/Float32/dims=2L`	`1364583` ns	`1356041` ns	`1.01`
`array/reductions/mapreduce/Int64/1d`	`671750` ns	`934917` ns	`0.72`
`array/reductions/mapreduce/Int64/dims=1`	`795979.5` ns	`786666` ns	`1.01`
`array/reductions/mapreduce/Int64/dims=1L`	`1666167` ns	`1712500` ns	`0.97`
`array/reductions/mapreduce/Int64/dims=2`	`972209` ns	`966667` ns	`1.01`
`array/reductions/mapreduce/Int64/dims=2L`	`2271354.5` ns	`2260917` ns	`1.00`
`array/reductions/reduce/Float32/1d`	`340500` ns	`743333` ns	`0.46`
`array/reductions/reduce/Float32/dims=1`	`511542` ns	`499625` ns	`1.02`
`array/reductions/reduce/Float32/dims=1L`	`876250` ns	`813625` ns	`1.08`
`array/reductions/reduce/Float32/dims=2`	`509084` ns	`505833` ns	`1.01`
`array/reductions/reduce/Float32/dims=2L`	`1360042` ns	`1346625` ns	`1.01`
`array/reductions/reduce/Int64/1d`	`668500` ns	`930625` ns	`0.72`
`array/reductions/reduce/Int64/dims=1`	`800250` ns	`783875` ns	`1.02`
`array/reductions/reduce/Int64/dims=1L`	`1629125` ns	`1680125` ns	`0.97`
`array/reductions/reduce/Int64/dims=2`	`971542` ns	`980563` ns	`0.99`
`array/reductions/reduce/Int64/dims=2L`	`2272541.5` ns	`2260250` ns	`1.01`
`array/shared/copy`	`231917` ns	`238375` ns	`0.97`
`array/shared/copyto!/cpu_to_gpu`	`48083` ns	`40667` ns	`1.18`
`array/shared/copyto!/gpu_to_cpu`	`47250` ns	`40667` ns	`1.16`
`array/shared/copyto!/gpu_to_gpu`	`42333` ns	`41292` ns	`1.03`
`array/shared/iteration/findall/bool`	`1080083` ns	`1079166` ns	`1.00`
`array/shared/iteration/findall/int`	`1255333` ns	`1250333` ns	`1.00`
`array/shared/iteration/findfirst/bool`	`1199333` ns	`1192416.5` ns	`1.01`
`array/shared/iteration/findfirst/int`	`1235625` ns	`1274104.5` ns	`0.97`
`array/shared/iteration/findmin/1d`	`1343208` ns	`1282291` ns	`1.05`
`array/shared/iteration/findmin/2d`	`1326584` ns	`1266125` ns	`1.05`
`array/shared/iteration/logical`	`1601666` ns	`1594459` ns	`1.00`
`array/shared/iteration/scalar`	`8666.666666666666` ns	`5868.166666666667` ns	`1.48`
`integration/byval/reference`	`1162750` ns	`1157708` ns	`1.00`
`integration/byval/slices=1`	`1168479.5` ns	`1159208` ns	`1.01`
`integration/byval/slices=2`	`2092334` ns	`2086791.5` ns	`1.00`
`integration/byval/slices=3`	`7914208` ns	`7931979` ns	`1.00`
`integration/metaldevrt`	`489250` ns	`468520.5` ns	`1.04`
`kernel/indexing`	`370833` ns	`366500` ns	`1.01`
`kernel/indexing_checked`	`550208` ns	`540834` ns	`1.02`
`kernel/launch`	`14875` ns	`13208` ns	`1.13`
`kernel/rand`	`563916` ns	`558042` ns	`1.01`
`latency/import`	`1404497583` ns	`1399005208` ns	`1.00`
`latency/precompile`	`31277763959` ns	`31215035583.5` ns	`1.00`
`latency/ttfp`	`1710765542` ns	`1710418937.5` ns	`1.00`
`metal/synchronization/context`	`1045.8` ns	`838.258064516129` ns	`1.25`
`metal/synchronization/stream`	`696.7483443708609` ns	`436.9748743718593` ns	`1.59`

This comment was automatically generated by workflow using github-action-benchmark.

christiangnrd · 2026-06-09T16:52:00Z

#822

github-actions Bot reviewed Jun 9, 2026


christiangnrd closed this Jun 9, 2026

KaanKesginLW wants to merge 1 commit into JuliaGPU:main from KaanKesginLW:feature/default-shared-storage
