Default to SharedStorage #822
Metal Benchmarks
| Benchmark suite | Current: 97ae406 | Previous: a637d21 | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 825208 ns | 810709 ns | 1.02 |
| array/accumulate/Float32/dims=1 | 1036396 ns | 1044104.5 ns | 0.99 |
| array/accumulate/Float32/dims=1L | 10566833 ns | 10212459 ns | 1.03 |
| array/accumulate/Float32/dims=2 | 1316958 ns | 1321041 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 5597250 ns | 5482166.5 ns | 1.02 |
| array/accumulate/Int64/1d | 1005833 ns | 971167 ns | 1.04 |
| array/accumulate/Int64/dims=1 | 1159375 ns | 1195417 ns | 0.97 |
| array/accumulate/Int64/dims=1L | 12571166.5 ns | 12106667 ns | 1.04 |
| array/accumulate/Int64/dims=2 | 1523833 ns | 1504958 ns | 1.01 |
| array/accumulate/Int64/dims=2L | 9164583 ns | 9371250 ns | 0.98 |
| array/broadcast | 348395.5 ns | 386166.5 ns | 0.90 |
| array/construct | 5666 ns | 5666 ns | 1 |
| array/permutedims/2d | 662458.5 ns | 629125 ns | 1.05 |
| array/permutedims/3d | 1138458 ns | 1127041.5 ns | 1.01 |
| array/permutedims/4d | 1511375 ns | 2002334 ns | 0.75 |
| array/private/copy | 424000 ns | 449291.5 ns | 0.94 |
| array/private/copyto!/cpu_to_gpu | 365125 ns | 364729 ns | 1.00 |
| array/private/copyto!/gpu_to_cpu | 348333 ns | 382334 ns | 0.91 |
| array/private/copyto!/gpu_to_gpu | 325333 ns | 358667 ns | 0.91 |
| array/private/iteration/findall/bool | 1163250 ns | 1093437.5 ns | 1.06 |
| array/private/iteration/findall/int | 1319812.5 ns | 1266208 ns | 1.04 |
| array/private/iteration/findfirst/bool | 1404750 ns | 1488416 ns | 0.94 |
| array/private/iteration/findfirst/int | 1469500 ns | 1458833 ns | 1.01 |
| array/private/iteration/findmin/1d | 1580333.5 ns | 1602833 ns | 0.99 |
| array/private/iteration/findmin/2d | 1302291.5 ns | 1331208.5 ns | 0.98 |
| array/private/iteration/logical | 1912958 ns | 1774646 ns | 1.08 |
| array/private/iteration/scalar | 1853208 ns | 2913459 ns | 0.64 |
| array/random/rand/Float32 | 643792 ns | 659333 ns | 0.98 |
| array/random/rand/Int64 | 692375 ns | 734333 ns | 0.94 |
| array/random/rand!/Float32 | 545875 ns | 601375 ns | 0.91 |
| array/random/rand!/Int64 | 498167 ns | 524458 ns | 0.95 |
| array/random/randn/Float32 | 599333 ns | 621083.5 ns | 0.96 |
| array/random/randn!/Float32 | 501250 ns | 550583 ns | 0.91 |
| array/reductions/mapreduce/Float32/1d | 338666 ns | 506042 ns | 0.67 |
| array/reductions/mapreduce/Float32/dims=1 | 481916 ns | 523958 ns | 0.92 |
| array/reductions/mapreduce/Float32/dims=1L | 745708 ns | 767916.5 ns | 0.97 |
| array/reductions/mapreduce/Float32/dims=2 | 486167 ns | 535625 ns | 0.91 |
| array/reductions/mapreduce/Float32/dims=2L | 1120333 ns | 1366000 ns | 0.82 |
| array/reductions/mapreduce/Int64/1d | 615334 ns | 979500 ns | 0.63 |
| array/reductions/mapreduce/Int64/dims=1 | 781708 ns | 854459 ns | 0.91 |
| array/reductions/mapreduce/Int64/dims=1L | 1305208 ns | 1441520.5 ns | 0.91 |
| array/reductions/mapreduce/Int64/dims=2 | 959854 ns | 975313 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=2L | 2242042 ns | 2238375 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 336750 ns | 502209 ns | 0.67 |
| array/reductions/reduce/Float32/dims=1 | 476584 ns | 524750 ns | 0.91 |
| array/reductions/reduce/Float32/dims=1L | 744896 ns | 782270.5 ns | 0.95 |
| array/reductions/reduce/Float32/dims=2 | 482666 ns | 477083 ns | 1.01 |
| array/reductions/reduce/Float32/dims=2L | 1121833 ns | 1357459 ns | 0.83 |
| array/reductions/reduce/Int64/1d | 626687.5 ns | 965959 ns | 0.65 |
| array/reductions/reduce/Int64/dims=1 | 793729 ns | 814250 ns | 0.97 |
| array/reductions/reduce/Int64/dims=1L | 1307729.5 ns | 1609583 ns | 0.81 |
| array/reductions/reduce/Int64/dims=2 | 966000 ns | 972875 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2L | 2225292 ns | 2215667 ns | 1.00 |
| array/shared/copy | 188333 ns | 244354 ns | 0.77 |
| array/shared/copyto!/cpu_to_gpu | 40500 ns | 39917 ns | 1.01 |
| array/shared/copyto!/gpu_to_cpu | 40125 ns | 40500 ns | 0.99 |
| array/shared/copyto!/gpu_to_gpu | 40958 ns | 40834 ns | 1.00 |
| array/shared/iteration/findall/bool | 1173625 ns | 1102958 ns | 1.06 |
| array/shared/iteration/findall/int | 1322708 ns | 1300875 ns | 1.02 |
| array/shared/iteration/findfirst/bool | 1122375 ns | 1196875 ns | 0.94 |
| array/shared/iteration/findfirst/int | 1206187.5 ns | 1225417 ns | 0.98 |
| array/shared/iteration/findmin/1d | 1342354 ns | 1347667 ns | 1.00 |
| array/shared/iteration/findmin/2d | 1303041.5 ns | 1337708 ns | 0.97 |
| array/shared/iteration/logical | 1770375 ns | 1627500 ns | 1.09 |
| array/shared/iteration/scalar | 6158.4 ns | 5777.833333333333 ns | 1.07 |
| integration/byval/reference | 1182417 ns | 1169916 ns | 1.01 |
| integration/byval/slices=1 | 1184166 ns | 1170583 ns | 1.01 |
| integration/byval/slices=2 | 2128666 ns | 2096042 ns | 1.02 |
| integration/byval/slices=3 | 18881854 ns | 8003333 ns | 2.36 |
| integration/metaldevrt | 474042 ns | 502500 ns | 0.94 |
| kernel/indexing | 352625 ns | 379084 ns | 0.93 |
| kernel/indexing_checked | 510875 ns | 563583 ns | 0.91 |
| kernel/launch | 13458 ns | 13375 ns | 1.01 |
| kernel/rand | 528500 ns | 576083.5 ns | 0.92 |
| latency/import | 1521048500 ns | 1500519500 ns | 1.01 |
| latency/precompile | 32325091520.5 ns | 32065662458 ns | 1.01 |
| latency/ttfp | 1836443584 ns | 1812250688 ns | 1.01 |
| metal/synchronization/context | 717.8137931034482 ns | 685.126582278481 ns | 1.05 |
| metal/synchronization/stream | 480.015306122449 ns | 459.81218274111677 ns | 1.04 |
This comment was automatically generated by workflow using github-action-benchmark.
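For readers scanning the table, the Ratio column is consistent with current time divided by previous time, so values below 1 mean this PR is faster. A minimal sketch that recomputes three rows from the table and flags large slowdowns; the `ratio` helper and the 1.5 threshold are assumptions made for illustration, not part of the benchmark workflow:

```julia
# Timings in nanoseconds copied from the table: name => (current, previous).
timings = [
    "array/permutedims/4d"           => (1511375.0, 2002334.0),
    "array/private/iteration/scalar" => (1853208.0, 2913459.0),
    "integration/byval/slices=3"     => (18881854.0, 8003333.0),
]

# Hypothetical helper: ratio = current / previous, rounded like the table.
ratio(current, previous) = round(current / previous; digits=2)

# The 1.5 cutoff is an arbitrary choice for this sketch, not a project setting.
regressions = [name for (name, (cur, prev)) in timings if ratio(cur, prev) > 1.5]

for (name, (cur, prev)) in timings
    println(rpad(name, 34), ratio(cur, prev))
end
println("regressions: ", regressions)
```

With these inputs only `integration/byval/slices=3` crosses the cutoff, which matches the one outlier in the table.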
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #822      +/-   ##
==========================================
- Coverage   83.42%   83.36%   -0.06%
==========================================
  Files          67       67
  Lines        3669     3668       -1
==========================================
- Hits         3061     3058       -3
- Misses        608      610       +2
☔ View full report in Codecov by Harness.
|
db53574 to fdf7a32
|
Could the GitHub Actions failure be related to the compilation failures? |
|
I'm not sure how. There's a known issue with ObjC errors leaking outside of their retain/release scope and crashing during error reporting, but that's read-only and shouldn't cause a crash in LLVM. |
|
I guess the remaining question is semantics. Do we want to allow scalar iteration on shared memory so that it can be used with CPU code? It'll never be as fast as Array, but if we port the |
|
i.e. |
|
Right, that's the current design of CUDA.jl:

Precompiling CUDA finished.
12 dependencies successfully precompiled in 43 seconds. 90 already precompiled.
julia> CUDA.allowscalar(false)
julia> a = cu([1])
1-element CuArray{Int64, 1, CUDACore.DeviceMemory}:
1
julia> a[]
ERROR: Scalar indexing is disallowed.
julia> b = cu([1]; unified=true)
1-element CuArray{Int64, 1, CUDACore.UnifiedMemory}:
1
julia> b[]
1

I'm not entirely convinced this is the best option though. It makes sense, but people often use |
|
This is also how shared storage currently works |
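The behaviour in the CUDA.jl session above (scalar indexing errors on device memory but succeeds on unified memory) can be modelled as dispatch on the array's storage type parameter. A minimal, GPU-free sketch of that idea follows; `ToyArray`, `scalar_get`, `SCALAR_ALLOWED`, and the two storage structs are hypothetical stand-ins invented for this illustration, not the Metal.jl or CUDA.jl implementation:

```julia
# Hypothetical stand-ins for the storage modes discussed in this thread.
abstract type StorageMode end
struct PrivateStorage <: StorageMode end   # models GPU-only memory
struct SharedStorage  <: StorageMode end   # models CPU-addressable memory

# Toy array wrapper tagged with its storage mode (assumption for this sketch).
struct ToyArray{T,S<:StorageMode}
    data::Vector{T}
end
ToyArray(data::Vector{T}, ::S) where {T,S<:StorageMode} = ToyArray{T,S}(data)

# Global switch playing the role of `allowscalar(false)`.
const SCALAR_ALLOWED = Ref(false)

# Shared memory is visible to the CPU, so a scalar read is always permitted.
scalar_get(a::ToyArray{T,SharedStorage}, i::Int) where {T} = a.data[i]

# Private memory would need a device-to-host copy per element,
# so the read is gated behind the global switch.
function scalar_get(a::ToyArray{T,PrivateStorage}, i::Int) where {T}
    SCALAR_ALLOWED[] || error("Scalar indexing is disallowed.")
    return a.data[i]
end

shared  = ToyArray([1], SharedStorage())
private = ToyArray([1], PrivateStorage())

println(scalar_get(shared, 1))            # allowed regardless of the switch
try
    scalar_get(private, 1)
catch err
    println("private: ", err.msg)         # rejected while the switch is off
end
```

The trade-off raised in the thread is visible here: once shared storage becomes the default, a switch like this no longer guards the common case, so any opt-in strictness for shared arrays would need a separate mechanism.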
|
But without the |
Right, but it never was the default. It would break users doing And we definitely need the |
|
What if we add a |
|
I guess that could work, but the interface is not extensible like that right now. EDIT: I guess we could implement |
No description provided.