Reduce reliance on ObjC blocks #845
Merged
Codecov Report
❌ Patch coverage is

| Coverage Diff | main | #845 | +/- |
|---|---|---|---|
| Coverage | 84.25% | 84.27% | +0.01% |
| Files | 68 | 68 | |
| Lines | 4110 | 4140 | +30 |
| Hits | 3463 | 3489 | +26 |
| Misses | 647 | 651 | +4 |
Metal Benchmarks
| Benchmark suite | Current: 898b1fe | Previous: 623b6a9 | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 830270.5 ns | 860291 ns | 0.97 |
| array/accumulate/Float32/dims=1 | 968833 ns | 962875 ns | 1.01 |
| array/accumulate/Float32/dims=1L | 9986604 ns | 9867583 ns | 1.01 |
| array/accumulate/Float32/dims=2 | 1239042 ns | 1271313 ns | 0.97 |
| array/accumulate/Float32/dims=2L | 6967375 ns | 6945417 ns | 1.00 |
| array/accumulate/Int64/1d | 925708 ns | 955625 ns | 0.97 |
| array/accumulate/Int64/dims=1 | 1118750 ns | 1074479 ns | 1.04 |
| array/accumulate/Int64/dims=1L | 11600812.5 ns | 11749458 ns | 0.99 |
| array/accumulate/Int64/dims=2 | 1420000 ns | 1436813 ns | 0.99 |
| array/accumulate/Int64/dims=2L | 9570417 ns | 9559125 ns | 1.00 |
| array/broadcast | 361333 ns | 356666.5 ns | 1.01 |
| array/construct | 3292 ns | 3208 ns | 1.03 |
| array/permutedims/2d | 544167 ns | 615417 ns | 0.88 |
| array/permutedims/3d | 1063375 ns | 1087833 ns | 0.98 |
| array/permutedims/4d | 1735500 ns | 1937500 ns | 0.90 |
| array/private/copy | 384854 ns | 437209 ns | 0.88 |
| array/private/copyto!/cpu_to_gpu | 377792 ns | 380750 ns | 0.99 |
| array/private/copyto!/gpu_to_cpu | 374667 ns | 357625 ns | 1.05 |
| array/private/copyto!/gpu_to_gpu | 354667 ns | 349500 ns | 1.01 |
| array/private/iteration/findall/bool | 1056583 ns | 1054667 ns | 1.00 |
| array/private/iteration/findall/int | 1178833 ns | 1189125 ns | 0.99 |
| array/private/iteration/findfirst/bool | 1418312 ns | 1319958 ns | 1.07 |
| array/private/iteration/findfirst/int | 1426375 ns | 1501292 ns | 0.95 |
| array/private/iteration/findmin/1d | 1545541 ns | 1545624.5 ns | 1.00 |
| array/private/iteration/findmin/2d | 1306125 ns | 1302458 ns | 1.00 |
| array/private/iteration/logical | 1575834 ns | 1610583.5 ns | 0.98 |
| array/private/iteration/scalar | 2495458 ns | 2355875 ns | 1.06 |
| array/random/rand/Float32 | 562729 ns | 555458 ns | 1.01 |
| array/random/rand/Int64 | 640500 ns | 672541.5 ns | 0.95 |
| array/random/rand!/Float32 | 522250 ns | 517208 ns | 1.01 |
| array/random/rand!/Int64 | 491250 ns | 483750 ns | 1.02 |
| array/random/randn/Float32 | 483083 ns | 567437.5 ns | 0.85 |
| array/random/randn!/Float32 | 476166 ns | 473583 ns | 1.01 |
| array/reductions/mapreduce/Float32/1d | 718708 ns | 716833 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 473959 ns | 473333.5 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 754854 ns | 742916 ns | 1.02 |
| array/reductions/mapreduce/Float32/dims=2 | 485916 ns | 485000 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 1313417 ns | 1314167 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 968875 ns | 924792 ns | 1.05 |
| array/reductions/mapreduce/Int64/dims=1 | 779729.5 ns | 777479.5 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 1310792 ns | 1359917 ns | 0.96 |
| array/reductions/mapreduce/Int64/dims=2 | 964770.5 ns | 956292 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=2L | 2194250 ns | 2194583 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 711250 ns | 721042 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1 | 473542 ns | 471500 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 725208 ns | 729625 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2 | 483270.5 ns | 482458 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 1292417 ns | 1300937.5 ns | 0.99 |
| array/reductions/reduce/Int64/1d | 911958 ns | 943395.5 ns | 0.97 |
| array/reductions/reduce/Int64/dims=1 | 765750 ns | 802312.5 ns | 0.95 |
| array/reductions/reduce/Int64/dims=1L | 1294333 ns | 1361771 ns | 0.95 |
| array/reductions/reduce/Int64/dims=2 | 925791 ns | 963959 ns | 0.96 |
| array/reductions/reduce/Int64/dims=2L | 2164834 ns | 2175250 ns | 1.00 |
| array/shared/copy | 234396 ns | 236417 ns | 0.99 |
| array/shared/copyto!/cpu_to_gpu | 40750 ns | 39750 ns | 1.03 |
| array/shared/copyto!/gpu_to_cpu | 38625 ns | 41042 ns | 0.94 |
| array/shared/copyto!/gpu_to_gpu | 39625 ns | 41541 ns | 0.95 |
| array/shared/iteration/findall/bool | 1044792 ns | 1064750 ns | 0.98 |
| array/shared/iteration/findall/int | 1181584 ns | 1190167 ns | 0.99 |
| array/shared/iteration/findfirst/bool | 1136000 ns | 1131792 ns | 1.00 |
| array/shared/iteration/findfirst/int | 1236750 ns | 1225396 ns | 1.01 |
| array/shared/iteration/findmin/1d | 1316042 ns | 1323833 ns | 0.99 |
| array/shared/iteration/findmin/2d | 1314333 ns | 1316916.5 ns | 1.00 |
| array/shared/iteration/logical | 1401291.5 ns | 1459145.5 ns | 0.96 |
| array/shared/iteration/scalar | 4904.714285714285 ns | 4738.142857142857 ns | 1.04 |
| integration/byval/reference | 1132416 ns | 1138291 ns | 0.99 |
| integration/byval/slices=1 | 1134833 ns | 1137584 ns | 1.00 |
| integration/byval/slices=2 | 2045250 ns | 2055292 ns | 1.00 |
| integration/byval/slices=3 | 7496291 ns | 8384542 ns | 0.89 |
| integration/metaldevrt | 436875 ns | 441542 ns | 0.99 |
| kernel/indexing | 351895.5 ns | 326042 ns | 1.08 |
| kernel/indexing_checked | 529458.5 ns | 521625 ns | 1.02 |
| kernel/launch | 4458 ns | 4833 ns | 0.92 |
| kernel/rand | 526458 ns | 515750 ns | 1.02 |
| latency/import | 1646796458 ns | 1649667041 ns | 1.00 |
| latency/precompile | 36022232729 ns | 35841764917 ns | 1.01 |
| latency/ttfp | 1975584333 ns | 1977516916 ns | 1.00 |
| metal/synchronization/context | 620.9943502824859 ns | 606.8121546961326 ns | 1.02 |
| metal/synchronization/stream | 416.04 ns | 402.7761194029851 ns | 1.03 |
This comment was automatically generated by workflow using github-action-benchmark.
ObjC blocks are executed on Metal's stack, which may have a more limited stack size. That could explain the crashes seen in #822 (comment) when compilation is triggered on that stack. Instead, this PR relies on a deferred mechanism that runs when synchronizing. Ideally this also reduces launch overhead, because a block no longer needs to be allocated.
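The description does not show the implementation, so the following is only a rough illustration of the general idea, written in Python rather than Metal.jl's Julia. It assumes a hypothetical `CommandQueue` with `launch` and `synchronize` methods (these names are not Metal.jl's API) and models no GPU work at all. Instead of handing the driver a closure to run on a driver-owned thread, `launch` merely records the follow-up callback, and `synchronize` drains the recorded callbacks on the calling thread, so whatever they trigger (compilation, for example) runs on that thread's stack.

```python
import threading
from typing import Callable, List


class CommandQueue:
    """Hypothetical stand-in for a GPU command queue (not Metal.jl's API).

    Rather than registering a completion-handler closure that the driver
    would invoke on its own thread, follow-up work is recorded here and
    executed later by whichever thread calls synchronize().
    """

    def __init__(self) -> None:
        self._pending: List[Callable[[], None]] = []
        self._lock = threading.Lock()

    def launch(self, kernel_name: str, on_done: Callable[[], None]) -> None:
        # A real queue would encode and commit GPU work for kernel_name here.
        # This sketch only records the callback; nothing runs at launch time.
        with self._lock:
            self._pending.append(on_done)

    def synchronize(self) -> int:
        # A real queue would first block until the GPU finished its work.
        # Swap out the pending list under the lock, then run the callbacks
        # in launch order on the calling thread (and thus on its stack).
        with self._lock:
            callbacks, self._pending = self._pending, []
        for callback in callbacks:
            callback()
        return len(callbacks)


if __name__ == "__main__":
    log: List[str] = []
    queue = CommandQueue()
    queue.launch("kernel_a", lambda: log.append("a"))
    queue.launch("kernel_b", lambda: log.append("b"))
    assert log == []                 # nothing ran at launch time
    assert queue.synchronize() == 2  # both deferred callbacks ran here
    assert log == ["a", "b"]
```

In this sketch the trade-off is that callbacks only run once some thread synchronizes, so anything that must happen promptly after GPU completion would need a different trigger.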