Dynamic threadgroup memory by christiangnrd · Pull Request #750 · JuliaGPU/Metal.jl

christiangnrd · 2026-03-01T20:17:15Z

Surprisingly, it's somewhat functional. It requires macOS 15+, since global dynamic threadgroup memory is not available before then. This isn't the only way to get dynamic threadgroup memory, but I tried this approach first since it seemed the most similar to how static threadgroup memory is implemented.

The Metal interface takes the size of the allocation as an Integer or a Tuple, which is then rounded up to the next multiple of 16 bytes.
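As a rough illustration of the rounding described above, here is a minimal Python sketch (not the PR's Julia code). The helper names `align_up` and `aligned_lengths` are hypothetical, and treating each Tuple entry as a separate byte count that is rounded independently is an assumption about how the Tuple form is interpreted.

```python
def align_up(nbytes: int, alignment: int = 16) -> int:
    """Round nbytes up to the next multiple of `alignment` (hypothetical helper)."""
    if nbytes < 0:
        raise ValueError("allocation size must be non-negative")
    # Integer round-up: add (alignment - 1), then drop the remainder.
    return (nbytes + alignment - 1) // alignment * alignment


def aligned_lengths(shmem):
    """Accept an int or a tuple of ints and return a tuple of aligned byte counts.

    Assumption: each tuple entry is its own allocation size in bytes.
    """
    if isinstance(shmem, int):
        return (align_up(shmem),)
    return tuple(align_up(n) for n in shmem)
```

Sizes that are already multiples of 16 are left unchanged, so a request for 16 bytes stays at 16 while a request for 17 bytes becomes 32.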

Kernels silently fail under shader validation (the failures are caught in the tests, since the output doesn't match the expected results).

TODO:

Validate total threadgroup memory used
Make work under shader validation

Close #701

codecov · 2026-03-01T20:32:45Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.03%. Comparing base (edeab68) to head (c6bf0b4).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #750      +/-   ##
==========================================
+ Coverage   78.55%   82.03%   +3.47%     
==========================================
  Files          67       64       -3     
  Lines        3605     3590      -15     
==========================================
+ Hits         2832     2945     +113     
+ Misses        773      645     -128

[only tests] [only special]

christiangnrd mentioned this pull request Mar 1, 2026

[metal] Dynamic Threadgroup memory JuliaGPU/GPUCompiler.jl#768

Draft

christiangnrd added the "help wanted" and "kernels" labels Mar 14, 2026

christiangnrd force-pushed the dynmem branch from 8f00fe0 to e038d2a April 11, 2026 20:19

christiangnrd force-pushed the dynmem branch 10 times, most recently from 26d4370 to 70d9023 June 7, 2026 17:58

christiangnrd force-pushed the dynmem branch 2 times, most recently from 0b9c156 to c6bf0b4 June 16, 2026 22:49

christiangnrd added 9 commits June 17, 2026 14:40

set_threadgroup_memory_length!

0fbc473

[to clean up] "working" AI-assisted dynamic shared memory

98a67eb

SHMEM interface

2b37378

Cleanup

191fff9

Proof of concept mapreduce

ed4fd2f

Align dynamic threadgroup memory to 16 bytes

44f2225

Tests

175af0d

Gpucompiler

ad8e70e

[only tests] [only special]

Match Apple's alignment

f8f5d7e

christiangnrd force-pushed the dynmem branch from c6bf0b4 to f8f5d7e June 17, 2026 17:41

christiangnrd wants to merge 9 commits into main from dynmem
