Add M5 Max benchmark — first H17 ANE on record #50

Open
lixiangnlp wants to merge 1 commit into maderix:main from lixiangnlp:add-m5max-benchmark

Conversation

@lixiangnlp

Summary

  • Hardware: MacBook Pro · Apple M5 Max (6P + 12E, 128 GB) · macOS 26.4.1
  • Key finding: _ANEDeviceInfo.aneSubType returns h17 on M5 Max — distinct from the h16 subtype reported by both M4 and the base M5 (per training/m5result.md). This is the first H17 ANE in community_results.json.
  • Peak compute is unchanged from M4: 19.27 TFLOPS FP16 / 35.61 TOPS INT8 W8A8 (within 4% of the README's M4 figures).
  • Training gains over base M5 are CPU-driven, not ANE-driven: Stories110M runs at 73.5 ms/step (dynamic) and Qwen3-0.6B at 320 ms/step (1.29× the README's M4 412 ms baseline), driven by the P-cores + Accelerate, while ane_ms is essentially flat across M4/M5/M5 Max.
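
The 1.29× figure above is straightforward arithmetic on the two step times quoted in this PR; a quick sanity check:

```python
# Step times quoted in this PR (ms/step)
m4_qwen3_ms = 412.0      # README's M4 baseline
m5max_qwen3_ms = 320.0   # this run, dynamic training

# Speedup is the ratio of the old step time to the new one
speedup = m4_qwen3_ms / m5max_qwen3_ms
print(f"{speedup:.2f}x")  # 1.29x
```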

Full probe report (inmem_peak, inmem_bench, sram_bench, ane_int8_bench, dynamic + static training) lives in benchmarks/m5max_result.md.

Files changed

  • benchmarks/m5max_result.md — new, ~180-line probe + training report aligned with the format used by training/m5result.md
  • benchmarks/community_results.json — adds an M5 Max training entry and updates neural_engine_specs with the H17 subtype field

Test plan

  • inmem_peak, inmem_bench, sram_bench, ane_int8_bench all build with the existing one-line xcrun clang … invocations on macOS 26.4.1
  • training/training_dynamic builds for both MODEL=stories110m and MODEL=qwen3_06b
  • training/train_large (static pipeline) builds and runs with random init when no pretrained weight file is present
  • python3 -c "import json; json.load(open('benchmarks/community_results.json'))" parses cleanly after the edit
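
The parse check in the last bullet can be extended into a small structural check on the new entry. A minimal sketch — the field names (`neural_engine_specs`, `ane_subtype`) follow this PR's description, but the exact schema is an assumption, and the sample document below is a stand-in, not the real file:

```python
import json

# Illustrative stand-in for benchmarks/community_results.json; the real
# schema and field names in this repo are assumptions, not confirmed.
sample = """
{
  "neural_engine_specs": {
    "m4":     {"ane_subtype": "h16"},
    "m5":     {"ane_subtype": "h16"},
    "m5_max": {"ane_subtype": "h17"}
  }
}
"""

data = json.loads(sample)  # raises json.JSONDecodeError if malformed
subtypes = {name: spec["ane_subtype"]
            for name, spec in data["neural_engine_specs"].items()}
print(subtypes)
```

Swapping `sample` for the contents of `benchmarks/community_results.json` would run the same check against the real file after the edit.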

🤖 Generated with Claude Code

_ANEDeviceInfo.aneSubType returns "h17" on M5 Max (M4 / base M5 = "h16"), but
peak FP16 (19.27 TFLOPS) and INT8 W8A8 (35.61 TOPS) match M4 within 4%.
Stories110M static 90.0 ms/step, dynamic 73.5 ms/step; Qwen3-0.6B dynamic
320.0 ms/step (1.29× M4 baseline). Training gains over base M5 are CPU-driven
(P-cores + Accelerate), not ANE-driven.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
