feat: add claude code skill to support generate ninetoothed ops #143
Open
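The core of the Skill is a dual-verification iteration loop:
generate code → dispatch accuracy verification and performance verification in parallel → analyze the results → if the targets are not met, revise the code → repeat (at most 5 times)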
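The loop above could be sketched roughly as follows. This is only an illustration, not the Skill's actual implementation: `run_skill_loop` and the four callables it takes (`generate`, `check_accuracy`, `check_performance`, `revise`) are hypothetical names, and the sketch assumes that "at most 5 times" means five verification rounds in total.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_ITERATIONS = 5  # assumed reading of "at most 5 times"


def run_skill_loop(generate, check_accuracy, check_performance, revise,
                   max_iterations=MAX_ITERATIONS):
    """Run the generate/verify/revise loop; return (code, passed, rounds_used).

    All four callables are hypothetical placeholders:
      generate() -> code
      check_accuracy(code) -> (ok, report)
      check_performance(code) -> (ok, report)
      revise(code, accuracy_report, performance_report) -> code
    """
    code = generate()
    for iteration in range(1, max_iterations + 1):
        # Dispatch both verifications concurrently and wait for both results.
        with ThreadPoolExecutor(max_workers=2) as pool:
            acc_future = pool.submit(check_accuracy, code)
            perf_future = pool.submit(check_performance, code)
            acc_ok, acc_report = acc_future.result()
            perf_ok, perf_report = perf_future.result()

        if acc_ok and perf_ok:
            return code, True, iteration

        # Only revise if another verification round is still allowed.
        if iteration < max_iterations:
            code = revise(code, acc_report, perf_report)

    return code, False, max_iterations
```

Passing the steps in as callables keeps the control flow separate from how code is generated or measured, so each verifier can be swapped or tested on its own.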
Accuracy verification: tolerances are set per data type and operator type (e.g. fp32 rtol=1e-5, fp16 rtol=1e-3), and the check covers allclose, NaN, and Inf.
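A dependency-free sketch of such an accuracy check is shown below. The `rtol` values are the examples quoted above; the `atol` values, the `TOLERANCES` table, and the `check_accuracy` name are assumptions made for illustration. The table is keyed by data type only, whereas the Skill also varies tolerances by operator type. The closeness test uses the usual allclose criterion `|actual - expected| <= atol + rtol * |expected|`, and the sketch assumes that any NaN or Inf in the output counts as a failure.

```python
import math

# rtol values come from the description; atol values are assumed.
TOLERANCES = {
    "fp32": {"rtol": 1e-5, "atol": 1e-8},
    "fp16": {"rtol": 1e-3, "atol": 1e-5},
}


def check_accuracy(actual, expected, dtype):
    """Compare two flat sequences of floats under the tolerance for dtype."""
    tol = TOLERANCES[dtype]
    has_nan = any(math.isnan(x) for x in actual)
    has_inf = any(math.isinf(x) for x in actual)
    # Element-wise allclose criterion; NaN or Inf never satisfies it.
    close = len(actual) == len(expected) and all(
        abs(a - e) <= tol["atol"] + tol["rtol"] * abs(e)
        for a, e in zip(actual, expected)
    )
    return {
        "allclose": close,
        "has_nan": has_nan,
        "has_inf": has_inf,
        "passed": close and not has_nan and not has_inf,
    }
```

With tensors, the same check would more likely be written with a library routine such as `torch.allclose` together with `torch.isnan` and `torch.isinf`; the plain-Python version here only makes the criterion explicit.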
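Performance verification: sweep all candidate block configurations and take the best latency as best_ms, then compare it with tune_ms, the latency of the configuration selected by auto-tune; the efficiency must be ≥ 0.95.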
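The performance check might look like the sketch below. The description does not spell out how efficiency is computed, so this assumes `efficiency = best_ms / tune_ms` (1.0 means auto-tune picked the fastest candidate). The `check_performance` name and the shape of its inputs are hypothetical, and the latencies in the usage example are made-up numbers, not measurements.

```python
MIN_EFFICIENCY = 0.95  # threshold stated in the description


def check_performance(candidate_ms, tune_ms, min_efficiency=MIN_EFFICIENCY):
    """Check how close the auto-tuned latency is to the best candidate.

    candidate_ms: mapping from block configuration to measured latency (ms).
    tune_ms: latency (ms) of the configuration chosen by auto-tune.
    """
    if not candidate_ms:
        raise ValueError("no candidate block configurations were measured")
    if tune_ms <= 0:
        raise ValueError("tune_ms must be positive")

    # Sweep every candidate and keep the fastest one.
    best_config = min(candidate_ms, key=candidate_ms.get)
    best_ms = candidate_ms[best_config]

    # Assumed definition: ratio of the best latency to the auto-tuned latency.
    efficiency = best_ms / tune_ms
    return {
        "best_config": best_config,
        "best_ms": best_ms,
        "tune_ms": tune_ms,
        "efficiency": efficiency,
        "passed": efficiency >= min_efficiency,
    }


# Illustrative latencies only.
measured = {(64, 64): 0.50, (128, 64): 0.40, (128, 128): 0.45}
report = check_performance(measured, tune_ms=0.41)
```

A failing result points at the auto-tune search space or heuristics rather than at the kernel itself, since a faster block configuration exists but was not selected.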