Create blog post on AKS NAP disruption management #5685
wdarko1 wants to merge 7 commits into Azure:master
Added a blog post on managing disruption with AKS Node Auto-Provisioning, covering best practices for Pod Disruption Budgets and consolidation.
Pull request overview
Adds a new AKS blog post focused on managing voluntary disruption when using Node Auto-Provisioning (NAP), with guidance on Pod Disruption Budgets (PDBs), consolidation controls, disruption budgets, and maintenance windows.
Changes:
- Added a new blog post covering NAP disruption concepts and common pitfalls.
- Included YAML examples for PDBs and NodePool disruption settings (consolidation policy, budgets, schedules).
- Added operational guidance on observability and drift/image update considerations.
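For readers of this thread who have not seen the post itself, the settings the overview refers to look roughly like the following. This is a minimal sketch and is not copied from the PR: the resource names, labels, percentages, and cron schedule are all hypothetical, and the NodePool is shown as a fragment with its `template` section left out. It assumes the `karpenter.sh/v1` NodePool API that NAP builds on.

```yaml
# Hypothetical PDB: keep at least 2 replicas of the "web" app running
# during voluntary evictions (consolidation, drift, upgrades).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb            # hypothetical name
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web             # hypothetical label
---
# Hypothetical NodePool fragment showing only the disruption settings.
# spec.template (requirements, nodeClassRef) is omitted, so this is not
# a complete, appliable NodePool.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m   # assumed value; wait before consolidating
    budgets:
      # At most 10% of this pool's nodes disrupted at once.
      - nodes: "10%"
      # Block all voluntary disruption for 8 hours starting 09:00 UTC
      # on weekdays (assumed business hours).
      - nodes: "0"
        schedule: "0 9 * * 1-5"
        duration: 8h
```

The PDB limits how many pods of one workload can be evicted at a time, while the NodePool budgets limit how many nodes can be disrupted at a time and when; the two are typically used together.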
| --- |
| <!-- truncate --> |
| :::info |
| Learn more about how to [configure disruption policies for NAP](https://learn.microsoft.com/azure/aks/node-auto-provisioning-disruption) |
| ::: |
Per the repo’s blog post structure, add a hero image immediately after <!-- truncate -->. The post directory currently contains only index.md, so readers won’t get a hero/social image unless you add one (for example ./hero-image.png) and reference it here with descriptive alt text.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Clarified descriptions of Pod Disruption Budgets and their impact on voluntary evictions. Improved wording for clarity and corrected minor typos.
sabbour left a comment
Content review: solid technical depth on a high-value topic. The main blockers are em dashes throughout (banned by style guide), third-person voice in the opening, a cluster of typos in Part 4, and a duplicated troubleshooting section. Nine inline comments with specific fixes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Ahmed Sabbour <103856+sabbour@users.noreply.github.com>
Updated formatting and clarified sections on NAP disruption best practices, including node disruption budgets and observability.
Updated guidance on managing NAP node disruptions, including operational takeaways and common pitfalls with suggested fixes.
| - Why won’t NAP scale down, even though I have lots of underused capacity? |
| - Why do upgrades get “stuck” on certain nodes? |
| This post focuses on **NAP disruption best practices**, not workload scheduling (tools like topology spread constraints, node affinity, and taints). For scheduling best practices, see the NAP scheduling fundamentals post (link TBD). |
This sentence contains a placeholder "(link TBD)". Please replace it with a real link to the referenced scheduling fundamentals post or remove the reference before publishing.
Suggested change (original line, then replacement):
| This post focuses on **NAP disruption best practices**, not workload scheduling (tools like topology spread constraints, node affinity, and taints). For scheduling best practices, see the NAP scheduling fundamentals post (link TBD). |
| This post focuses on **NAP disruption best practices**, not workload scheduling (tools like topology spread constraints, node affinity, and taints). |
| <!-- truncate --> |
| ![Managing disruption with AKS Node Auto-Provisioning](./hero-image.png) |
The hero image referenced here is very large (~1.7 MB). Please compress/resize it (ideally <500 KB) to reduce page weight and improve load performance.
|  | |
|  |
| - How do I control when scale downs happen, or where it shouldn't? |
| - How do I control workload disruption so it happens predictably (and not in the middle of business hours)? |
This bullet is grammatically incomplete ("where it shouldn't?"). Consider rephrasing to include the missing verb/object (for example, "where it shouldn't happen").