Fix flaky QoS output queue counter validation in DP-1.4#5182
Fix flaky QoS output queue counter validation in DP-1.4#5182aks03dev wants to merge 2 commits intoopenconfig:mainfrom
Conversation
Replace strict consecutive-sample comparison with at-least-one-increment check across the 300s sampling window. The previous logic required every 30s sample pair to differ, causing false failures when the counter pipeline had not yet updated between two consecutive samples. Changes: - Increase pre-subscription sleep from 5s to 30s to allow counter pipeline initialization - Check for at least one increment across all samples instead of requiring every consecutive pair to differ - Fix incorrect error message referencing 'flow-af3' instead of 'flow-af2' Made-with: Cursor
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses the intermittent failures in the QoS output queue counters test by refining its validation methodology. The changes ensure that the test accurately reflects whether counters are live and incrementing, rather than failing due to transient conditions during initial counter population. This improves test reliability without compromising the core validation objective. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Pull Request Functional Test Report for #5182 / cbb3d0fVirtual Devices
Hardware Devices
|
There was a problem hiding this comment.
Code Review
This pull request effectively addresses a flaky test by relaxing the validation logic for QoS counters and increasing the initial wait time, which is a sensible approach to improve test stability. The changes are logical and well-justified. I have added a minor suggestion to improve code style.
feature/qos/otg_tests/qos_output_queue_counters_test/qos_output_queue_counters_test.go
Outdated
Show resolved
Hide resolved
- Rename function from validateoutputQosPerSecoundCounter to validateOutputQosPerSecondCounter (fix 'Secound' typo and apply proper camelCase convention for unexported function) - Update comment to match function name Made-with: Cursor
The QoS output queue counters test (TestQoSCounters) was failing intermittently because the counter validation logic was too strict. It required every consecutive pair of 30-second gNMI SAMPLE updates to show a different value. When two consecutive samples were equal (e.g., both zero or both the same value), the test failed even though the DUT was forwarding traffic correctly.
This could have happened because with a 30-second sample interval, the first sample can arrive before the pipeline has populated the counter, and the next sample can still reflect the same value.
The test’s goal is to verify that QoS output queue counters are live and incrementing during active traffic over a 300-second window, not that every consecutive sample pair differs. The previous logic caused false failures when the pipeline had not yet updated between two samples.
Replace the requirement that every consecutive 30-second sample pair must differ with a check that at least one increment is observed across all samples in the 300-second window. This keeps the intent (counters are live and incrementing) while tolerating the initial lull
Increase the sleep before starting the gNMI SAMPLE subscription from the arbitrary 5 seconds to a standard 30 seconds. This gives the counter pipeline more time to complete its first cycle after traffic starts, reducing the chance of early zero sample