Fix flaky QoS output queue counter validation in DP-1.4 #5182

Open
aks03dev wants to merge 2 commits into openconfig:main from b4firex:fix/DP-1.4-qos-counter-validation

@aks03dev

  1. Brief description and need for this PR
  • The QoS output queue counters test (TestQoSCounters) was failing intermittently because the counter validation logic was too strict. It required every consecutive pair of 30-second gNMI SAMPLE updates to show a different value. When two consecutive samples were equal (e.g., both still zero), the test failed even though the DUT was forwarding traffic correctly.

  • A likely cause is that, with a 30-second sample interval, the first sample can arrive before the DUT's counter pipeline has populated the counter, and the next sample can still report the same value.

  • The test's goal is to verify that QoS output queue counters are live and incrementing during active traffic over a 300-second window, not that every consecutive sample pair differs. The previous logic caused false failures whenever the pipeline had not yet updated between two samples.

  2. Proposed changes
  • Replace the requirement that every consecutive 30-second sample pair must differ with a check that at least one increment is observed across all samples in the 300-second window. This keeps the intent (counters are live and incrementing) while tolerating the initial lull.

  • Increase the sleep before starting the gNMI SAMPLE subscription from an arbitrary 5 seconds to a standard 30 seconds. This gives the counter pipeline more time to complete its first cycle after traffic starts, reducing the chance of an early zero sample.
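
For reviewers who want to see the difference between the two rules side by side, here is a minimal, standalone sketch. It is not the code in the test file: the function names `everyPairDiffers` and `hasIncrement` and the sample values are hypothetical, and "increment" is assumed to mean that a sample is strictly greater than the one before it.

```go
package main

import "fmt"

// everyPairDiffers mirrors the old, strict rule (hypothetical helper):
// it rejects the series as soon as two consecutive samples are equal.
func everyPairDiffers(samples []uint64) bool {
	for i := 1; i < len(samples); i++ {
		if samples[i] == samples[i-1] {
			return false
		}
	}
	return true
}

// hasIncrement mirrors the relaxed rule (hypothetical helper):
// it accepts the series if any sample is greater than its predecessor.
func hasIncrement(samples []uint64) bool {
	for i := 1; i < len(samples); i++ {
		if samples[i] > samples[i-1] {
			return true
		}
	}
	return false
}

func main() {
	// Made-up counter values: the first two samples arrive before the
	// counter pipeline has published anything, then the counter moves.
	samples := []uint64{0, 0, 1500, 3100, 4700}

	fmt.Println("strict rule accepts: ", everyPairDiffers(samples))
	fmt.Println("relaxed rule accepts:", hasIncrement(samples))
}
```

With a series like this one, the strict rule rejects the run because of the two leading zeros, while the relaxed rule accepts it. A series that never moves (all samples equal) is still rejected by the relaxed rule, so a dead counter would still fail.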

Replace strict consecutive-sample comparison with at-least-one-increment
check across the 300s sampling window. The previous logic required every
30s sample pair to differ, causing false failures when the counter
pipeline had not yet updated between two consecutive samples.

Changes:
- Increase pre-subscription sleep from 5s to 30s to allow counter
  pipeline initialization
- Check for at least one increment across all samples instead of
  requiring every consecutive pair to differ
- Fix incorrect error message referencing 'flow-af3' instead of 'flow-af2'

Made-with: Cursor
@aks03dev aks03dev requested a review from a team as a code owner February 27, 2026 10:00
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses the intermittent failures in the QoS output queue counters test by refining its validation methodology. The changes ensure that the test accurately reflects whether counters are live and incrementing, rather than failing due to transient conditions during initial counter population. This improves test reliability without compromising the core validation objective.

Highlights

  • Relaxed QoS Counter Validation: Modified the TestQoSCounters logic to no longer require every consecutive 30-second gNMI sample to differ. Instead, the test now verifies that at least one increment is observed across all samples within the 300-second window, preventing false failures caused by initial counter population delays.
  • Increased Initial Delay: Extended the pre-subscription sleep duration from 5 seconds to 30 seconds, allowing more time for the DUT's counter pipeline to initialize and populate values before gNMI sampling begins.

Changelog
  • feature/qos/otg_tests/qos_output_queue_counters_test/qos_output_queue_counters_test.go
    • Increased the initial sleep duration before starting gNMI sample collection from 5 seconds to 30 seconds.
    • Updated the validateoutputQosPerSecoundCounter function to check for at least one counter increment over the entire 300-second sampling period.
    • Removed the previous strict validation that required every consecutive 30-second sample to show a different value.
    • Removed the tolerance variable as it is no longer used.
    • Updated a log message to reflect the new validation logic.
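
To put the timing values from this PR in perspective, the sketch below works through the arithmetic. The constant and function names are hypothetical and not taken from the test file. The sample count is an approximation, since the exact number of SAMPLE updates received in a window depends on when the target sends them.

```go
package main

import (
	"fmt"
	"time"
)

// Values quoted in the PR description; the names are hypothetical.
const (
	sampleInterval   = 30 * time.Second  // gNMI SAMPLE interval
	samplingWindow   = 300 * time.Second // total observation window
	oldPreSubscribe  = 5 * time.Second   // sleep before subscribing, before this PR
	newPreSubscribe  = 30 * time.Second  // sleep before subscribing, after this PR
)

// expectedSamples estimates how many samples fit into the window.
func expectedSamples(window, interval time.Duration) int {
	if interval <= 0 {
		return 0
	}
	return int(window / interval)
}

// consecutivePairs returns how many adjacent sample pairs n samples form.
func consecutivePairs(n int) int {
	if n < 2 {
		return 0
	}
	return n - 1
}

func main() {
	n := expectedSamples(samplingWindow, sampleInterval)
	pairs := consecutivePairs(n)

	fmt.Printf("about %d samples, %d consecutive pairs\n", n, pairs)
	fmt.Printf("old rule: all %d pairs must differ\n", pairs)
	fmt.Printf("new rule: at least 1 of %d pairs must show an increment\n", pairs)
	fmt.Printf("pre-subscription sleep: %v -> %v (sample interval is %v)\n",
		oldPreSubscribe, newPreSubscribe, sampleInterval)
}
```

Under these assumptions the old rule needed every one of roughly nine pairs to differ, so a single stale pair was enough to fail the run. The new rule needs only one pair to show an increment.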

Activity
  • No human activity has occurred on this pull request yet.

OpenConfigBot commented Feb 27, 2026

Pull Request Functional Test Report for #5182 / cbb3d0f

Virtual Devices

| Device | Test | Test Documentation | Job | Raw Log |
| --- | --- | --- | --- | --- |
| Arista cEOS | status | DP-1.4: QoS Interface Output Queue Counters | | |
| Cisco 8000E | status | DP-1.4: QoS Interface Output Queue Counters | | |
| Cisco XRd | status | DP-1.4: QoS Interface Output Queue Counters | | |
| Juniper ncPTX | status | DP-1.4: QoS Interface Output Queue Counters | | |
| Nokia SR Linux | status | DP-1.4: QoS Interface Output Queue Counters | | |
| Openconfig Lemming | status | DP-1.4: QoS Interface Output Queue Counters | | |

Hardware Devices

| Device | Test | Test Documentation | Raw Log |
| --- | --- | --- | --- |
| Arista 7808 | status | DP-1.4: QoS Interface Output Queue Counters | |
| Cisco 8808 | status | DP-1.4: QoS Interface Output Queue Counters | |
| Juniper PTX10008 | status | DP-1.4: QoS Interface Output Queue Counters | |
| Nokia 7250 IXR-10e | status | DP-1.4: QoS Interface Output Queue Counters | |


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively addresses a flaky test by relaxing the validation logic for QoS counters and increasing the initial wait time, which is a sensible approach to improve test stability. The changes are logical and well-justified. I have added a minor suggestion to improve code style.

- Rename function from validateoutputQosPerSecoundCounter to
  validateOutputQosPerSecondCounter (fix 'Secound' typo and apply
  proper camelCase convention for unexported function)
- Update comment to match function name

Made-with: Cursor