
Ascend Branch VLM Bug #962

Open
HwVanICI wants to merge 3 commits into inclusionAI:ascend from HwVanICI:vlm_bug_fix

Conversation

@HwVanICI
Collaborator

@HwVanICI HwVanICI commented Mar 2, 2026

Description

The current ascend branch raises an error when training VLMs. The error comes from importing the string-specified reward function in areal/workflow/vision_rlvr.py. This PR fixes the issue.
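For context, resolving a reward function given as a dotted-path string typically looks like the sketch below. This is an illustrative stand-in for AReaL's own import_from_string utility, not its actual implementation; the real helper's signature and error handling may differ.

```python
import importlib

def import_from_string(path: str):
    """Resolve a "module.submodule.attr" string to the object it names.

    Illustrative equivalent of AReaL's import_from_string utility
    (assumed behavior; the real helper may differ).
    """
    module_name, _, attr_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)

# Example: resolve a stdlib function from its dotted path.
sqrt = import_from_string("math.sqrt")
print(sqrt(9.0))  # 3.0
```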

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not
    work as expected)
  • Documentation update
  • Code refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • I have run formatting tools (pre-commit or manual)
  • I have run relevant unit tests and they pass
  • I have added tests for new functionality
  • I have updated documentation if needed
  • My branch is up to date with main
  • This PR introduces breaking changes (if yes, fill out details below)
  • If this PR changes documentation, I have built and previewed it locally with
    jb build docs
  • No critical issues raised by AI reviewers (/gemini review)

Breaking Change Details (if applicable):

Additional Context


Need help? Check the Contributing Guide or ask in
GitHub Discussions!

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug in the ascend branch that prevented Vision-Language Models from training successfully. The fix ensures that reward functions, when specified as strings, are correctly imported and wrapped, allowing the training process to proceed without errors. This improves the robustness of the VLM training workflow.

Highlights

  • Bug Fix: Resolved an error that occurred during VLM (Vision-Language Model) training due to an incorrect import of the reward function.
  • Dynamic Reward Function Loading: Implemented logic to dynamically import the reward function if it is provided as a string, ensuring proper initialization and preventing runtime errors.


Changelog
  • areal/workflow/vision_rlvr.py
    • Imported AsyncRewardWrapper for asynchronous reward handling.
    • Imported import_from_string utility for dynamic module loading.
    • Added conditional logic within arun_episode to dynamically load self.reward_fn if it's a string, then wrap it with AsyncRewardWrapper.
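The lazy-initialization flow described in the changelog can be sketched as follows. AsyncRewardWrapper and VisionRLVRWorkflow are AReaL internals; the stand-in classes here only illustrate the string-to-callable-to-wrapper flow and make no claim about the real class interfaces.

```python
import asyncio
import importlib

class AsyncRewardWrapper:
    """Stand-in for AReaL's AsyncRewardWrapper (assumed behavior):
    exposes a sync reward function through an async call."""

    def __init__(self, fn):
        self.fn = fn

    async def __call__(self, *args, **kwargs):
        return self.fn(*args, **kwargs)

class VisionRLVRWorkflow:
    """Stand-in workflow showing the lazy reward_fn resolution."""

    def __init__(self, reward_fn):
        # reward_fn may be a callable or a dotted-path string.
        self.reward_fn = reward_fn
        self.async_reward_fn = None

    async def arun_episode(self, completion):
        # Lazily resolve a string reward_fn on first use, then wrap it
        # (the pattern this PR adds to vision_rlvr.py).
        if isinstance(self.reward_fn, str):
            module_name, _, attr = self.reward_fn.rpartition(".")
            self.reward_fn = getattr(
                importlib.import_module(module_name), attr
            )
            self.async_reward_fn = AsyncRewardWrapper(self.reward_fn)
        elif self.async_reward_fn is None:
            self.async_reward_fn = AsyncRewardWrapper(self.reward_fn)
        return await self.async_reward_fn(completion)

# Usage: a dotted-path reward function is resolved on the first episode.
wf = VisionRLVRWorkflow("builtins.len")
print(asyncio.run(wf.arun_episode("hello")))  # 5
```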
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@HwVanICI HwVanICI marked this pull request as ready for review March 2, 2026 22:10
@HwVanICI HwVanICI changed the title vlm bug fix Ascend Branch VLM Bug Mar 2, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request fixes a bug where using a string-based reward function with VisionRLVRWorkflow would cause a crash. The fix correctly adds logic to dynamically import and wrap the reward function. However, this introduces code duplication from the parent RLVRWorkflow class. I've added a suggestion to refactor this duplicated logic to improve maintainability.

Comment on lines +106 to +108
if isinstance(self.reward_fn, str):
self.reward_fn = import_from_string(self.reward_fn)
self.async_reward_fn = AsyncRewardWrapper(self.reward_fn)

Severity: medium

This logic for lazily initializing the reward function is also present in the parent class RLVRWorkflow.arun_episode. This duplication could lead to maintenance issues if the logic needs to be updated in the future.

To improve maintainability and avoid duplication, consider refactoring this block into a protected helper method in the RLVRWorkflow base class and calling it from both arun_episode methods.

For example, you could add the following to areal/workflow/rlvr.py:

class RLVRWorkflow(RolloutWorkflow):
    # ...

    def _initialize_reward_fn(self):
        """Initializes reward_fn from string if necessary."""
        if isinstance(self.reward_fn, str):
            self.reward_fn = import_from_string(self.reward_fn)
            self.async_reward_fn = AsyncRewardWrapper(self.reward_fn)

Then, you could call self._initialize_reward_fn() at the beginning of arun_episode in both RLVRWorkflow and VisionRLVRWorkflow, which would remove the duplicated code.

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity within the last 14 days.

Please add a comment or push new commits to keep it active.

Thank you for your contribution!

@github-actions github-actions bot added the stale label Mar 27, 2026