@kristol07 kristol07 commented Nov 12, 2025

AgentScope Version

commit: 5c3a770

I am working from the latest code on the main branch.

Description

fix(evaluator_storage): correct save path ordering in FileEvaluatorStorage

In the docstring, the directory structure is described as:

The files are organized in a directory structure:
    - save_dir/
        - evaluation_result.json
        - evaluation_meta.json
        - {task_id}/
            - {repeat_id}/
                - solution.json
                - evaluation/
                    - {metric_name}.json

But the implementation doesn't follow this structure.

image
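
To make the mismatch concrete, here is a minimal sketch of the two orderings under discussion. It is a hypothetical helper, not the actual FileEvaluatorStorage code: the function names and the `task_first` flag are assumptions made for illustration, and only the file and directory names are taken from the docstring above.

```python
from pathlib import Path


def solution_path(
    save_dir: str,
    task_id: str,
    repeat_id: str,
    task_first: bool = True,
) -> Path:
    """Build the path to solution.json under either directory ordering.

    task_first=True  -> save_dir/{task_id}/{repeat_id}/solution.json (docstring layout)
    task_first=False -> save_dir/{repeat_id}/{task_id}/solution.json (swapped layout)
    """
    parts = (task_id, repeat_id) if task_first else (repeat_id, task_id)
    return Path(save_dir).joinpath(*parts, "solution.json")


def metric_path(
    save_dir: str,
    task_id: str,
    repeat_id: str,
    metric_name: str,
    task_first: bool = True,
) -> Path:
    """Build the path to evaluation/{metric_name}.json next to solution.json."""
    run_dir = solution_path(save_dir, task_id, repeat_id, task_first).parent
    return run_dir / "evaluation" / f"{metric_name}.json"
```

With `task_first=True` all repeats of one task sit under a single directory; with `task_first=False` all tasks of one repeat do. Which of the two the implementation uses should be checked against the code itself.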

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has been formatted with pre-commit run --all-files command
  • All tests are passing
  • Docstrings are in Google style
  • Related documentation has been updated (e.g. links, examples, etc.)
  • Code is ready for review

@kristol07 kristol07 closed this Nov 12, 2025
@kristol07 kristol07 reopened this Nov 12, 2025
@kristol07
Author

@qbc2016 @DavdGao Please take a look; this is a minor change.

@DavdGao
Member

DavdGao commented Nov 17, 2025

@kristol07 Thanks for pointing out the issue, but it looks like the error is in the docstring rather than in the implementation. Since we are developing evaluation visualization in agentscope-studio based on the current directory organization, could you fix the incorrect description in the docstring instead?

@DavdGao DavdGao added the Documentation Improvements or additions to documentation label Nov 17, 2025
@kristol07
Author

kristol07 commented Nov 17, 2025

@DavdGao I think the best approach depends on how you want to interpret or evaluate the results. In my situation, there are multiple distinct testing scenarios and I want to assess my agent's stability in each one, so I'm more interested in the outcomes of each repeated run within the same scenario. Grouping the results by task ID is therefore preferable in my case, which is why I thought it was a code error. On the other hand, if all the testing scenarios are of the same type, it makes more sense to group by repeat ID and review the overall results across all test scenarios; that may be the case for agentscope-studio.

Grouped by task (test case):
image

Grouped by repeatId:
image
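
A task-grouped view can also be derived at read time without changing how results are stored. The sketch below assumes a repeat-first layout, `save_dir/{repeat_id}/{task_id}/solution.json`, which is inferred from this thread rather than confirmed against the code; the function name is hypothetical and not part of AgentScope.

```python
from collections import defaultdict
from pathlib import Path


def group_solutions_by_task(save_dir: str) -> dict[str, dict[str, Path]]:
    """Map task_id -> {repeat_id: path to solution.json}.

    Assumes the (unverified) layout save_dir/{repeat_id}/{task_id}/solution.json.
    """
    grouped: dict[str, dict[str, Path]] = defaultdict(dict)
    # Match exactly two directory levels below save_dir.
    for path in sorted(Path(save_dir).glob("*/*/solution.json")):
        repeat_id, task_id = path.parts[-3], path.parts[-2]
        grouped[task_id][repeat_id] = path
    return dict(grouped)
```

Calling `group_solutions_by_task(save_dir)` then gives every repeat of a given task in one place, which is the per-scenario stability view described above. Swapping the two names unpacked from `path.parts` would adapt it to a task-first layout.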

@kristol07
Author

kristol07 commented Nov 20, 2025

Hi @DavdGao, do you have any suggestions on how much flexibility should be offered to developers here? As for your comment, the PR has already been updated.

@kristol07 kristol07 changed the title fix(evaluator_storage): correct save path ordering in FileEvaluatorSt… fix(evaluator_storage): correct docstring on directory organization Nov 20, 2025
@cla-assistant

cla-assistant bot commented Dec 2, 2025

CLA assistant check
All committers have signed the CLA.

Member

@DavdGao DavdGao left a comment

LGTM, and thanks for your contribution to the agentscope library

Labels

Documentation Improvements or additions to documentation

2 participants