fix: handle mixed type merge in usage entries #9121

chizukicn · 2025-12-10T02:29:25Z

- Fix TypeError when calling len() on int values during recursive merging
- Improve merge logic: when dict and int types conflict, use new value (right value) as override
- Refactor code for better readability with clearer variable names and comments
- Add comprehensive tests for mixed type merging scenarios (dict/None, dict/int)
- Fix recursive merge parameter order bug

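The merge policy described in the list above can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not the code in this PR: `merge_usage_entries` is a hypothetical free-function name, `old` is assumed to be the running total and `new` the next entry, numbers are summed with None treated as 0, two dicts are merged recursively in (old, new) order, and when a dict meets a non-dict the new value wins unless it is None.

```python
from __future__ import annotations

from typing import Any


def merge_usage_entries(old: dict[str, Any] | None, new: dict[str, Any] | None) -> dict[str, Any]:
    """Merge `new` (the next usage entry) into `old` (the running total).

    Hypothetical sketch of the policy described in this PR, not its actual code.
    """
    # None and {} are both falsy, so no explicit len() call is needed.
    if not old:
        return dict(new or {})
    if not new:
        return dict(old)

    result = dict(old)
    for key, new_v in new.items():
        old_v = result.get(key)
        old_is_dict = isinstance(old_v, dict)
        new_is_dict = isinstance(new_v, dict)
        if old_is_dict and new_is_dict:
            # Case 1: both dicts -> merge recursively, keeping the (old, new) order.
            result[key] = merge_usage_entries(old_v, new_v)
        elif old_is_dict != new_is_dict:
            # Case 2: dict vs int/None -> the new value wins unless it is None.
            result[key] = new_v if new_v is not None else old_v
        else:
            # Case 3: neither is a dict -> add the numbers, treating None as 0.
            result[key] = (old_v or 0) + (new_v or 0)
    return result


# Hypothetical entries modeled on the failing run reported later in this thread.
entries = [
    {"completion_tokens": 129, "completion_tokens_details": None, "prompt_tokens_details": None},
    {"completion_tokens": 144, "completion_tokens_details": None, "prompt_tokens_details": {"cached_tokens": 0}},
    {"completion_tokens": 121, "completion_tokens_details": {"reasoning_tokens": 0}, "prompt_tokens_details": None},
]
total: dict[str, Any] = {}
for entry in entries:
    total = merge_usage_entries(total, entry)
print(total)
# {'completion_tokens': 394, 'completion_tokens_details': {'reasoning_tokens': 0}, 'prompt_tokens_details': {'cached_tokens': 0}}
```

Under this policy two None values still sum to the int 0, so an int can sit where a dict appears later; the sketch lets the later dict replace it instead of calling len() on the 0.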

chenmoneygithub

Thanks for the PR! But, as @TomeHirata commented in the issue, I could not reproduce it.

I dropped a comment on the code change, but basically litellm shouldn't produce different formats for the same key in the usage dict for the same model, which is why the code makes the strong assumption that the formats match.

chenmoneygithub · 2025-12-11T20:50:09Z

dspy/utils/usage_tracker.py

+
+            # Case 2: One is dict, the other is not (int/None)
+            # Use new value if it's not None (right value takes precedence), otherwise keep old value
+            elif old_is_dict != new_is_dict:


Would this even happen? It's possible that one is a dict and the other is None, but I doubt they can be of different types. If they are, that implies an error on the litellm side, which should provide a unified format for the same model.

chizukicn · 2025-12-12

@chenmoneygithub Regarding this issue, I was also able to reproduce it in a GitHub Actions environment.
For reference, I'm using a DeepSeek model served through a third-party OpenAI-compatible API:
https://github.com/chizukicn/dspy-mini-reproduction/actions/runs/20153004759/job/57849614753?pr=3

chizukicn · 2025-12-12T02:15:27Z

@chenmoneygithub
In my previous reproduction tests, when the program runs normally, the value of self.usage_data looks like this:

    {
        'openai/deepseek/deepseek-v3.2': [
            {
                'completion_tokens': 108,
                'prompt_tokens': 741,
                'total_tokens': 849,
                'completion_tokens_details': None,
                'prompt_tokens_details': None
            },
            {
                'completion_tokens': 158,
                'prompt_tokens': 454,
                'total_tokens': 612,
                'completion_tokens_details': {
                    'accepted_prediction_tokens': None,
                    'audio_tokens': None,
                    'reasoning_tokens': 0,
                    'rejected_prediction_tokens': None,
                    'text_tokens': None,
                    'image_tokens': None
                },
                'prompt_tokens_details': {
                    'audio_tokens': 0,
                    'cached_tokens': 128,
                    'text_tokens': None,
                    'image_tokens': None,
                    'cache_creation_input_tokens': 0,
                    'cache_read_input_tokens': 0
                }
            }
        ]
    }

After adding the following logic to simulate a first-time tool call failure:

import random

first_time = True

def search_weather(city: str):
    global first_time
    # Fail on the first call only, then return data normally.
    if first_time:
        first_time = False
        raise RuntimeError("First time")

    return {
        "city": city,
        "weather": random.choice(["sunny", "cloudy", "rainy", "snowy"]),
        "temperature": random.randint(0, 40),
        "humidity": random.randint(0, 100),
        "pressure": random.randint(900, 1100),
        "wind_speed": random.randint(0, 100),
        "wind_direction": random.choice(["N", "S", "E", "W"]),
        "wind_gust": random.randint(0, 100),
        "wind_gust_direction": random.choice(["N", "S", "E", "W"]),
        "wind_gust_speed": random.randint(0, 100),
    }

The output of self.usage_data becomes:

{
  "openai/deepseek/deepseek-v3.2": [
    {
      "completion_tokens": 129,
      "prompt_tokens": 975,
      "total_tokens": 1104,
      "completion_tokens_details": null,
      "prompt_tokens_details": null
    },
    {
      "completion_tokens": 144,
      "prompt_tokens": 1418,
      "total_tokens": 1562,
      "completion_tokens_details": null,
      "prompt_tokens_details": {
        "audio_tokens": 0,
        "cached_tokens": 0,
        "text_tokens": null,
        "image_tokens": null,
        "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 0
      }
    },
    {
      "completion_tokens": 121,
      "prompt_tokens": 1167,
      "total_tokens": 1288,
      "completion_tokens_details": {
        "accepted_prediction_tokens": null,
        "audio_tokens": null,
        "reasoning_tokens": 0,
        "rejected_prediction_tokens": null,
        "text_tokens": null,
        "image_tokens": null
      },
      "prompt_tokens_details": null
    }
  ]
}

Notice that the array now contains three entries, and the first two entries have completion_tokens_details set to None.
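The review question was whether litellm ever returns different types for the same key. One quick way to check the raw entries is to collect, for each top-level key, the types of its non-None values. The helper below is a hypothetical sketch (`find_type_conflicts` is not part of dspy) that only inspects top-level keys, run on trimmed copies of the three entries above; it reports no conflict, because each details field is only ever None or a dict.

```python
from __future__ import annotations

from typing import Any


def find_type_conflicts(entries: list[dict[str, Any]]) -> dict[str, set[str]]:
    """Return top-level keys whose non-None values have more than one type.

    Hypothetical helper for inspecting usage_data; None is ignored because a
    missing breakdown is reported as None rather than as a different type.
    """
    seen: dict[str, set[str]] = {}
    for entry in entries:
        for key, value in entry.items():
            if value is not None:
                seen.setdefault(key, set()).add(type(value).__name__)
    return {key: types for key, types in seen.items() if len(types) > 1}


# Top-level shapes of the three entries from the failing run (values trimmed).
failing_run = [
    {"completion_tokens": 129, "completion_tokens_details": None, "prompt_tokens_details": None},
    {"completion_tokens": 144, "completion_tokens_details": None, "prompt_tokens_details": {"cached_tokens": 0}},
    {"completion_tokens": 121, "completion_tokens_details": {"reasoning_tokens": 0}, "prompt_tokens_details": None},
]
print(find_type_conflicts(failing_run))  # {}
```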

Looking at the _merge_usage_entries method:

def _merge_usage_entries(self, usage_entry1: dict[str, Any] | None, usage_entry2: dict[str, Any] | None) -> dict[str, Any]:
    if usage_entry1 is None or len(usage_entry1) == 0:
        return dict(usage_entry2)
    if usage_entry2 is None or len(usage_entry2) == 0:
        return dict(usage_entry1)

    result = dict(usage_entry2)
    for k, v in usage_entry1.items():
        current_v = result.get(k)
        if isinstance(v, dict) or isinstance(current_v, dict):
            result[k] = self._merge_usage_entries(current_v, v)
        else:
            result[k] = (current_v or 0) + (v or 0)
    return result

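According to this logic, the first two completion_tokens_details values are both None, so they fall into the else branch and are summed as (None or 0) + (None or 0), which stores the int 0 in the merged result. When the third entry's completion_tokens_details (a dict) is then merged, the method recurses with that dict as usage_entry1 and the 0 as usage_entry2, and len(0) raises TypeError: object of type 'int' has no len(). In other words, the int is produced by the merge itself rather than by litellm, which is why None values cause unintended behavior during recursive merging.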
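The failure can be reproduced without an LLM. The sketch below copies the quoted method as a free function (`original_merge`, a hypothetical name) and folds entries into a running total that starts as an empty dict. That driver (`total_usage`, also hypothetical) is an assumption about how UsageTracker totals its entries, so check it against dspy/utils/usage_tracker.py. A key that is None once before a dict merges fine, while a key that is None twice before a dict raises the TypeError.

```python
from __future__ import annotations

from typing import Any


def original_merge(usage_entry1: dict[str, Any] | None, usage_entry2: dict[str, Any] | None) -> dict[str, Any]:
    # Same logic as the quoted _merge_usage_entries, with `self` dropped.
    if usage_entry1 is None or len(usage_entry1) == 0:
        return dict(usage_entry2)
    if usage_entry2 is None or len(usage_entry2) == 0:  # raises TypeError if usage_entry2 is an int
        return dict(usage_entry1)

    result = dict(usage_entry2)
    for k, v in usage_entry1.items():
        current_v = result.get(k)
        if isinstance(v, dict) or isinstance(current_v, dict):
            result[k] = original_merge(current_v, v)
        else:
            # Two None values end up here and are stored as the int 0.
            result[k] = (current_v or 0) + (v or 0)
    return result


def total_usage(entries: list[dict[str, Any]]) -> dict[str, Any]:
    # Assumed driver: fold each entry into a running total that starts as {}.
    total: dict[str, Any] = {}
    for entry in entries:
        total = original_merge(total, entry)
    return total


# The key is None once, then a dict: the recursive call gets (dict, None) and succeeds.
normal = [
    {"completion_tokens": 108, "completion_tokens_details": None},
    {"completion_tokens": 158, "completion_tokens_details": {"reasoning_tokens": 0}},
]
print(total_usage(normal))
# {'completion_tokens': 266, 'completion_tokens_details': {'reasoning_tokens': 0}}

# The key is None twice, then a dict: the recursive call gets (dict, 0).
failing = [
    {"completion_tokens": 129, "completion_tokens_details": None},
    {"completion_tokens": 144, "completion_tokens_details": None},
    {"completion_tokens": 121, "completion_tokens_details": {"reasoning_tokens": 0}},
]
try:
    total_usage(failing)
except TypeError as exc:
    print(f"TypeError: {exc}")  # TypeError: object of type 'int' has no len()
```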

The new reproduction code can be found here: https://github.com/chizukicn/dspy-mini-reproduction/blob/main/main.py

chenmoneygithub reviewed Dec 11, 2025


chizukicn marked this pull request as draft December 12, 2025 00:53

chizukicn marked this pull request as ready for review December 12, 2025 01:27

chizukicn added 2 commits December 16, 2025 19:58

chore: update (cf51a76)

chore: update (c738f38)

chizukicn requested a review from chenmoneygithub December 16, 2025 12:27
