Skip to content

Conversation

@chizukicn
Copy link

close #9120

  • Fix TypeError when calling len() on int values during recursive merging
  • Improve merge logic: when dict and int types conflict, use new value (right value) as override
  • Refactor code for better readability with clearer variable names and comments
  • Add comprehensive tests for mixed type merging scenarios (dict/None, dict/int)
  • Fix recursive merge parameter order bug

- Fix TypeError when calling len() on int values during recursive merging
- Improve merge logic: when dict and int types conflict, use new value (right value) as override
- Refactor code for better readability with clearer variable names and comments
- Add comprehensive tests for mixed type merging scenarios (dict/None, dict/int)
- Fix recursive merge parameter order bug
Copy link
Collaborator

@chenmoneygithub chenmoneygithub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! But same as @TomeHirata commented in the issue, I could not reproduce the issue.

Dropped a comment on the code change, but basically litellm shouldn't produce different format for the same key in the usage dict for the same model, which is why we made the strong assumption of format match in the code.


# Case 2: One is dict, the other is not (int/None)
# Use new value if it's not None (right value takes precedence), otherwise keep old value
elif old_is_dict != new_is_dict:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this even happen? it's possible that one is dict and the other is None, but I doubt if they can be of different types. If that's the case, that implies an error on the litellm side, which should provide a unified format for the same model.

Copy link
Author

@chizukicn chizukicn Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chenmoneygithub Regarding this issue, I was also able to reproduce it in a GitHub Actions environment.
Just as a note, I’m using a DeepSeek model served through a third-party OpenAI-compatible API:
https://github.com/chizukicn/dspy-mini-reproduction/actions/runs/20153004759/job/57849614753?pr=3

@chizukicn chizukicn marked this pull request as draft December 12, 2025 00:53
@chizukicn chizukicn marked this pull request as ready for review December 12, 2025 01:27
@chizukicn
Copy link
Author

chizukicn commented Dec 12, 2025

@chenmoneygithub
In my previous reproduction tests, when the program runs normally, the value of self.usage_data looks like this:

    {
        'openai/deepseek/deepseek-v3.2': [
            {
                'completion_tokens': 108,
                'prompt_tokens': 741,
                'total_tokens': 849,
                'completion_tokens_details': None,
                'prompt_tokens_details': None
            },
            {
                'completion_tokens': 158,
                'prompt_tokens': 454,
                'total_tokens': 612,
                'completion_tokens_details': {
                    'accepted_prediction_tokens': None,
                    'audio_tokens': None,
                    'reasoning_tokens': 0,
                    'rejected_prediction_tokens': None,
                    'text_tokens': None,
                    'image_tokens': None
                },
                'prompt_tokens_details': {
                    'audio_tokens': 0,
                    'cached_tokens': 128,
                    'text_tokens': None,
                    'image_tokens': None,
                    'cache_creation_input_tokens': 0,
                    'cache_read_input_tokens': 0
                }
            }
        ]
    }

After adding the following logic to simulate a first-time tool call failure:

first_time = True

def search_weather(city: str):
    global first_time
    if first_time:
        raise RuntimeError("First time")
    first_time = False
    
    return {
        "city": city,
        "weather": random.choice(["sunny", "cloudy", "rainy", "snowy"]),
        "temperature": random.randint(0, 40),
        "humidity": random.randint(0, 100),
        "pressure": random.randint(900, 1100),
        "wind_speed": random.randint(0, 100),
        "wind_direction": random.choice(["N", "S", "E", "W"]),
        "wind_gust": random.randint(0, 100),
        "wind_gust_direction": random.choice(["N", "S", "E", "W"]),
        "wind_gust_speed": random.randint(0, 100),
    }

The output of self.usage_data becomes:

{
  "openai/deepseek/deepseek-v3.2": [
    {
      "completion_tokens": 129,
      "prompt_tokens": 975,
      "total_tokens": 1104,
      "completion_tokens_details": null,
      "prompt_tokens_details": null
    },
    {
      "completion_tokens": 144,
      "prompt_tokens": 1418,
      "total_tokens": 1562,
      "completion_tokens_details": null,
      "prompt_tokens_details": {
        "audio_tokens": 0,
        "cached_tokens": 0,
        "text_tokens": null,
        "image_tokens": null,
        "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 0
      }
    },
    {
      "completion_tokens": 121,
      "prompt_tokens": 1167,
      "total_tokens": 1288,
      "completion_tokens_details": {
        "accepted_prediction_tokens": null,
        "audio_tokens": null,
        "reasoning_tokens": 0,
        "rejected_prediction_tokens": null,
        "text_tokens": null,
        "image_tokens": null
      },
      "prompt_tokens_details": null
    }
  ]
}

Notice that the array now contains three entries, and the first two entries have completion_tokens_details set to None.

Looking at the _merge_usage_entries method:

def _merge_usage_entries(self, usage_entry1: dict[str, Any] | None, usage_entry2: dict[str, Any] | None) -> dict[str, Any]:
    if usage_entry1 is None or len(usage_entry1) == 0:
        return dict(usage_entry2)
    if usage_entry2 is None or len(usage_entry2) == 0:
        return dict(usage_entry1)

    result = dict(usage_entry2)
    for k, v in usage_entry1.items():
        current_v = result.get(k)
        if isinstance(v, dict) or isinstance(current_v, dict):
            result[k] = self._merge_usage_entries(current_v, v)
        else:
            result[k] = (current_v or 0) + (v or 0)
    return result

According to this logic, the first two completion_tokens_details entries, being None, are treated as 0 during the merge. Then the third entry’s completion_tokens_details gets recursively merged with 0 (corresponding to usage_entry2). This explains why the merged result is not as expected, and why None values are causing unintended behavior during recursive merging.

The new reproduction code can be found here: https://github.com/chizukicn/dspy-mini-reproduction/blob/main/main.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] TypeError in usage_tracker.py: object of type 'int' has no len() when merging usage entries

2 participants