Skip to content

[BUG] Ollama streaming adapter drops tool_calls emitted before the done chunk #1922

@djmcgreal-cc

Description

@djmcgreal-cc

📋 Prerequisites

  • Searched existing issues
  • Reproducible

🐛 Bug Description

The KAgentOllamaLlm streaming path in kagent-adk/src/kagent/adk/models/_ollama.py only reads tool_calls from the chunk where chunk.done == True. However, Ollama's /api/chat streaming protocol emits tool_calls in an earlier chunk and then sends a separate final chunk with done=True, tool_calls=None, content="". As a result, when an Agent has spec.declarative.stream: true (the default), every tool call the model makes is silently discarded. The agent yields an LlmResponse with empty content.parts: [], no event is enqueued, and the A2A request hangs in a dequeue_event poll loop until the client times out.

🔄 Steps to Reproduce

  1. Apply a ModelConfig pointing at any Ollama-hosted model with native tool calling (llama3.2:3b, qwen2.5:3b, etc.).
  2. Apply a declarative Agent with stream: true and at least one MCP tool (e.g. the default my-first-k8s-agent with k8s_get_resources).
  3. Send a prompt that should trigger a tool call ("any exciting events in my cluster recently?").
  4. Observe: no reply is ever returned to the UI; Phoenix shows an LlmResponse with parts: [] despite non-zero eval_count.

🔬 Direct evidence

Streaming Ollama with the same tool, hitting the upstream directly:

$ curl -s POST /api/chat -d '{"model":"llama3.2:3b","stream":true,"tools":[...],"messages":[...]}'
done=False content=''  tool_calls=[{'function': {'name': 'k8s_get_resources', 'arguments': {'resource_type': 'events'}}}]
done=True  content=''  tool_calls=None

The tool call arrives in the non-final chunk.

🩹 Code location

kagent-adk/src/kagent/adk/models/_ollama.py (streaming branch in generate_content_async):

async for chunk in response:
    if chunk.message.content:
        aggregated_text += chunk.message.content
        yield LlmResponse(..., partial=True, ...)
    if chunk.done:
        final_parts = []
        if aggregated_text:
            final_parts.append(types.Part.from_text(text=aggregated_text))
        for tc in chunk.message.tool_calls or []:   # ← only the done chunk
            ...

Should accumulate tool_calls across all chunks:

aggregated_tool_calls: list = []
async for chunk in response:
    if chunk.message.content:
        ...
    if chunk.message.tool_calls:
        aggregated_tool_calls.extend(chunk.message.tool_calls)
    if chunk.done:
        ...
        for tc in aggregated_tool_calls:
            ...

The non-streaming branch in the same function handles this correctly — it's only the streaming path that's broken.

🩺 Workaround

Set spec.declarative.stream: false on the Agent CR. The non-streaming path correctly emits function_call parts.

💻 Environment

  • Chart: kagent-0.9.4
  • App image: cr.kagent.dev/kagent-dev/kagent/app:0.9.4
  • kagent-adk: 0.3.0
  • Ollama backend: tested against llama3.2:3b and gemma3n:e4b aliases; reproducible with any tool-capable Ollama model
  • Kubernetes: kind in devcontainer

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions