
Preserve shell tool results in streaming#1

Open
AlexVPopov wants to merge 9 commits into main from zipchat

Conversation

@AlexVPopov AlexVPopov commented Mar 19, 2026

Accumulate final shell done events during Responses API streaming and rebuild the final output payload for the returned message. This preserves shell metadata such as call ids, action settings, environment, and stdout/stderr/outcome for both HTTP and WebSocket streaming.

Normalize shell result parsing around the provider-shaped fields and add coverage for streaming preservation, multi-call joining, sync parsing, and function-call regression safety. This keeps built-in shell results inspectable after streamed responses complete.


Note

Medium Risk
Touches core streaming/WebSocket accumulation and rewrites how the final raw.body is built, which could affect any downstream consumers of streamed output ordering/shape. Changes are well-covered by new specs but still moderate risk due to protocol/event handling complexity.

Overview
Preserves shell tool results after streaming completes. Streaming now accumulates response.output_item.done and shell command/output events and rebuilds the final message.raw.body['output'] so shell_call/shell_call_output items (including call_id, environment, action.commands, and stdout/stderr/outcome) aren’t lost.

Unifies shell result parsing. BuiltInTools.parse_shell_call_results now joins shell_call metadata with shell_call_output items by call_id, and adds parse_shell_call_results_from_message for extracting results directly from a final RubyLLM::Message.
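
A minimal, self-contained sketch of the join-by-call_id idea (the method name comes from the PR; the body here is illustrative, not the library's actual implementation):

```ruby
# Hypothetical, simplified sketch: join shell_call metadata with
# shell_call_output items that share a call_id, concatenating outputs.
def parse_shell_call_results(items)
  items = Array(items).map { |item| item.transform_keys(&:to_s) }
  calls = items.select { |item| item['type'] == 'shell_call' }
  outputs = items.select { |item| item['type'] == 'shell_call_output' }

  # Ordering follows the first-seen shell_call items.
  calls.map do |call|
    matching = outputs.select { |out| out['call_id'] == call['call_id'] }
    call.merge('output' => matching.flat_map { |out| Array(out['output']) })
  end
end
```

A multi-output call then yields a single merged result whose 'output' array preserves the original order of the output items.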

Also bumps version to 0.5.3 and changes provider slug from a symbol to a string to avoid sorting crashes, with expanded test coverage for streaming shell preservation, multi-call/multi-command scenarios, and function-call streaming regression.

Written by Cursor Bugbot for commit cd34151. This will update automatically on new commits.

Summary by CodeRabbit

  • New Features

    • Aggregate shell-call results by execution ID with combined outputs, preserved ordering, richer status/environment/action, and parseable from final messages
    • Improved streaming reconstruction producing complete final payloads and preserved response IDs
    • Local shell executor support and local-execution documentation and hooks
    • Enhanced continuation and tool-call extraction/formatting for mixed tool and function flows
  • Tests

    • Extensive specs for streaming, multi-shell/multi-command runs, local shell lifecycle, and parsing
  • Chores

    • Bumped version to 0.5.3 and normalized provider slug to a string


coderabbitai Bot commented Mar 19, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds streaming SSE reconstruction (StreamRawResponse, CompletedResponseAccumulator, and public stream_response), feeds the same completion accumulator into WebSocket accumulation, enhances shell-call result parsing to merge items by call_id with parse_shell_call_results_from_message, introduces a LocalShellExecutor and chat integration for local shell tool execution, changes the provider slug to a String, and bumps the version to 0.5.3.

Changes

Cohort / File(s) Summary
Shell Call Result Parsing & Tests
lib/ruby_llm/providers/openai_responses/built_in_tools.rb, spec/ruby_llm/providers/openai_responses/shell_tool_spec.rb
parse_shell_call_results now coerces inputs, computes first-seen call_id ordering, groups items by call_id, and merges shell_call metadata with concatenated shell_call_output.output (deep-copying environment/action, setting id/call_id/status). Added parse_shell_call_results_from_message(message) and updated specs to expect the merged result shape and environment structure.
Streaming Response Handling & Tests
lib/ruby_llm/providers/openai_responses/streaming.rb, lib/ruby_llm/providers/openai_responses/web_socket.rb, spec/ruby_llm/providers/openai_responses/streaming_spec.rb
Added StreamRawResponse and CompletedResponseAccumulator. Implemented stream_response(...) that wires Faraday SSE on_data, updates both StreamAccumulator and CompletedResponseAccumulator per event, reconstructs final raw response and message, assigns message.response_id, and yields streaming chunks. WebSocket accumulation now builds and passes a completed raw response into the accumulator. New tests validate reconstruction, ordering, and tool/shell output assembly.
Local Shell Executor & Chat Integration
lib/ruby_llm/providers/openai_responses/local_shell_executor.rb, lib/ruby_llm/providers/openai_responses/chat.rb, spec/ruby_llm/providers/openai_responses/local_shell_lifecycle_spec.rb, spec/ruby_llm/providers/openai_responses/chat_spec.rb
Introduced LocalShellExecutor (tool wrapper) and LocalShellToolCall; added ChatExtension to intercept/dispatch openai_responses_local_shell tool calls and use local_shell_executor from params. chat.rb input/formatting/extract_tool_calls updated to serialize tool results, include continuation input messages, and produce LocalShellToolCall for local shell calls. New lifecycle tests cover continuation behavior, streaming local execution, error cases, scoping, and environment handling.
Provider Identity, Versioning & Packaging
lib/ruby_llm/providers/openai_responses.rb, lib/rubyllm_responses_api.rb, ruby_llm-responses_api.gemspec, spec/ruby_llm/providers/openai_responses_spec.rb, CHANGELOG.md
Changed provider slug from Symbol to String (:openai_responses → "openai_responses"), bumped RubyLLM::ResponsesAPI::VERSION and gemspec to 0.5.3, updated slug spec, and added CHANGELOG entry. Also added require_relative for local_shell_executor.
Docs / Examples
README.md
Replaced local execution example to use environment_type: 'local' and documented the local_shell_executor hook, its expected interface, and an end-to-end wiring example.
VCR Fixtures
spec/cassettes/function_calling_*.yml
Updated recorded request bodies to reflect use of *_output continuation-only payloads and added previous_response_id where applicable.
Misc Tests
spec/ruby_llm/providers/openai_responses/streaming_spec.rb, other spec additions
Added/extended specs for streaming shell/function reconstruction, multi-call ordering, multi-command outputs, and function-argument streaming.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant StreamResp as stream_response
    participant Connection as Connection/SSE
    participant SA as StreamAccumulator
    participant CRA as CompletedResponseAccumulator
    participant Msg as Message

    Client->>StreamResp: call stream_response(connection, payload, &block)
    StreamResp->>Connection: POST /v1/responses (stream=true)
    StreamResp->>SA: init StreamAccumulator(payload)
    StreamResp->>CRA: init CompletedResponseAccumulator()
    loop for each SSE event
        Connection-->>StreamResp: SSE event (data)
        StreamResp->>CRA: CRA.update(event_hash)
        StreamResp->>SA: SA.update(event_hash)
        StreamResp->>Client: yield built chunk (from SA)
    end
    StreamResp->>CRA: raw = CRA.build_response(nil)
    StreamResp->>Msg: message = SA.to_message(raw)
    StreamResp->>Msg: assign message.response_id from raw['id']
    StreamResp-->>Client: return final message (with reconstructed raw.body)
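
The dual-accumulator flow in the diagram can be simulated in a few lines. The class name mirrors the walkthrough; this stub body only illustrates the data flow, not the library's actual implementation:

```ruby
# Stub accumulator: collect completed output items and the response id,
# then rebuild a raw response hash at the end of the stream.
class CompletedResponseAccumulator
  def initialize
    @output = []
    @response_id = nil
  end

  def update(event)
    case event['type']
    when 'response.output_item.done'
      @output << event['item'] # keeps shell_call / shell_call_output items
    when 'response.completed'
      @response_id = event.dig('response', 'id')
    end
  end

  def build_response(_existing)
    { 'id' => @response_id, 'output' => @output }
  end
end

events = [
  { 'type' => 'response.output_item.done',
    'item' => { 'type' => 'shell_call', 'call_id' => 'c1' } },
  { 'type' => 'response.output_item.done',
    'item' => { 'type' => 'shell_call_output', 'call_id' => 'c1',
                'output' => [{ 'stdout' => 'hello' }] } },
  { 'type' => 'response.completed', 'response' => { 'id' => 'resp_1' } }
]

accumulator = CompletedResponseAccumulator.new
events.each { |event| accumulator.update(event) }
raw = accumulator.build_response(nil)
# raw['output'] retains both shell items; raw['id'] is 'resp_1'
```

In the real code the reconstructed raw hash is then passed to the StreamAccumulator's to_message, which is how the final message keeps its shell metadata.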

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 5.41%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed — Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed — The title 'Preserve shell tool results in streaming' directly describes the main objective stated in the PR: accumulating and rebuilding final shell tool results during Responses API streaming (HTTP SSE and WebSocket) so shell outputs remain inspectable after completion.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@AlexVPopov
Author

@codex review


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/ruby_llm/providers/openai_responses/streaming.rb`:
- Around line 288-298: The methods build_on_data_handler and faraday_1? are
missing and must be implemented to avoid NoMethodError when
apply_stream_on_data_handler calls them; add a private helper
build_on_data_handler that accepts a block and returns a Proc or lambda
compatible with Faraday's on_data callback which forwards received chunked data
to handle_stream_event (preserving the yielded parameters expected by
req.options[:on_data]/req.options.on_data), and add a faraday_1? helper that
reliably detects Faraday v1 (e.g., by checking Faraday.const_defined?(:VERSION)
or Gem.loaded_specs['faraday'].version.to_s.start_with?('1') with a safe
respond_to? fallback) so the conditional choosing req.options[:on_data] vs
req.options.on_data works correctly; keep both helpers in the same class/module
as apply_stream_on_data_handler and mark them private.
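
One possible shape for the two missing helpers (the names come from the review comment; handle_stream_event is the forwarding target the review mentions and is assumed to exist elsewhere in the module, so this is a sketch rather than the actual fix):

```ruby
# Sketch of the helpers the review says are missing. handle_stream_event
# is assumed to be defined elsewhere in the same module.
def build_on_data_handler(&block)
  # Faraday's on_data callback yields (chunk, received_bytes[, env]);
  # forward the raw chunk and preserve the caller's block.
  proc do |chunk, _received_bytes, _env = nil|
    handle_stream_event(chunk, &block)
  end
end

def faraday_1?
  spec = Gem.loaded_specs['faraday']
  return spec.version.to_s.start_with?('1') if spec

  defined?(Faraday::VERSION) ? Faraday::VERSION.start_with?('1') : false
end
```

With these in place, the conditional choosing req.options[:on_data] (Faraday v1) versus req.options.on_data (v2+) can branch on faraday_1? safely.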

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 25e1b4b5-e0ac-40a2-b6ef-76ea2167e331

📥 Commits

Reviewing files that changed from the base of the PR and between 042b35d and 770c987.

📒 Files selected for processing (5)
  • lib/ruby_llm/providers/openai_responses/built_in_tools.rb
  • lib/ruby_llm/providers/openai_responses/streaming.rb
  • lib/ruby_llm/providers/openai_responses/web_socket.rb
  • spec/ruby_llm/providers/openai_responses/shell_tool_spec.rb
  • spec/ruby_llm/providers/openai_responses/streaming_spec.rb

Comment thread lib/ruby_llm/providers/openai_responses/streaming.rb

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 770c987f5d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread lib/ruby_llm/providers/openai_responses/streaming.rb Outdated
noelblaschke and others added 5 commits March 26, 2026 16:54
[FIX] slug method should return string instead of symbol to allow Mod…
Return string from slug instead of symbol to match the base Provider
contract, fixing ArgumentError when RubyLLM sorts models by provider.
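
The crash this commit fixes is easy to reproduce: Ruby refuses to sort an array that mixes String and Symbol slugs, because the two types don't implement <=> against each other.

```ruby
# A Symbol slug among String slugs makes sorting raise ArgumentError.
mixed = ['openai', :openai_responses, 'anthropic']
begin
  mixed.sort
rescue ArgumentError
  # comparison of String with :openai_responses failed
end

# With slug returning a String, every entry is comparable:
['openai', 'openai_responses', 'anthropic'].sort
# => ["anthropic", "openai", "openai_responses"]
```
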
Accumulate final shell done events during Responses API streaming and rebuild the final output payload for the returned message. This preserves shell metadata such as call ids, action settings, environment, and stdout/stderr/outcome for both HTTP and WebSocket streaming.

Normalize shell result parsing around the provider-shaped fields and add coverage for streaming preservation, multi-call joining, sync parsing, and function-call regression safety. This keeps built-in shell results inspectable after streamed responses complete.
Accumulate final shell output content by command index instead of overwriting prior entries for the same shell call output item. This preserves stdout, stderr, and outcomes for multi-command shell executions in the final streamed message.

Add a regression spec covering a streamed shell call with multiple commands so earlier command results are retained in the rebuilt output payload and parser results.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/ruby_llm/providers/openai_responses/streaming.rb`:
- Around line 150-158: flattened_shell_output currently treats
`@shell_outputs_by_output_index` as a fallback and can drop entries that exist
only in one bucket; change flattened_shell_output to merge the two buckets
instead: retrieve the hash for item_id from `@shell_outputs_by_item_id` and the
hash for output_index from `@shell_outputs_by_output_index`, combine them by
unioning their keys and concatenating the arrays for any matching command_index
so no outputs are lost, then sort the merged hash by command_index and flat_map
the values to produce the final flattened array (use the existing method name
flattened_shell_output and variables `@shell_outputs_by_item_id` /
`@shell_outputs_by_output_index` to locate the code).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 74e5196a-e0e2-4afa-a4b6-6bdf05f35a2d

📥 Commits

Reviewing files that changed from the base of the PR and between 6b7e6dc and cd34151.

📒 Files selected for processing (5)
  • lib/ruby_llm/providers/openai_responses/built_in_tools.rb
  • lib/ruby_llm/providers/openai_responses/streaming.rb
  • lib/ruby_llm/providers/openai_responses/web_socket.rb
  • spec/ruby_llm/providers/openai_responses/shell_tool_spec.rb
  • spec/ruby_llm/providers/openai_responses/streaming_spec.rb
✅ Files skipped from review due to trivial changes (1)
  • spec/ruby_llm/providers/openai_responses/streaming_spec.rb

Comment on lines +150 to +158
def flattened_shell_output(item_id, output_index)
  output_by_command_index = @shell_outputs_by_item_id[item_id]
  output_by_command_index = @shell_outputs_by_output_index[output_index] if output_by_command_index.empty?
  return if output_by_command_index.empty?

  output_by_command_index
    .sort_by { |command_index, _| command_index || -1 }
    .flat_map(&:last)
end

⚠️ Potential issue | 🟠 Major

Merge the item_id and output_index buckets before flattening shell output.

Line 152 currently treats output_index as a pure fallback. If some response.shell_call_output_content.done events are keyed only by item_id and others only by output_index, one subset of command outputs gets dropped from the reconstructed final body. Since add_shell_output_content intentionally records both forms, this method should merge them instead of choosing one.

💡 Suggested fix
 def flattened_shell_output(item_id, output_index)
-  output_by_command_index = @shell_outputs_by_item_id[item_id]
-  output_by_command_index = @shell_outputs_by_output_index[output_index] if output_by_command_index.empty?
+  output_by_command_index = {}
+  output_by_command_index.merge!(@shell_outputs_by_output_index[output_index]) unless output_index.nil?
+  output_by_command_index.merge!(@shell_outputs_by_item_id[item_id]) if item_id
   return if output_by_command_index.empty?

   output_by_command_index
     .sort_by { |command_index, _| command_index || -1 }
-    .flat_map(&:last)
+    .flat_map { |_, output| Array(RubyLLM::Utils.deep_dup(output)) }
 end
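
A runnable, self-contained version of the merged-bucket behavior the review suggests (the buckets are passed in as arguments here for illustration; in the PR they are instance variables, and item_id entries override output_index entries for a matching command index, as in the suggested diff):

```ruby
# Sketch: merge both lookup buckets instead of treating output_index as a
# pure fallback, so outputs recorded under only one key are not dropped.
def flattened_shell_output(by_item_id, by_output_index, item_id, output_index)
  merged = {}
  merged.merge!(by_output_index[output_index] || {}) unless output_index.nil?
  merged.merge!(by_item_id[item_id] || {}) if item_id
  return if merged.empty?

  merged.sort_by { |command_index, _| command_index || -1 }
        .flat_map { |_, outputs| outputs }
end
```

Outputs recorded only under item_id and outputs recorded only under output_index both survive into the flattened array, ordered by command index.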

AlexVPopov and others added 2 commits April 16, 2026 16:11
Parse local Responses API shell_call output as executable RubyLLM tool calls, including streamed completions and nil shell environments when a local shell tool is declared.

Serialize executor results as shell_call_output continuations with previous_response_id so function and local shell tool results continue incrementally without resending prior conversation state. Keep local_shell_executor params out of API payloads for both symbol and string keys.

Preserve Responses function_call_output string values when regular tools return Content::Raw, and add lifecycle, streaming, and regression coverage plus README guidance for local shell executors.
Support local shell Responses lifecycle

cursor Bot commented Apr 16, 2026

PR Summary

Medium Risk
Touches core request/response formatting and streaming reconstruction, which could affect tool execution and message chaining across transports; changes are well-covered by expanded specs but have broad surface area.

Overview
Preserves shell tool results in streaming and WebSocket flows. Streaming now accumulates response.output_item.done/shell events and rebuilds a final raw response body so returned messages keep shell_call/shell_call_output metadata (commands, environment, status, outputs) instead of losing it at completion.

Adds local shell execution support. Introduces LocalShellExecutor + LocalShellToolCall to run environment.type: local shell calls through a user-supplied local_shell_executor, ensures this executor param is never forwarded to the API, and updates payload formatting/continuation logic (including system→developer input messages and incremental tool-only continuations).

Improves shell result parsing and bumps version. parse_shell_call_results now joins shell_call with shell_call_output by call_id and adds parse_shell_call_results_from_message; extensive new specs/cassettes cover streaming preservation, multi-call/multi-command cases, and local shell lifecycle; version bumped to 0.5.3 and provider slug changed from symbol to string (fixing a sorting crash).

Reviewed by Cursor Bugbot for commit b3dbf1b. Bugbot is set up for automated code reviews on this repo.


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 8657df1.


message = accumulator.to_message(nil)
raw_response = completed_response.build_response(nil)
message = accumulator.to_message(raw_response)

WebSocket streaming misses local shell tool call detection

High Severity

The WebSocket accumulate_response directly calls accumulator.to_message(raw_response), bypassing the message_from_stream logic that the HTTP streaming path uses. Since build_chunk only handles function_call items (not shell_call), the StreamAccumulator never accumulates local shell tool calls. The HTTP path works around this by first trying Chat.parse_completion_response(raw_response) which properly detects shell calls. Without equivalent logic, WebSocket streaming will silently fail to trigger local shell execution — the returned message won't have tool_calls, so the chat loop won't invoke the executor.

Additional Locations (1)

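
The HTTP-path workaround the comment describes can be sketched with stubs (Chat.parse_completion_response and the accumulator are real library objects per the comment; these minimal stand-ins only show the fallback order):

```ruby
# Minimal stand-ins, only to demonstrate the fallback order: prefer the
# raw-response parse (which sees shell_call items) over the accumulator.
StubMessage = Struct.new(:tool_calls, keyword_init: true)

def parse_completion_response(raw)
  calls = (raw['output'] || []).select { |item| item['type'] == 'shell_call' }
  StubMessage.new(tool_calls: calls)
end

def message_from_stream(accumulator_message, raw_response)
  parsed = parse_completion_response(raw_response)
  parsed.tool_calls.any? ? parsed : accumulator_message
end
```

Routing the WebSocket path through equivalent logic would let local shell calls surface as tool_calls there too, instead of being silently dropped.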

AlexVPopov and others added 2 commits April 21, 2026 13:08
OpenAI's Responses API treats the top-level instructions parameter as request-local when a request is chained with previous_response_id. This provider is intentionally stateful and relies on previous_response_id for follow-up requests, including tool-result continuation calls, so sending RubyLLM with_instructions content only through top-level instructions made those high-authority instructions fragile across the response chain.

Match the core ruby_llm OpenAI provider more closely by leaving system messages in the normal input list and relying on format_role(:system) to render them as developer messages. Initial requests now persist those instructions as part of the conversation state, while tool-only continuation requests still detect trailing tool results from the non-system messages and submit only the incremental function or shell outputs.

Update the focused specs to assert that system instructions become developer input messages, top-level instructions is omitted from chat payloads, multiple system messages remain ordered, and local shell continuations keep using previous_response_id without resending instructions or non-tool input.
Preserve instructions in Responses API state


3 participants