
Preserve shell tool results in streaming#1

Open
AlexVPopov wants to merge 9 commits into main from zipchat

Conversation

@AlexVPopov AlexVPopov commented Mar 19, 2026

Accumulate final shell done events during Responses API streaming and rebuild the final output payload for the returned message. This preserves shell metadata such as call ids, action settings, environment, and stdout/stderr/outcome for both HTTP and WebSocket streaming.

Normalize shell result parsing around the provider-shaped fields and add coverage for streaming preservation, multi-call joining, sync parsing, and function-call regression safety. This keeps built-in shell results inspectable after streamed responses complete.


Note

Medium Risk
Touches core streaming/WebSocket accumulation and rewrites how the final raw.body is built, which could affect any downstream consumers of streamed output ordering/shape. Changes are well-covered by new specs but still moderate risk due to protocol/event handling complexity.

Overview
Preserves shell tool results after streaming completes. Streaming now accumulates response.output_item.done and shell command/output events and rebuilds the final message.raw.body['output'] so shell_call/shell_call_output items (including call_id, environment, action.commands, and stdout/stderr/outcome) aren’t lost.

Unifies shell result parsing. BuiltInTools.parse_shell_call_results now joins shell_call metadata with shell_call_output items by call_id, and adds parse_shell_call_results_from_message for extracting results directly from a final RubyLLM::Message.
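
A minimal, self-contained sketch of the join-by-call_id idea (the method name comes from the PR; the body here is illustrative, not the library's actual implementation):

```ruby
# Hypothetical, simplified sketch: join shell_call metadata with
# shell_call_output items that share a call_id, concatenating outputs.
def parse_shell_call_results(items)
  items = Array(items).map { |item| item.transform_keys(&:to_s) }
  calls = items.select { |item| item['type'] == 'shell_call' }
  outputs = items.select { |item| item['type'] == 'shell_call_output' }

  # Ordering follows the first-seen shell_call items.
  calls.map do |call|
    matching = outputs.select { |out| out['call_id'] == call['call_id'] }
    call.merge('output' => matching.flat_map { |out| Array(out['output']) })
  end
end
```

A multi-output call then yields a single merged result whose 'output' array preserves the original order of the output items.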

Also bumps version to 0.5.3 and changes provider slug from a symbol to a string to avoid sorting crashes, with expanded test coverage for streaming shell preservation, multi-call/multi-command scenarios, and function-call streaming regression.

Written by Cursor Bugbot for commit cd34151. This will update automatically on new commits.

Summary by CodeRabbit

  • New Features

    • Aggregate shell-call results by execution ID with combined outputs, preserved ordering, richer status/environment/action, and parseable from final messages
    • Improved streaming reconstruction producing complete final payloads and preserved response IDs
    • Local shell executor support and local-execution documentation and hooks
    • Enhanced continuation and tool-call extraction/formatting for mixed tool and function flows
  • Tests

    • Extensive specs for streaming, multi-shell/multi-command runs, local shell lifecycle, and parsing
  • Chores

    • Bumped version to 0.5.3 and normalized provider slug to a string


coderabbitai Bot commented Mar 19, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds streaming SSE reconstruction (StreamRawResponse, CompletedResponseAccumulator, and public stream_response), feeds the same completion accumulator into WebSocket accumulation, enhances shell-call result parsing to merge items by call_id with parse_shell_call_results_from_message, introduces a LocalShellExecutor and chat integration for local shell tool execution, changes the provider slug to a String, and bumps the version to 0.5.3.

Changes

Cohort / File(s) Summary
Shell Call Result Parsing & Tests
lib/ruby_llm/providers/openai_responses/built_in_tools.rb, spec/ruby_llm/providers/openai_responses/shell_tool_spec.rb
parse_shell_call_results now coerces inputs, computes first-seen call_id ordering, groups items by call_id, and merges shell_call metadata with concatenated shell_call_output.output (deep-copying environment/action, setting id/call_id/status). Added parse_shell_call_results_from_message(message) and updated specs to expect the merged result shape and environment structure.
Streaming Response Handling & Tests
lib/ruby_llm/providers/openai_responses/streaming.rb, lib/ruby_llm/providers/openai_responses/web_socket.rb, spec/ruby_llm/providers/openai_responses/streaming_spec.rb
Added StreamRawResponse and CompletedResponseAccumulator. Implemented stream_response(...) that wires Faraday SSE on_data, updates both StreamAccumulator and CompletedResponseAccumulator per event, reconstructs final raw response and message, assigns message.response_id, and yields streaming chunks. WebSocket accumulation now builds and passes a completed raw response into the accumulator. New tests validate reconstruction, ordering, and tool/shell output assembly.
Local Shell Executor & Chat Integration
lib/ruby_llm/providers/openai_responses/local_shell_executor.rb, lib/ruby_llm/providers/openai_responses/chat.rb, spec/ruby_llm/providers/openai_responses/local_shell_lifecycle_spec.rb, spec/ruby_llm/providers/openai_responses/chat_spec.rb
Introduced LocalShellExecutor (tool wrapper) and LocalShellToolCall; added ChatExtension to intercept/dispatch openai_responses_local_shell tool calls and use local_shell_executor from params. chat.rb input/formatting/extract_tool_calls updated to serialize tool results, include continuation input messages, and produce LocalShellToolCall for local shell calls. New lifecycle tests cover continuation behavior, streaming local execution, error cases, scoping, and environment handling.
Provider Identity, Versioning & Packaging
lib/ruby_llm/providers/openai_responses.rb, lib/rubyllm_responses_api.rb, ruby_llm-responses_api.gemspec, spec/ruby_llm/providers/openai_responses_spec.rb, CHANGELOG.md
Changed provider slug from Symbol to String (:openai_responses → "openai_responses"), bumped RubyLLM::ResponsesAPI::VERSION and gemspec to 0.5.3, updated slug spec, and added CHANGELOG entry. Also added require_relative for local_shell_executor.
Docs / Examples
README.md
Replaced local execution example to use environment_type: 'local' and documented the local_shell_executor hook, its expected interface, and an end-to-end wiring example.
VCR Fixtures
spec/cassettes/function_calling_*.yml
Updated recorded request bodies to reflect use of *_output continuation-only payloads and added previous_response_id where applicable.
Misc Tests
spec/ruby_llm/providers/openai_responses/streaming_spec.rb, other spec additions
Added/extended specs for streaming shell/function reconstruction, multi-call ordering, multi-command outputs, and function-argument streaming.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant StreamResp as stream_response
    participant Connection as Connection/SSE
    participant SA as StreamAccumulator
    participant CRA as CompletedResponseAccumulator
    participant Msg as Message

    Client->>StreamResp: call stream_response(connection, payload, &block)
    StreamResp->>Connection: POST /v1/responses (stream=true)
    StreamResp->>SA: init StreamAccumulator(payload)
    StreamResp->>CRA: init CompletedResponseAccumulator()
    loop for each SSE event
        Connection-->>StreamResp: SSE event (data)
        StreamResp->>CRA: CRA.update(event_hash)
        StreamResp->>SA: SA.update(event_hash)
        StreamResp->>Client: yield built chunk (from SA)
    end
    StreamResp->>CRA: raw = CRA.build_response(nil)
    StreamResp->>Msg: message = SA.to_message(raw)
    StreamResp->>Msg: assign message.response_id from raw['id']
    StreamResp-->>Client: return final message (with reconstructed raw.body)
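
The dual-accumulator flow in the diagram can be simulated in a few lines. The class name mirrors the walkthrough; this stub body only illustrates the data flow, not the library's actual implementation:

```ruby
# Stub accumulator: collect completed output items and the response id,
# then rebuild a raw response hash at the end of the stream.
class CompletedResponseAccumulator
  def initialize
    @output = []
    @response_id = nil
  end

  def update(event)
    case event['type']
    when 'response.output_item.done'
      @output << event['item'] # keeps shell_call / shell_call_output items
    when 'response.completed'
      @response_id = event.dig('response', 'id')
    end
  end

  def build_response(_existing)
    { 'id' => @response_id, 'output' => @output }
  end
end

events = [
  { 'type' => 'response.output_item.done',
    'item' => { 'type' => 'shell_call', 'call_id' => 'c1' } },
  { 'type' => 'response.output_item.done',
    'item' => { 'type' => 'shell_call_output', 'call_id' => 'c1',
                'output' => [{ 'stdout' => 'hello' }] } },
  { 'type' => 'response.completed', 'response' => { 'id' => 'resp_1' } }
]

accumulator = CompletedResponseAccumulator.new
events.each { |event| accumulator.update(event) }
raw = accumulator.build_response(nil)
# raw['output'] retains both shell items; raw['id'] is 'resp_1'
```

In the real code the reconstructed raw hash is then passed to the StreamAccumulator's to_message, which is how the final message keeps its shell metadata.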

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 5.41%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed — Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed — The title 'Preserve shell tool results in streaming' directly describes the main objective stated in the PR: accumulating and rebuilding final shell tool results during Responses API streaming (HTTP SSE and WebSocket) so shell outputs remain inspectable after completion.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@AlexVPopov
Author

@codex review


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/ruby_llm/providers/openai_responses/streaming.rb`:
- Around line 288-298: The methods build_on_data_handler and faraday_1? are
missing and must be implemented to avoid NoMethodError when
apply_stream_on_data_handler calls them; add a private helper
build_on_data_handler that accepts a block and returns a Proc or lambda
compatible with Faraday's on_data callback which forwards received chunked data
to handle_stream_event (preserving the yielded parameters expected by
req.options[:on_data]/req.options.on_data), and add a faraday_1? helper that
reliably detects Faraday v1 (e.g., by checking Faraday.const_defined?(:VERSION)
or Gem.loaded_specs['faraday'].version.to_s.start_with?('1') with a safe
respond_to? fallback) so the conditional choosing req.options[:on_data] vs
req.options.on_data works correctly; keep both helpers in the same class/module
as apply_stream_on_data_handler and mark them private.
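
One possible shape for the two missing helpers (the names come from the review comment; handle_stream_event is the forwarding target the review mentions and is assumed to exist elsewhere in the module, so this is a sketch rather than the actual fix):

```ruby
# Sketch of the helpers the review says are missing. handle_stream_event
# is assumed to be defined elsewhere in the same module.
def build_on_data_handler(&block)
  # Faraday's on_data callback yields (chunk, received_bytes[, env]);
  # forward the raw chunk and preserve the caller's block.
  proc do |chunk, _received_bytes, _env = nil|
    handle_stream_event(chunk, &block)
  end
end

def faraday_1?
  spec = Gem.loaded_specs['faraday']
  return spec.version.to_s.start_with?('1') if spec

  defined?(Faraday::VERSION) ? Faraday::VERSION.start_with?('1') : false
end
```

With these in place, the conditional choosing req.options[:on_data] (Faraday v1) versus req.options.on_data (v2+) can branch on faraday_1? safely.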

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 25e1b4b5-e0ac-40a2-b6ef-76ea2167e331

📥 Commits

Reviewing files that changed from the base of the PR and between 042b35d and 770c987.

📒 Files selected for processing (5)
  • lib/ruby_llm/providers/openai_responses/built_in_tools.rb
  • lib/ruby_llm/providers/openai_responses/streaming.rb
  • lib/ruby_llm/providers/openai_responses/web_socket.rb
  • spec/ruby_llm/providers/openai_responses/shell_tool_spec.rb
  • spec/ruby_llm/providers/openai_responses/streaming_spec.rb

Comment thread lib/ruby_llm/providers/openai_responses/streaming.rb

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 770c987f5d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread lib/ruby_llm/providers/openai_responses/streaming.rb Outdated
noelblaschke and others added 5 commits March 26, 2026 16:54
[FIX] slug method should return string instead of symbol to allow Mod…
Return string from slug instead of symbol to match the base Provider
contract, fixing ArgumentError when RubyLLM sorts models by provider.
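
The crash this commit fixes is easy to reproduce: Ruby refuses to sort an array that mixes String and Symbol slugs, because the two types don't implement <=> against each other.

```ruby
# A Symbol slug among String slugs makes sorting raise ArgumentError.
mixed = ['openai', :openai_responses, 'anthropic']
begin
  mixed.sort
rescue ArgumentError
  # comparison of String with :openai_responses failed
end

# With slug returning a String, every entry is comparable:
['openai', 'openai_responses', 'anthropic'].sort
# => ["anthropic", "openai", "openai_responses"]
```
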
Accumulate final shell done events during Responses API streaming and rebuild the final output payload for the returned message. This preserves shell metadata such as call ids, action settings, environment, and stdout/stderr/outcome for both HTTP and WebSocket streaming.

Normalize shell result parsing around the provider-shaped fields and add coverage for streaming preservation, multi-call joining, sync parsing, and function-call regression safety. This keeps built-in shell results inspectable after streamed responses complete.
Accumulate final shell output content by command index instead of overwriting prior entries for the same shell call output item. This preserves stdout, stderr, and outcomes for multi-command shell executions in the final streamed message.

Add a regression spec covering a streamed shell call with multiple commands so earlier command results are retained in the rebuilt output payload and parser results.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/ruby_llm/providers/openai_responses/streaming.rb`:
- Around line 150-158: flattened_shell_output currently treats
`@shell_outputs_by_output_index` as a fallback and can drop entries that exist
only in one bucket; change flattened_shell_output to merge the two buckets
instead: retrieve the hash for item_id from `@shell_outputs_by_item_id` and the
hash for output_index from `@shell_outputs_by_output_index`, combine them by
unioning their keys and concatenating the arrays for any matching command_index
so no outputs are lost, then sort the merged hash by command_index and flat_map
the values to produce the final flattened array (use the existing method name
flattened_shell_output and variables `@shell_outputs_by_item_id` /
`@shell_outputs_by_output_index` to locate the code).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 74e5196a-e0e2-4afa-a4b6-6bdf05f35a2d

📥 Commits

Reviewing files that changed from the base of the PR and between 6b7e6dc and cd34151.

📒 Files selected for processing (5)
  • lib/ruby_llm/providers/openai_responses/built_in_tools.rb
  • lib/ruby_llm/providers/openai_responses/streaming.rb
  • lib/ruby_llm/providers/openai_responses/web_socket.rb
  • spec/ruby_llm/providers/openai_responses/shell_tool_spec.rb
  • spec/ruby_llm/providers/openai_responses/streaming_spec.rb
✅ Files skipped from review due to trivial changes (1)
  • spec/ruby_llm/providers/openai_responses/streaming_spec.rb

Comment on lines +150 to +158
def flattened_shell_output(item_id, output_index)
  output_by_command_index = @shell_outputs_by_item_id[item_id]
  output_by_command_index = @shell_outputs_by_output_index[output_index] if output_by_command_index.empty?
  return if output_by_command_index.empty?

  output_by_command_index
    .sort_by { |command_index, _| command_index || -1 }
    .flat_map(&:last)
end

⚠️ Potential issue | 🟠 Major

Merge the item_id and output_index buckets before flattening shell output.

Line 152 currently treats output_index as a pure fallback. If some response.shell_call_output_content.done events are keyed only by item_id and others only by output_index, one subset of command outputs gets dropped from the reconstructed final body. Since add_shell_output_content intentionally records both forms, this method should merge them instead of choosing one.

💡 Suggested fix
 def flattened_shell_output(item_id, output_index)
-  output_by_command_index = @shell_outputs_by_item_id[item_id]
-  output_by_command_index = @shell_outputs_by_output_index[output_index] if output_by_command_index.empty?
+  output_by_command_index = {}
+  output_by_command_index.merge!(@shell_outputs_by_output_index[output_index]) unless output_index.nil?
+  output_by_command_index.merge!(@shell_outputs_by_item_id[item_id]) if item_id
   return if output_by_command_index.empty?

   output_by_command_index
     .sort_by { |command_index, _| command_index || -1 }
-    .flat_map(&:last)
+    .flat_map { |_, output| Array(RubyLLM::Utils.deep_dup(output)) }
 end
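
A runnable, self-contained version of the merged-bucket behavior the review suggests (the buckets are passed in as arguments here for illustration; in the PR they are instance variables, and item_id entries override output_index entries for a matching command index, as in the suggested diff):

```ruby
# Sketch: merge both lookup buckets instead of treating output_index as a
# pure fallback, so outputs recorded under only one key are not dropped.
def flattened_shell_output(by_item_id, by_output_index, item_id, output_index)
  merged = {}
  merged.merge!(by_output_index[output_index] || {}) unless output_index.nil?
  merged.merge!(by_item_id[item_id] || {}) if item_id
  return if merged.empty?

  merged.sort_by { |command_index, _| command_index || -1 }
        .flat_map { |_, outputs| outputs }
end
```

Outputs recorded only under item_id and outputs recorded only under output_index both survive into the flattened array, ordered by command index.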

AlexVPopov and others added 2 commits April 16, 2026 16:11
Parse local Responses API shell_call output as executable RubyLLM tool calls, including streamed completions and nil shell environments when a local shell tool is declared.

Serialize executor results as shell_call_output continuations with previous_response_id so function and local shell tool results continue incrementally without resending prior conversation state. Keep local_shell_executor params out of API payloads for both symbol and string keys.

Preserve Responses function_call_output string values when regular tools return Content::Raw, and add lifecycle, streaming, and regression coverage plus README guidance for local shell executors.
Support local shell Responses lifecycle

cursor Bot commented Apr 16, 2026

PR Summary

Medium Risk
Touches core request/response formatting and streaming reconstruction, which could affect tool execution and message chaining across transports; changes are well-covered by expanded specs but have broad surface area.

Overview
Preserves shell tool results in streaming and WebSocket flows. Streaming now accumulates response.output_item.done/shell events and rebuilds a final raw response body so returned messages keep shell_call/shell_call_output metadata (commands, environment, status, outputs) instead of losing it at completion.

Adds local shell execution support. Introduces LocalShellExecutor + LocalShellToolCall to run environment.type: local shell calls through a user-supplied local_shell_executor, ensures this executor param is never forwarded to the API, and updates payload formatting/continuation logic (including system→developer input messages and incremental tool-only continuations).

Improves shell result parsing and bumps version. parse_shell_call_results now joins shell_call with shell_call_output by call_id and adds parse_shell_call_results_from_message; extensive new specs/cassettes cover streaming preservation, multi-call/multi-command cases, and local shell lifecycle; version bumped to 0.5.3 and provider slug changed from symbol to string (fixing a sorting crash).

Reviewed by Cursor Bugbot for commit b3dbf1b. Bugbot is set up for automated code reviews on this repo.


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 8657df1.


message = accumulator.to_message(nil)
raw_response = completed_response.build_response(nil)
message = accumulator.to_message(raw_response)

WebSocket streaming misses local shell tool call detection

High Severity

The WebSocket accumulate_response directly calls accumulator.to_message(raw_response), bypassing the message_from_stream logic that the HTTP streaming path uses. Since build_chunk only handles function_call items (not shell_call), the StreamAccumulator never accumulates local shell tool calls. The HTTP path works around this by first trying Chat.parse_completion_response(raw_response) which properly detects shell calls. Without equivalent logic, WebSocket streaming will silently fail to trigger local shell execution — the returned message won't have tool_calls, so the chat loop won't invoke the executor.

Additional Locations (1)

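
The HTTP-path workaround the comment describes can be sketched with stubs (Chat.parse_completion_response and the accumulator are real library objects per the comment; these minimal stand-ins only show the fallback order):

```ruby
# Minimal stand-ins, only to demonstrate the fallback order: prefer the
# raw-response parse (which sees shell_call items) over the accumulator.
StubMessage = Struct.new(:tool_calls, keyword_init: true)

def parse_completion_response(raw)
  calls = (raw['output'] || []).select { |item| item['type'] == 'shell_call' }
  StubMessage.new(tool_calls: calls)
end

def message_from_stream(accumulator_message, raw_response)
  parsed = parse_completion_response(raw_response)
  parsed.tool_calls.any? ? parsed : accumulator_message
end
```

Routing the WebSocket path through equivalent logic would let local shell calls surface as tool_calls there too, instead of being silently dropped.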

AlexVPopov and others added 2 commits April 21, 2026 13:08
OpenAI's Responses API treats the top-level instructions parameter as request-local when a request is chained with previous_response_id. This provider is intentionally stateful and relies on previous_response_id for follow-up requests, including tool-result continuation calls, so sending RubyLLM with_instructions content only through top-level instructions made those high-authority instructions fragile across the response chain.

Match the core ruby_llm OpenAI provider more closely by leaving system messages in the normal input list and relying on format_role(:system) to render them as developer messages. Initial requests now persist those instructions as part of the conversation state, while tool-only continuation requests still detect trailing tool results from the non-system messages and submit only the incremental function or shell outputs.

Update the focused specs to assert that system instructions become developer input messages, top-level instructions is omitted from chat payloads, multiple system messages remain ordered, and local shell continuations keep using previous_response_id without resending instructions or non-tool input.
Preserve instructions in Responses API state


3 participants