Preserve shell tool results in streaming #1
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
📝 Walkthrough

Adds streaming SSE reconstruction (StreamRawResponse, CompletedResponseAccumulator, and a public stream_response), feeds the same completion accumulator into WebSocket accumulation, enhances shell-call result parsing to merge items by call_id with parse_shell_call_results_from_message, introduces a LocalShellExecutor and chat integration for local shell tool execution, and bumps slug/version to 0.5.3.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant StreamResp as stream_response
    participant Connection as Connection/SSE
    participant SA as StreamAccumulator
    participant CRA as CompletedResponseAccumulator
    participant Msg as Message

    Client->>StreamResp: call stream_response(connection, payload, &block)
    StreamResp->>Connection: POST /v1/responses (stream=true)
    StreamResp->>SA: init StreamAccumulator(payload)
    StreamResp->>CRA: init CompletedResponseAccumulator()
    loop for each SSE event
        Connection-->>StreamResp: SSE event (data)
        StreamResp->>CRA: CRA.update(event_hash)
        StreamResp->>SA: SA.update(event_hash)
        StreamResp->>Client: yield built chunk (from SA)
    end
    StreamResp->>CRA: raw = CRA.build_response(nil)
    StreamResp->>Msg: message = SA.to_message(raw)
    StreamResp->>Msg: assign message.response_id from raw['id']
    StreamResp-->>Client: return final message (with reconstructed raw.body)
```
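A hypothetical caller-side sketch of the flow above; the signature of stream_response and the chunk interface are assumptions drawn from the walkthrough and diagram, not verified library API.

```ruby
# Hypothetical usage; names assumed from the sequence diagram above.
message = provider.stream_response(connection, payload) do |chunk|
  print chunk.content # each chunk is built by the StreamAccumulator per SSE event
end

puts message.response_id        # assigned from the reconstructed raw body's 'id'
puts message.raw.body['output'] # rebuilt by the CompletedResponseAccumulator
```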
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
@codex review
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/ruby_llm/providers/openai_responses/streaming.rb`:
- Around line 288-298: The methods build_on_data_handler and faraday_1? are
missing and must be implemented to avoid NoMethodError when
apply_stream_on_data_handler calls them; add a private helper
build_on_data_handler that accepts a block and returns a Proc or lambda
compatible with Faraday's on_data callback which forwards received chunked data
to handle_stream_event (preserving the yielded parameters expected by
req.options[:on_data]/req.options.on_data), and add a faraday_1? helper that
reliably detects Faraday v1 (e.g., by checking Faraday.const_defined?(:VERSION)
or Gem.loaded_specs['faraday'].version.to_s.start_with?('1') with a safe
respond_to? fallback) so the conditional choosing req.options[:on_data] vs
req.options.on_data works correctly; keep both helpers in the same class/module
as apply_stream_on_data_handler and mark them private.
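A minimal sketch of the two helpers this prompt asks for, assuming handle_stream_event already exists in the same module (the module name here is illustrative):

```ruby
module Streaming
  private

  # Returns a handler compatible with Faraday's on_data callback. Using a
  # proc (not a lambda) tolerates the differing arities Faraday passes:
  # (chunk, bytes_received) in v1 and (chunk, bytes_received, env) in v2+.
  def build_on_data_handler(&block)
    proc do |chunk, _bytes_received, _env|
      handle_stream_event(chunk, &block)
    end
  end

  # Detects Faraday v1 so the caller can choose req.options[:on_data] (v1)
  # over req.options.on_data (v2+).
  def faraday_1?
    Faraday::VERSION.start_with?('1.')
  rescue NameError
    false
  end
end
```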
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 25e1b4b5-e0ac-40a2-b6ef-76ea2167e331
📒 Files selected for processing (5)
- lib/ruby_llm/providers/openai_responses/built_in_tools.rb
- lib/ruby_llm/providers/openai_responses/streaming.rb
- lib/ruby_llm/providers/openai_responses/web_socket.rb
- spec/ruby_llm/providers/openai_responses/shell_tool_spec.rb
- spec/ruby_llm/providers/openai_responses/streaming_spec.rb
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 770c987f5d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
[FIX] slug method should return string instead of symbol to allow Mod…
Return string from slug instead of symbol to match the base Provider contract, fixing ArgumentError when RubyLLM sorts models by provider.
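For context, the ArgumentError comes from Ruby refusing to order Symbols against Strings, which is exactly what a mixed-slug provider list triggers during sorting:

```ruby
['openai', :openai_responses].sort
# => ArgumentError: comparison of String with :openai_responses failed
```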
Accumulate final shell done events during Responses API streaming and rebuild the final output payload for the returned message. This preserves shell metadata such as call ids, action settings, environment, and stdout/stderr/outcome for both HTTP and WebSocket streaming.

Normalize shell result parsing around the provider-shaped fields and add coverage for streaming preservation, multi-call joining, sync parsing, and function-call regression safety. This keeps built-in shell results inspectable after streamed responses complete.
Accumulate final shell output content by command index instead of overwriting prior entries for the same shell call output item. This preserves stdout, stderr, and outcomes for multi-command shell executions in the final streamed message.

Add a regression spec covering a streamed shell call with multiple commands so earlier command results are retained in the rebuilt output payload and parser results.
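A sketch of the append-per-command-index bookkeeping this commit describes; the method and variable names mirror the review context above, but the actual implementation may differ.

```ruby
def initialize_shell_buckets
  # item/output index => { command_index => [output, ...] } so a later done
  # event for the same command appends instead of overwriting
  nested = -> { Hash.new { |h, k| h[k] = Hash.new { |h2, k2| h2[k2] = [] } } }
  @shell_outputs_by_item_id = nested.call
  @shell_outputs_by_output_index = nested.call
end

def add_shell_output_content(item_id, output_index, command_index, content)
  @shell_outputs_by_item_id[item_id][command_index] << content if item_id
  @shell_outputs_by_output_index[output_index][command_index] << content unless output_index.nil?
end
```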
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/ruby_llm/providers/openai_responses/streaming.rb`:
- Around line 150-158: flattened_shell_output currently treats
`@shell_outputs_by_output_index` as a fallback and can drop entries that exist
only in one bucket; change flattened_shell_output to merge the two buckets
instead: retrieve the hash for item_id from `@shell_outputs_by_item_id` and the
hash for output_index from `@shell_outputs_by_output_index`, combine them by
unioning their keys and concatenating the arrays for any matching command_index
so no outputs are lost, then sort the merged hash by command_index and flat_map
the values to produce the final flattened array (use the existing method name
flattened_shell_output and variables `@shell_outputs_by_item_id` /
`@shell_outputs_by_output_index` to locate the code).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 74e5196a-e0e2-4afa-a4b6-6bdf05f35a2d
📒 Files selected for processing (5)
- lib/ruby_llm/providers/openai_responses/built_in_tools.rb
- lib/ruby_llm/providers/openai_responses/streaming.rb
- lib/ruby_llm/providers/openai_responses/web_socket.rb
- spec/ruby_llm/providers/openai_responses/shell_tool_spec.rb
- spec/ruby_llm/providers/openai_responses/streaming_spec.rb
✅ Files skipped from review due to trivial changes (1)
- spec/ruby_llm/providers/openai_responses/streaming_spec.rb
```ruby
def flattened_shell_output(item_id, output_index)
  output_by_command_index = @shell_outputs_by_item_id[item_id]
  output_by_command_index = @shell_outputs_by_output_index[output_index] if output_by_command_index.empty?
  return if output_by_command_index.empty?

  output_by_command_index
    .sort_by { |command_index, _| command_index || -1 }
    .flat_map(&:last)
end
```
Merge the item_id and output_index buckets before flattening shell output.
Line 152 currently treats output_index as a pure fallback. If some response.shell_call_output_content.done events are keyed only by item_id and others only by output_index, one subset of command outputs gets dropped from the reconstructed final body. Since add_shell_output_content intentionally records both forms, this method should merge them instead of choosing one.
💡 Suggested fix

```diff
 def flattened_shell_output(item_id, output_index)
-  output_by_command_index = @shell_outputs_by_item_id[item_id]
-  output_by_command_index = @shell_outputs_by_output_index[output_index] if output_by_command_index.empty?
+  output_by_command_index = {}
+  output_by_command_index.merge!(@shell_outputs_by_output_index[output_index]) unless output_index.nil?
+  output_by_command_index.merge!(@shell_outputs_by_item_id[item_id]) if item_id
   return if output_by_command_index.empty?

   output_by_command_index
     .sort_by { |command_index, _| command_index || -1 }
-    .flat_map(&:last)
+    .flat_map { |_, output| Array(RubyLLM::Utils.deep_dup(output)) }
 end
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/ruby_llm/providers/openai_responses/streaming.rb` around lines 150 - 158,
flattened_shell_output currently treats `@shell_outputs_by_output_index` as a
fallback and can drop entries that exist only in one bucket; change
flattened_shell_output to merge the two buckets instead: retrieve the hash for
item_id from `@shell_outputs_by_item_id` and the hash for output_index from
`@shell_outputs_by_output_index`, combine them by unioning their keys and
concatenating the arrays for any matching command_index so no outputs are lost,
then sort the merged hash by command_index and flat_map the values to produce
the final flattened array (use the existing method name flattened_shell_output
and variables `@shell_outputs_by_item_id` / `@shell_outputs_by_output_index` to
locate the code).
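A sketch of the merge this prompt describes, assuming both buckets map command_index to an array of outputs:

```ruby
def flattened_shell_output(item_id, output_index)
  merged = Hash.new { |h, k| h[k] = [] }
  buckets = [@shell_outputs_by_item_id[item_id], @shell_outputs_by_output_index[output_index]]
  buckets.compact.each do |bucket|
    # Union keys and concatenate arrays so matching command_index entries
    # from either bucket are all retained
    bucket.each { |command_index, outputs| merged[command_index].concat(Array(outputs)) }
  end
  return if merged.empty?

  merged
    .sort_by { |command_index, _| command_index || -1 }
    .flat_map(&:last)
end
```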
Parse local Responses API shell_call output as executable RubyLLM tool calls, including streamed completions and nil shell environments, when a local shell tool is declared. Serialize executor results as shell_call_output continuations with previous_response_id so function and local shell tool results continue incrementally without resending prior conversation state. Keep local_shell_executor params out of API payloads for both symbol and string keys.

Preserve Responses function_call_output string values when regular tools return Content::Raw, and add lifecycle, streaming, and regression coverage plus README guidance for local shell executors.
Support local shell Responses lifecycle
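A heavily hedged usage sketch of the lifecycle this commit describes. The executor interface, option names, and result shape below are guesses reconstructed from the commit message; the README guidance it mentions is the authoritative source.

```ruby
require 'open3'

chat = RubyLLM.chat(model: 'codex-mini-latest', provider: :openai_responses)
chat.with_params(
  tools: [{ type: 'local_shell' }],
  # local_shell_executor is stripped from API payloads per the commit above;
  # its callable shape here is an assumption.
  local_shell_executor: lambda do |shell_call|
    shell_call.action['commands'].map do |command|
      # Results are serialized back as shell_call_output continuation items
      # chained via previous_response_id rather than resending history.
      stdout, stderr, status = Open3.capture3(Array(command).join(' '))
      { stdout: stdout, stderr: stderr, outcome: status.success? ? 'completed' : 'failed' }
    end
  end
)
```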
PR Summary (Medium Risk)

Overview: Adds local shell execution support, introduces a LocalShellExecutor, improves shell result parsing, and bumps the version.

Reviewed by Cursor Bugbot for commit b3dbf1b.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```diff
- message = accumulator.to_message(nil)
+ raw_response = completed_response.build_response(nil)
+ message = accumulator.to_message(raw_response)
```
WebSocket streaming misses local shell tool call detection
High Severity
The WebSocket accumulate_response directly calls accumulator.to_message(raw_response), bypassing the message_from_stream logic that the HTTP streaming path uses. Since build_chunk only handles function_call items (not shell_call), the StreamAccumulator never accumulates local shell tool calls. The HTTP path works around this by first trying Chat.parse_completion_response(raw_response) which properly detects shell calls. Without equivalent logic, WebSocket streaming will silently fail to trigger local shell execution — the returned message won't have tool_calls, so the chat loop won't invoke the executor.
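One hedged way to close the gap this comment describes, mirroring the HTTP path it references; Chat.parse_completion_response and its returning nil on a miss are assumptions taken from the comment, not verified code.

```ruby
raw_response = completed_response.build_response(nil)
# Prefer the chat-level parser so shell_call items surface as tool_calls,
# falling back to the stream accumulator's message when nothing is detected.
message = Chat.parse_completion_response(raw_response) ||
          accumulator.to_message(raw_response)
```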
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 8657df1.
OpenAI's Responses API treats the top-level instructions parameter as request-local when a request is chained with previous_response_id. This provider is intentionally stateful and relies on previous_response_id for follow-up requests, including tool-result continuation calls, so sending RubyLLM with_instructions content only through the top-level instructions parameter made those high-authority instructions fragile across the response chain.

Match the core ruby_llm OpenAI provider more closely by leaving system messages in the normal input list and relying on format_role(:system) to render them as developer messages. Initial requests now persist those instructions as part of the conversation state, while tool-only continuation requests still detect trailing tool results from the non-system messages and submit only the incremental function or shell outputs.

Update the focused specs to assert that system instructions become developer input messages, that top-level instructions is omitted from chat payloads, that multiple system messages remain ordered, and that local shell continuations keep using previous_response_id without resending instructions or non-tool input.
Preserve instructions in Responses API state
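Illustrative request shapes for the behavior described above (assumed, not captured from the wire); the Responses API does accept developer-role input messages and function_call_output continuation items like these.

```ruby
# Initial request: system instructions travel as a developer input message,
# persisting in conversation state instead of the request-local
# `instructions` parameter.
initial_payload = {
  model: 'gpt-5',
  input: [
    { role: 'developer', content: 'Answer tersely.' }, # from with_instructions
    { role: 'user', content: 'List the repo files.' }
  ]
}

# Tool-result continuation: only the incremental output is sent, chained to
# prior state (including those instructions) via previous_response_id.
continuation_payload = {
  model: 'gpt-5',
  previous_response_id: 'resp_abc123',
  input: [
    { type: 'function_call_output', call_id: 'call_1', output: '{"files":["README.md"]}' }
  ]
}
```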


Note: Medium Risk

Touches core streaming/WebSocket accumulation and rewrites how the final `raw.body` is built, which could affect any downstream consumers of streamed output ordering/shape. Changes are well covered by new specs but still carry moderate risk due to protocol/event-handling complexity.

Overview

Preserves shell tool results after streaming completes. Streaming now accumulates `response.output_item.done` and shell command/output events and rebuilds the final `message.raw.body['output']` so `shell_call`/`shell_call_output` items (including `call_id`, `environment`, `action.commands`, and stdout/stderr/outcome) aren't lost.

Unifies shell result parsing. `BuiltInTools.parse_shell_call_results` now joins `shell_call` metadata with `shell_call_output` items by `call_id`, and adds `parse_shell_call_results_from_message` for extracting results directly from a final `RubyLLM::Message`.

Also bumps version to 0.5.3 and changes the provider `slug` from a symbol to a string to avoid sorting crashes, with expanded test coverage for streaming shell preservation, multi-call/multi-command scenarios, and function-call streaming regression.

Written by Cursor Bugbot for commit cd34151. This will update automatically on new commits.
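A hedged sketch of pulling shell results back out with the new helper; the exact return shape (an array of hashes joined by call_id) is an assumption from the summary above.

```ruby
message = chat.ask('Run ls, then count the lines in README.md')
results = RubyLLM::Providers::OpenAIResponses::BuiltInTools
          .parse_shell_call_results_from_message(message)

results.each do |result|
  # Hypothetical fields mirroring the metadata the summary says is preserved.
  puts "#{result[:call_id]}: #{result[:action]['commands'].join('; ')}"
  puts result[:stdout]
end
```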
Summary by CodeRabbit

New Features
- Streaming responses now reconstruct the full raw response body, preserving shell tool call results across HTTP and WebSocket streaming.
- Local shell tool execution via a LocalShellExecutor integrated into chat.

Tests
- Coverage for streaming shell preservation, multi-call/multi-command scenarios, local shell lifecycle, and function-call streaming regressions.

Chores
- Version bumped to 0.5.3; provider slug changed from a symbol to a string.