Skip to content

Conversation

@aditya-arolkar-swe
Copy link
Contributor

@aditya-arolkar-swe aditya-arolkar-swe commented Oct 6, 2025

Description

Generator change to yield this output: cohere-ai/cohere-python#700

Fixes FER-6826

Changes Made

  • Hard forked httpx_sse, moved into core_utilities, made fixes to line parsing bugs and improved idiomatic class definition.
  • updated parser to: copy http_sse code into output, use new in-house SSE EventStream class and better handle failed SSE parsing

Testing

  • Unit tests added/updated
  • Manual testing completed

Comment on lines 48 to 53
# Process any remaining data in buffer
if buffer.strip():
line = buffer.rstrip('\r')
sse = decoder.decode(line)
if sse is not None:
yield sse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The buffer processing logic at the end doesn't handle multi-line content correctly. If the final buffer contains multiple lines without a newline separator (e.g., data received with \r but no \n), only the entire buffer will be processed as a single line. This could cause event data to be incorrectly parsed.

Consider modifying this section to handle potential multi-line content in the remaining buffer, perhaps by splitting on \r characters and processing each line individually, similar to how the main loop processes lines.

Suggested change
# Process any remaining data in buffer
if buffer.strip():
line = buffer.rstrip('\r')
sse = decoder.decode(line)
if sse is not None:
yield sse
# Process any remaining data in buffer
if buffer.strip():
# Split the buffer by \r and process each line
for line in buffer.split('\r'):
if line.strip():
sse = decoder.decode(line)
if sse is not None:
yield sse

Spotted by Diamond

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is wrong, and the SSE spec is very explicit about the double newline splitting events

Comment on lines 48 to 53
# Process any remaining data in buffer
if buffer.strip():
line = buffer.rstrip('\r')
sse = decoder.decode(line)
if sse is not None:
yield sse
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The buffer processing logic at the end of the function doesn't properly handle incomplete SSE events. When the stream ends with data that doesn't include a final newline, the code attempts to process it as a complete line, but doesn't signal to the decoder that this is the end of the stream.

Consider adding a final empty line decode after processing any remaining buffer content:

# Process any remaining data in buffer
if buffer.strip():
    line = buffer.rstrip('\r')
    sse = decoder.decode(line)
    if sse is not None:
        yield sse
    
    # Signal end of stream to finalize any pending event
    final_sse = decoder.decode('')
    if final_sse is not None:
        yield final_sse

This ensures that any partial event in the buffer gets properly finalized and emitted.

Suggested change
# Process any remaining data in buffer
if buffer.strip():
line = buffer.rstrip('\r')
sse = decoder.decode(line)
if sse is not None:
yield sse
# Process any remaining data in buffer
if buffer.strip():
line = buffer.rstrip('\r')
sse = decoder.decode(line)
if sse is not None:
yield sse
# Signal end of stream to finalize any pending event
final_sse = decoder.decode('')
if final_sse is not None:
yield final_sse

Spotted by Diamond

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should catch this one with a test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually intentional. If the SSE server doesn't send a final chunk that closes out an SSE event. we don't want to force the processing of an event, which is what the graphite suggested logic would do:

            # Signal end of stream to finalize any pending event
            final_sse = decoder.decode('')

The current version would be a stricter way to handle SSE chunks, where the server must end on /n/n to close out events or our SDK won't process it.

buffer += text_chunk

# Process complete lines
while "\n" in buffer:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you test this with a multiline event (like a JSON blob containing a content field with extra lines)? This doesn't appear to handle that to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it out with Cohere's bug. The extra lines in the JSON blob must be escaped ("\n\n") and this works fine. If they are regular double new lines ("\n\n"), we will interpret it as a new event.

- version: 4.31.0
changelogEntry:
- summary: |
Eliminated SSE dependency "httpx-sse" by hard-forking the code into the generator
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have it anchored in the user-facing outcome more? I also don't love "hard-forking" language in the changelog. I think the points we are trying to land are:

  • Fix SSE handling of events longer than or containing escaped newlines
  • Remove external dependency on httpx-sse

@fern-support fern-support changed the title feat(python): elminiate SSE dependency 'httpx_sse' by hard-forking into core_utilities feat(python): eliminate SSE dependency 'httpx_sse' by hard-forking into core_utilities Oct 8, 2025
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we could include this file (or a version of it) like the other as-is unit test files and generate it for all the customers who have SSE included in their APIs.

I have a moderate preference for doing that, but regardless this looks good :)

@aditya-arolkar-swe aditya-arolkar-swe merged commit 00901b8 into main Oct 9, 2025
130 of 131 checks passed
@aditya-arolkar-swe aditya-arolkar-swe deleted the aa/in-house-sse-processing branch October 9, 2025 13:03
kennyderek pushed a commit that referenced this pull request Oct 14, 2025
…to core_utilities (#9784)

* Hard forked httpx_sse, moved into core_utilities, made fixes to line parsing bugs and improved idiomatic class definition.

* updated parser to: copy http_sse code into output, use new in-house SSE EventStream object and better handle failed SSE parsing

* updated changelog

* Automated update of seed files

* skip terminator sse events

* typing fixes

* Fix formatting issues: remove trailing whitespace and apply ruff formatting

* Automated update of seed files

* improved changelog language

* SSE Support arbitrary charsets from the headers - defaulting to utf-8

* added SSE unit tests

* properly copy core/http_sse to src/.../core/http_sse

* better typing in test_http_sse

* formatting

* refactored 'server-sent-event-examples' output (seed gen was broken), and added a test file 'tests/utils/test_sse_streaming.py'

* Hard forked httpx_sse, moved into core_utilities, made fixes to line parsing bugs and improved idiomatic class definition.

* updated parser to: copy http_sse code into output, use new in-house SSE EventStream object and better handle failed SSE parsing

* updated changelog

* skip terminator sse events

* Automated update of seed files

* typing fixes

* Fix formatting issues: remove trailing whitespace and apply ruff formatting

* improved changelog language

* Automated update of seed files

* SSE Support arbitrary charsets from the headers - defaulting to utf-8

* added SSE unit tests

* properly copy core/http_sse to src/.../core/http_sse

* better typing in test_http_sse

* formatting

* refactored 'server-sent-event-examples' output (seed gen was broken), and added a test file 'tests/utils/test_sse_streaming.py'

* Updated AsyncGenerator typing

* Properly generator stream terminator as a string i.e. was returning [[DONE]] before instead of '[[DONE]]'

* updated http_sse's __init__.py generation

* manual seed updates to fixture 'server-sent-event-examples'

* Automated update of seed files

* updated sse tests

* formatting fixes

---------

Co-authored-by: aditya-arolkar-swe <[email protected]>
Co-authored-by: fern-support <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants