Skip to content

Minor fixes#475

Closed
jlamypoirier wants to merge 5 commits intomainfrom
jlp_batch_fixes
Closed

Minor fixes#475
jlamypoirier wants to merge 5 commits intomainfrom
jlp_batch_fixes

Conversation

@jlamypoirier
Copy link
Collaborator

✨ Description

@jlamypoirier jlamypoirier marked this pull request as ready for review March 24, 2026 05:22
@jlamypoirier
Copy link
Collaborator Author

One test failure left, streaming tests randomly fail with redis.exceptions.InvalidResponse: Protocol Error: b'_' ( https://github.com/ServiceNow/Fast-LLM/actions/runs/23367979074/job/67985797452#step:5:3625), and I'm not able to track down the cause. @bigximik have you encountered this?

fakeredis 2.34 introduced Resp3Writer hardcoded for all TCP connections
regardless of protocol negotiation. When XREADGROUP BLOCK times out on
an empty stream, Resp3Writer.dump(None) sends RESP3 null (b'_\r\n').
The redis-py RESP2 parser (used by default) raises Protocol Error: b'_'.

Fix: monkey-patch TCPFakeRequestHandler.setup in fake_redis_server() to
replace Resp3Writer with Resp2Writer, restoring correct RESP2 null
encoding (b'*-1\r\n') for blocking timeouts. The patch is guarded on
the presence of Resp3Writer (2.34+ only) and raises explicitly if
Resp2Writer is missing so future breakage is immediately diagnosable.
@bigximik
Copy link
Collaborator

I with Calude code seem to have found the problem:

Root cause

fakeredis 2.34.0 changed its TCP server to always use Resp3Writer for all connections,
regardless of whether the client negotiated RESP3 via HELLO 3. When a blocking
XREADGROUP BLOCK N call times out (stream is empty for N ms), Resp3Writer.dump(None)
sends the RESP3 null byte _\r\n. redis-py's default RESP2 parser does not recognize
_ and raises InvalidResponse: Protocol Error: b'_'.

The failure is intermittent because it only triggers when the stream is empty for the
full block duration (1 second). In practice this happens at test startup, when the
consumer's first xreadgroup fires before the producer thread has added its first message
— a timing race that is more likely under CI load.

Fix

In fake_redis_server(), monkey-patch TCPFakeRequestHandler.setup to replace
Resp3Writer with Resp2Writer, restoring correct RESP2 null encoding (*-1\r\n) for
blocking timeouts. The patch is guarded on the presence of Resp3Writer (fakeredis ≥ 2.34
only), and raises explicitly with a diagnostic message if Resp2Writer is absent — so any
future fakeredis internal refactor that breaks this workaround fails loudly rather than
silently.

Future work: migrate streaming to protocol=3

The underlying issue is that fast_llm/data/dataset/streaming.py creates its Redis client
with the default protocol=2. Migrating to protocol=3 (redis.Redis(..., protocol=3))
would be the proper long-term fix:

  • RESP3 is supported by Redis 6.0+ (2020); we already require Redis 7+
  • redis-py handles RESP3 null (_) correctly, so the fakeredis mismatch disappears entirely
  • The one breaking change: XREADGROUP results become a dict instead of a list-of-pairs,
    so the message iteration loop in streaming.py would need updating:
    for stream_key, messages_ in messagesfor stream_key, messages_ in messages.items()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants