fix(weave): make ObjectRecord subscriptable for str.format templates#6896
fix(weave): make ObjectRecord subscriptable for str.format templates#6896rgao-coreweave wants to merge 2 commits into
Conversation
|
Preview this PR with FeatureBee: https://beta.wandb.ai/?betaVersion=ce2c2fc299bfca4d9d5195e51f54e995bbe03471 |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
HiveMind Sessions4 sessions · 45m · $15
View all sessions in HiveMind → Run |
ObjectRecord (the wrapper Weave produces when deserializing a user
object from the trace database) only supported attribute access. Python's
str.format resolves "{x[field]}" via __getitem__, so any prompt template
using subscript syntax against an ObjectRecord raised
"TypeError: 'ObjectRecord' object is not subscriptable" — observed in
weave-worker's scoring_worker._format_scoring_prompt, and reachable via
LLMAsAJudgeScorer.score and MessagesPrompt.format.
Add __getitem__ delegating to self.__dict__ so subscript access mirrors
attribute access. Purely additive — no existing path changes behavior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collapse four tests into one regression test that covers attribute-like subscript lookup, KeyError on missing keys, the production str.format scenario, and nested ObjectRecord subscripting. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c9dad8a to
021bfc1
Compare
| # an ObjectRecord raises `TypeError: 'ObjectRecord' object is not | ||
| # subscriptable` — for example, when a scoring prompt formats over a | ||
| # call's deserialized inputs/output. | ||
| return self.__dict__[key] |
There was a problem hiding this comment.
This does not look correct to me.
The intent of ObjectRecord was to provide the facade to an object uploaded to weave. Consider the failure mode:
class Something(BaseModel):
a: int
s = Something(a=1)
s.a # OK
s["a"] # ERROR
ref = weave.publish(s)
s2 = ref.get()
s2.a # OK
s2["a"] # ERRORBut after your change:
s2["a"] # OKThere was a problem hiding this comment.
Fair concern about the facade — s["a"] working on the proxy when it doesn't on the original is real asymmetry. Wanted to lay out the full picture so we can pick the right level.
Why this surfaced. ~734 errors/week in Datadog from weave-trace's scoring_worker._format_scoring_prompt, where scoring_prompt.format(**inputs_with_defaults) raises TypeError: 'ObjectRecord' object is not subscriptable whenever a logged input/output round-tripped as a pydantic model or dataclass (i.e. became an ObjectRecord). The same pattern exists in the SDK at weave/scorers/llm_as_a_judge_scorer.py:66 and weave/prompt/prompt.py:146, and in any user code doing template.format(**call_inputs_or_output).
Two ways to fix it:
(A) Class-level — what this PR does. Add __getitem__ to ObjectRecord delegating to self.__dict__. One-line behavioral addition; covers every callsite uniformly.
(B) Callsite-level. Revert this PR and add a string.Formatter subclass that makes {x[field]} fall back to getattr when subscripting raises TypeError. Apply at the SDK callsites. ObjectRecord stays a pure facade. Sketch:
import _string, string
class _TemplateFormatter(string.Formatter):
def get_field(self, field_name, args, kwargs):
first, rest = _string.formatter_field_name_split(field_name)
obj = self.get_value(first, args, kwargs)
for is_attr, key in rest:
if is_attr:
obj = getattr(obj, key)
else:
try:
obj = obj[key]
except TypeError:
obj = getattr(obj, str(key))
return obj, firstTradeoffs:
| (A) class-level | (B) callsite-level | |
|---|---|---|
weave-trace worker MessagesPrompt.format path |
fixed via SDK | fixed if worker imports new formatter |
weave-trace worker bare scoring_prompt.format(**inputs) path (str template, scoring_worker.py:1123) |
fixed | not fixed — needs separate worker patch |
User code doing template.format(**call_inputs) |
fixed | not fixed |
| ObjectRecord facade purity | breaks (by one bit) | preserved |
| Lines of code | ~9 | ~30 + tests, plus worker patch |
Depends on private _string module |
no | yes (stable but private) |
On facade purity. The proxy is already lossy/asymmetric in both directions — weave_isinstance(obj, cls) recognizes an ObjectRecord by _class_name while plain isinstance(obj, cls) does not; ObjectRecord carries helpers (map_values, unwrap) the original lacks; the original's class-specific methods are gone on the proxy. Today the asymmetry runs both ways; this PR adds one more bit on the "proxy is more permissive" side.
On regression risk. Strictly additive — ObjectRecord had no __getitem__ previously, so no path that worked before changes behavior. I grepped: no try/except TypeError patterns around subscripting it, no Mapping duck-checks that would now wrongly include it (the ABC needs keys() too, which we don't add).
My recommendation: (A). The worker has a bare str.format() callsite on a raw string template that (B) can't reach without a separate worker patch, and the bug surface includes user code doing the same. Fixing on ObjectRecord is the single point that catches all of them.
Happy to switch to (B) if you'd rather, but flagging that it'll need a follow-up worker change too.
There was a problem hiding this comment.
Pulled in a second perspective via /talk-to-tim — a tool that emulates @tssweeney's engineering reasoning — to pressure-test my own framing before you respond. Sharing his read since it sharpens the argument better than mine did:
ObjectRecord is not a facade. it is a deserialized stand-in. it already has:
_class_name,_basesthat the original doesn'tmap_values,unwrapthat the original doesn'tweave_isinstancereturning true butisinstancereturning false- no
.model_dump()from the originalso the "facade purity" ship sailed a long time ago. the one bit of asymmetry [you're] pointing at is real but it's the least important one — it's the one that makes the user's reasonable code work.
and frankly —
s["a"]failing on a pydantic model is a python limitation, not a feature we should be faithfully reproducing on the proxy. nobody is writingtry: s["a"] except TypeErrorto detect pydantic vs dict. that pattern doesn't exist in real code.
And on the alternatives — including a third option I floated to him (calling unwrap() at the SDK callsite to convert ObjectRecord → dict before formatting):
every callsite that does
template.format(**inputs)is now broken until someone remembers to route it through_TemplateFormatter. that's not a fix, that's a maintenance tax forever. and it leaks knowledge of ObjectRecord's quirks into every place that formats strings. that is exactly the "concept living in the wrong layer" thing — string formatting code should not need to know about weave's deserialization model.[the unwrap-at-callsite option] is a regression, not a fix.
{output.answer}worked before, breaks after. ObjectRecord's whole point is that it preserves attribute access for pydantic/dataclass roundtrips. Stripping it back to a dict at the callsite means anyone who wrote{output.answer}— which is the more natural Python idiom for objects — silently breaks.
His bottom line is the same as mine: ship the class-level fix. The reframe I'd land on is "ObjectRecord is the round-trip representation of user objects, not a behavioral facade." Under that framing, subscript access for stored attributes is just another way to read a field — there's no contract we owe the user to faithfully reproduce BaseModel.__getitem__ raising TypeError.
One thing he flagged for explicit decision before merge: __contains__ / len() / iter() will start hitting their default Python behavior once __getitem__ exists (e.g. "a" in rec will fall back to iterating). His vote — and mine — is keep it scoped: __getitem__ only, documented as "attribute access via subscript for template formatting," not signaling Mapping-ness. If you want me to add an explicit __contains__ raising or similar, happy to.
|
discussed the fix and agreed with andrew that sdk's facade is not the right place to fix this. we should fix this in the callsite, (scoring worker), filed a ticket for triage. closing the PR |
Background
The actors
ObjectRecord(weave/trace/object_record.py) is Weave's wrapper for "I deserialized a user object from the trace database but I don't have its original class available." When a user logs a call whose input is, say, a custom pydanticBaseModel, the class definition lives in their code — not in Weave's. So Weave stores the attributes, and on retrieval reconstructs a generic stand-in that hangs those attributes off itself viasetattr. You can read fields withobj.answer,obj.score, etc. But it does not behave like a dict — there is no__getitem__.Python's
str.formathas two ways to drill into a value:They look interchangeable. They are not. The second one calls
__getitem__.The collision
The scoring system lets users write prompt templates like:
Then the worker fetches the call's
inputs/outputfrom the trace DB and runs:{"question": "..."}as a plain dict → roundtrips as a dict →inputs["question"]works.ObjectRecord→inputs["question"]callsObjectRecord.__getitem__("question")→ no such method →TypeError: 'ObjectRecord' object is not subscriptable.The user's template is unchanged. The user's intent is unchanged. The thing that changed is the runtime type of the value, which the user has zero control over — it's determined by Weave's serialization layer.
Why it surfaced
This was a regression flagged in Datadog with ~734 occurrences over the past week. First seen ~2 months ago at
ae0963a, dormant, then resurging 5 days ago at62e783a— likely a new scorer rollout or a change in how call inputs are unwrapped before reaching the formatter.Stack trace observed in
weave-worker:The same vulnerable
template.format(**inputs)pattern exists in the public SDK at:weave/scorers/llm_as_a_judge_scorer.py:66—LLMAsAJudgeScorer.scoreweave/prompt/prompt.py:146—format_message_with_template_vars(used byMessagesPrompt.format)…so this isn't worker-only; any user code path that runs a prompt template over deserialized call inputs/output can hit it.
Fix
Add
__getitem__toObjectRecorddelegating toself.__dict__[key]. Nowobj["field"]behaves the same asobj.field.Why this is the right level:
ObjectRecordhad no__getitem__previously, so no existing code path can change behavior — only previously-broken code starts working.weave-worker's_format_scoring_prompt,LLMAsAJudgeScorer.score, andMessagesPrompt.formatin one place. The worker installs the Weave SDK as a dependency, so on the next SDK release it picks up this fix with no worker-side change.ObjectRecordvalues inside anObjectRecordwork too —{x[user][name]}is two__getitem__calls, both satisfied by the same one-line implementation.ObjectRecord.unwrap()already converts the record back to a plain dict, so treating it dict-like for read-only access aligns with the class's existing posture.Test plan
New file
tests/trace/test_object_record.pywith one regression test,test_object_record_supports_subscript_for_str_format, which fails with the exact productionTypeErrorbefore the patch and passes after. It covers:rec["answer"]returns the stored attribute).KeyError(dict semantics), notTypeError."{out[answer]}".format(out=record).ObjectRecordsubscripting ({x[inner][answer]}) recurses through__getitem__.Regression check on existing tests that touch
ObjectRecord:tests/trace/test_object_record.py(1 new test)tests/trace/test_serialize_encoders.pytests/trace/test_deepcopy.pytests/trace/test_llm_as_a_judge_scorer.pyuvx ruff checkclean🤖 Generated with Claude Code