Skip to content

Conversation

@rlundeen2
Copy link
Contributor

@rlundeen2 rlundeen2 commented Jan 15, 2026

Moving metadata to AttackResult

This PR moves many fields that were part of Message into the AttackResult. Including targeted_harm_categories, converter_identifiers, attack_identifier, scorer_identifier. The old usage still exists but is deprecated. These fields make a lot more sense as part of a conversation/attack and shouldn't be attached to every single message_piece.

Deprecated get_prompt_scores in MemoryInterface

This makes less sense, because many of those fields are now AttackResult fields. So before, the usage might be "get all scores that had this attackID". But now the usage should be "Get the AttackResults with this AttackID" and the scores are attached to those.

Crescendo/TAP Scoring Bug Fix

There was also a pretty bad bug in TAP (and to a lesser extent, Crescendo). When we allowed TrueFalseScorers to be passed in, we lost the float value, so TAP was unable to actually retrieve the "best" conversation (it was either 1 or 0). Also, Crescendo couldn't get that feedback, but it relies on it less. normalize_score_to_float now fixes this by adding the float value to metadata, and TAP now requires a FloatScaleThreshholdScorer so this value is set.

Tests

  • I added tests to check the score bug and that floats are actually used
  • Ran integration tests prior to merging main [e7894d7]

@rlundeen2 rlundeen2 changed the title MAINT FIX: Moving Message metadata to AttackResult MAINT FIX: Moving Message metadata to AttackResult and attack score fix Jan 15, 2026
@rlundeen2 rlundeen2 marked this pull request as draft January 15, 2026 05:55
Comment on lines +281 to +321
def _get_attack_result_metadata(
self,
*,
context: AttackStrategyContextT,
request_converters: Optional[list[Any]] = None,
) -> dict[str, Any]:
"""
Build common metadata fields for AttackResult.
This helper method extracts metadata and consolidates it for per-attack storage.
Args:
context: The attack context containing memory labels and other state.
request_converters: Optional list of PromptConverterConfiguration objects
used in the attack.
Returns:
Dict: A dictionary containing attack_identifier, objective_target_identifier,
request_converter_identifiers, and labels that can be unpacked into
AttackResult constructor.
"""
request_converter_identifiers = None
if request_converters:
# request_converters is a list of PromptConverterConfiguration objects
# Each config has a 'converters' list of actual PromptConverter instances
all_converters = []
for config in request_converters:
if hasattr(config, "converters"):
all_converters.extend(config.converters)
elif hasattr(config, "get_identifier"):
# Direct converter object
all_converters.append(config)
if all_converters:
request_converter_identifiers = [converter.get_identifier() for converter in all_converters]

return {
"attack_identifier": self.get_identifier(),
"objective_target_identifier": self.get_objective_target().get_identifier(),
"request_converter_identifiers": request_converter_identifiers,
"labels": context.memory_labels if context.memory_labels else None,
}
Copy link
Contributor

@bashirpartovi bashirpartovi Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few concerns with this:

  • request_converters: Optional[list[Any]] is too loosely typed. When you use Any, it removes IDE autocompletion/type checking and makes it really unclear what callers should pass.
  • Looking at the usage in the code, request_converters only accepts list[PromptConverterConfiguration] or list[PromptConverter]. We should make sure the signature explicitly uses a union (e.g., Optional[list[PromptConverterConfiguration | PromptConverter]]).
  • Using hasattr here is really not ideal because it can fail silently if you ever rename the attribute. For example, if PromptConverterConfiguration.converters were renamed to convs, this code would still type-check and run but would not behave correctly and would be a hidden bug in the code. If we use Union, it would be safer to branch using isinstance() and handle each type explicitly.

I would push back a bit on adding all the identifier fields directly to AttackResult and removing the dataclass. It bloats the class, loses the dataclass benefits (__repr__, __eq__, field defaults), and turns AttackResult into a catch-all bucket. I think a cleaner pattern is adding a single structured field that groups related identifiers:

@dataclass
class AttackLineage: # naming is my weakness
    objective_target_identifier: dict[str, str]
    request_converter_identifiers: Optional[list[dict[str, str]]] = None
    response_converter_identifiers: Optional[list[dict[str, str]]] = None
    objective_scorer_identifier: Optional[dict[str, str]] = None
    adversarial_chat_target_identifier: Optional[dict[str, str]] = None

Then AttackResult just gets one new field which is lineage: Optional[AttackLineage] = None. This keeps the dataclass, groups related tracking info logically, and is easy to extend later without ever touching AttackResult again.
Here's how the helper methods would look:

def _get_attack_lineage(
    self,
    *,
    request_converters: Optional[list[PromptConverterConfiguration | PromptConverter]] = None,
    response_converters: Optional[list[PromptConverterConfiguration | PromptConverter]] = None,
    objective_scorer_identifier: Optional[dict[str, str]] = None,
    adversarial_chat_target_identifier: Optional[dict[str, str]] = None,
) -> AttackLineage:
    return AttackLineage(
        objective_target_identifier=self.get_objective_target().get_identifier(),
        request_converter_identifiers=self._extract_converter_identifiers(request_converters),
        response_converter_identifiers=self._extract_converter_identifiers(response_converters),
        objective_scorer_identifier=objective_scorer_identifier,
        adversarial_chat_target_identifier=adversarial_chat_target_identifier,
    )


@staticmethod
def _extract_converter_identifiers(
    converters: Optional[list[PromptConverterConfiguration | PromptConverter]],
) -> Optional[list[dict[str, str]]]:

    if not converters:
        return None

    all_converters: list[PromptConverter] = []

    for item in converters:
        if isinstance(item, PromptConverterConfiguration):
            all_converters.extend(item.converters)
        elif isinstance(item, PromptConverter):
            all_converters.append(item)

    if not all_converters:
        return None

    return [converter.get_identifier() for converter in all_converters]

Comment on lines -33 to +64
@dataclass
class AttackResult(StrategyResult):
"""Base class for all attack results."""
"""
Base class for all attack results.
Contains identity information, scoring, metadata moved from per-message storage,
and methods to retrieve conversation history.
"""

def __init__(
self,
*,
conversation_id: str,
objective: str,
attack_identifier: dict[str, str],
targeted_harm_categories: Optional[List[str]] = None,
request_converter_identifiers: Optional[List[Dict[str, str]]] = None,
objective_target_identifier: Optional[Dict[str, str]] = None,
labels: Optional[Dict[str, str]] = None,
automated_objective_score: Optional[Score] = None,
human_objective_score: Optional[Score] = None,
auxiliary_score_ids: Optional[List[str]] = None,
executed_turns: int = 0,
execution_time_ms: int = 0,
outcome: Optional[AttackOutcome] = None,
outcome_reason: Optional[str] = None,
related_conversations: Optional[set[ConversationReference]] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> None:
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned above, we may be able to keep this as a @dataclass and still achieve what you are trying to do here

labels=message_piece.labels,
prompt_target_identifier=message_piece.prompt_target_identifier,
attack_identifier=message_piece.attack_identifier,
attack_identifier=message_piece._attack_identifier,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why _attack_identifier instead of attack_identifier?

category_set: set[str] = set()

for s in scores:
metadata = combine_dict(metadata, getattr(s, "score_metadata", None))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to just do combined_dict(metadaya, s.score_metadata) ? getattr doesn't give you IDE support and if we ever rename the attribute, this will be a bug which is hidden

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants