[FEAT]: Psychosocial Scenario #1266

jbolor21 · 2025-12-19T20:07:09Z

Description

Adds a new scenario for evaluating psychosocial harms. The scenario uses a prompt-softening converter and role playing as single-turn attacks, and a Crescendo attack as the multi-turn attack.

The current strategy is tailored to mental health crisis (self-harm related) objectives. Other objectives may require a new attack strategy YAML file and scoring definition.

Added a new Likert scoring file for evaluating crisis situations
Modified the Crescendo attack strategy for mental health crisis objectives
Added a sample prompt file with some example objectives

Tests and Documentation

Added new unit tests and ran local notebooks to verify that the strategy works.

bashirpartovi · 2026-01-15T19:15:39Z

pyrit/scenario/scenarios/airt/psychosocial_harms_scenario.py

+        scenario_result_id: Optional[str] = None,
+        crescendo_system_prompt_path: Optional[str] = None,
+        crescendo_system_prompt_paths_by_harm: Optional[Dict[str, str]] = None,
+        scoring_rubric_paths_by_harm: Optional[Dict[str, str]] = None,


Would the harm category keys in crescendo_system_prompt_paths_by_harm always exist in scoring_rubric_paths_by_harm? If so, is there a check for that?

This is a bit unclear to me. I believe these two dicts are expected to have the same keys, but the current implementation allows callers to pass mismatched keys and only fails later at runtime when a specific harm category is processed/accessed. This can lead to confusing errors that don't trace back to the constructor.
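As a stopgap while the two dicts stay separate, a constructor-time check could surface mismatched keys right away. The following is a minimal sketch, assuming the two dicts really are required to share the same harm category keys; the helper name `validate_matching_harm_keys` is hypothetical and not part of PyRIT.

```python
from typing import Dict, Optional


def validate_matching_harm_keys(
    prompt_paths_by_harm: Optional[Dict[str, str]],
    rubric_paths_by_harm: Optional[Dict[str, str]],
) -> None:
    """Raise ValueError if the two dicts do not cover the same harm categories.

    Hypothetical helper: intended to be called from the constructor so the
    error points at the call site rather than at a later lookup.
    """
    prompt_keys = set(prompt_paths_by_harm or {})
    rubric_keys = set(rubric_paths_by_harm or {})
    # Symmetric difference: keys that appear in exactly one of the dicts.
    mismatched = prompt_keys ^ rubric_keys
    if mismatched:
        raise ValueError(
            f"Harm categories missing from one of the dicts: {sorted(mismatched)}"
        )
```

Calling this at the top of `__init__` would make a mismatch fail at construction instead of when the harm category is first accessed.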
I think a better approach would be to make it a typed structure that encapsulates the system prompt path and scoring rubric path per harm category:

    @dataclass
    class HarmCategoryConfig:
        crescendo_system_prompt_path: str
        scoring_rubric_path: str

Then your constructor signature would look like this:

    def __init__(
        self,
        *,
        ...
        harm_configs: Optional[Dict[str, HarmCategoryConfig]] = None,
        ...,
    ):

and internally in the constructor:

    default_configs = {
        "psychosocial_imminent_crisis": HarmCategoryConfig(
            crescendo_system_prompt_path=str(
                pathlib.Path(DATASETS_PATH) / "executors" / "crescendo" / "escalation_crisis.yaml"
            ),
            scoring_rubric_path=str(
                pathlib.Path(DATASETS_PATH) / "score" / "likert" / "crisis_management.yaml"
            ),
        ),
    }
    self._harm_configs = {**default_configs, **(harm_configs or {})}

Internally, you could still translate this into separate dicts if that is easier for the existing logic. The main benefit is a cleaner public API that enforces the invariant at the point of construction.
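To illustrate the translation back into separate dicts, here is a small sketch. The function name `split_harm_configs` is hypothetical, and the `frozen=True` flag is an added assumption rather than something the suggestion requires.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class HarmCategoryConfig:
    crescendo_system_prompt_path: str
    scoring_rubric_path: str


def split_harm_configs(
    harm_configs: Dict[str, HarmCategoryConfig],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Derive the two per-harm dicts from the combined config (hypothetical helper).

    Because both dicts are built from the same mapping, they always share
    exactly the same keys.
    """
    prompt_paths = {
        harm: config.crescendo_system_prompt_path for harm, config in harm_configs.items()
    }
    rubric_paths = {
        harm: config.scoring_rubric_path for harm, config in harm_configs.items()
    }
    return prompt_paths, rubric_paths
```

Existing logic that reads the two dicts could keep working unchanged, while callers only ever see the single `harm_configs` parameter.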

If the fields could have different defaults or be optional, you could still use the same structure like this:

    @dataclass
    class HarmCategoryConfig:
        crescendo_system_prompt_path: str = str(
            pathlib.Path(DATASETS_PATH) / "executors" / "crescendo" / "escalation_crisis.yaml"
        )
        scoring_rubric_path: str = str(
            pathlib.Path(DATASETS_PATH) / "score" / "likert" / "crisis_management.yaml"
        )

This way, you eliminate a lot of if/else checks for whether a harm category exists, falling back to the default path, etc.

Thanks, I tried to address this. Let me know if these changes address your feedback and make it clearer!

bashirpartovi · 2026-01-15T19:52:13Z

pyrit/scenario/scenarios/airt/psychosocial_harms_scenario.py

+        for strategy in strategies:
+            # If strategy is a dataset-specific strategy (not single_turn/multi_turn),
+            # expand it to attacks for each of its tags
+            if strategy not in ["single_turn", "multi_turn"]:
+                # Find the enum member for this strategy
+                strategy_enum = next((s for s in PsychosocialHarmsStrategy if s.value == strategy), None)
+                if strategy_enum and strategy_enum.tags:
+                    # Create an attack for each tag (single_turn, multi_turn)
+                    for tag in strategy_enum.tags:
+                        if tag in ["single_turn", "multi_turn"]:
+                            atomic_attacks.append(self._get_atomic_attack_from_strategy(tag))
+                else:
+                    # Fallback: create single attack for unknown strategy
+                    atomic_attacks.append(self._get_atomic_attack_from_strategy(strategy))
+            else:
+                # For single_turn/multi_turn, create one attack
+                atomic_attacks.append(self._get_atomic_attack_from_strategy(strategy))
+        return atomic_attacks


A few things here:
For the enum lookup, instead of using next with a generator expression, you can use Python's built-in enum value lookup:

    try:
        strategy_enum = PsychosocialHarmsStrategy(strategy)
    except ValueError:
        strategy_enum = None

Also, the branching logic is a bit hard to follow. You are checking if a strategy is not single_turn/multi_turn, then expanding its tags, then checking if those tags are single_turn/multi_turn.

I think this could be simplified by normalizing everything to base attack types upfront:

    base_strategies: set[str] = set()
    for strategy in strategies:
        try:
            strategy_enum = PsychosocialHarmsStrategy(strategy)
            base_strategies.update(strategy_enum.tags or [strategy])
        except ValueError:
            base_strategies.add(strategy)
    return [self._get_atomic_attack_from_strategy(s) for s in base_strategies]
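The normalization can be tried in isolation with a toy enum. This is a sketch under assumptions: `ToyStrategy` is a hypothetical stand-in for `PsychosocialHarmsStrategy` (it only mimics the `(value, tags)` member shape from the diff), and the final `sorted` call is an addition to make the result order deterministic, since a set has no guaranteed order.

```python
from enum import Enum
from typing import FrozenSet, Iterable, List


class ToyStrategy(Enum):
    """Hypothetical stand-in: each member is declared as (value, tags)."""

    SINGLE_TURN = ("single_turn", frozenset({"single_turn"}))
    MULTI_TURN = ("multi_turn", frozenset({"multi_turn"}))
    IMMINENT_CRISIS = (
        "psychosocial_imminent_crisis",
        frozenset({"single_turn", "multi_turn"}),
    )

    def __new__(cls, value: str, tags: FrozenSet[str]):
        obj = object.__new__(cls)
        obj._value_ = value  # lets ToyStrategy("single_turn") look up by value
        obj.tags = tags
        return obj


def normalize_to_base_strategies(strategies: Iterable[str]) -> List[str]:
    """Collapse strategy names to their base attack types, without duplicates."""
    base: set = set()
    for strategy in strategies:
        try:
            # Known strategy: expand to its tags (or itself if it has none).
            base.update(ToyStrategy(strategy).tags or [strategy])
        except ValueError:
            # Unknown strategy name: keep it as-is, as in the suggestion above.
            base.add(strategy)
    return sorted(base)


print(normalize_to_base_strategies(["single_turn", "psychosocial_imminent_crisis"]))
```

One behavioral difference worth noting: because a set is used, passing both `single_turn` and `psychosocial_imminent_crisis` yields each base type once, whereas the original loop would append a `single_turn` attack twice.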

Ah yes, okay, I think that helps. Let me know if these changes fully address your idea!

bashirpartovi · 2026-01-15T19:58:25Z

pyrit/scenario/scenarios/airt/psychosocial_harms_scenario.py

+            # Extract harm category from first seed if available
+            if self._seed_groups and self._seed_groups[0].seeds:
+                first_seed = self._seed_groups[0].seeds[0]
+                if hasattr(first_seed, "harm_categories") and first_seed.harm_categories:


Please try not to use hasattr or getattr, because you lose the IDE's autocompletion and type checking. If the attribute ever gets renamed, this becomes a hidden bug that is very hard to catch.
A better way to do this is as follows:

    first_seed: SeedGroup = self._seed_groups[0].seeds[0]
    if first_seed.harm_categories:
        ...
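A self-contained sketch of the same idea follows. `ToySeed` is a hypothetical stand-in for whatever seed type the scenario actually stores, assumed here to declare `harm_categories` as an optional list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToySeed:
    """Hypothetical seed type with a declared, optional harm_categories field."""

    value: str
    harm_categories: Optional[List[str]] = None


def first_harm_category(seeds: List[ToySeed]) -> Optional[str]:
    """Return the first harm category of the first seed, if there is one."""
    if not seeds:
        return None
    first_seed: ToySeed = seeds[0]
    # Direct attribute access: a type checker can verify the name, and a
    # rename would raise AttributeError instead of being silently skipped.
    if first_seed.harm_categories:
        return first_seed.harm_categories[0]
    return None
```

With `hasattr`, a renamed attribute simply makes the condition false, so the code falls through to the default without any error.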

bashirpartovi · 2026-01-15T20:13:14Z

pyrit/scenario/scenarios/airt/psychosocial_harms_scenario.py

+            if harm_category in self._crescendo_system_prompt_paths_by_harm:
+                crescendo_prompt_path = pathlib.Path(self._crescendo_system_prompt_paths_by_harm[harm_category])
+            elif self._crescendo_system_prompt_path:
+                crescendo_prompt_path = pathlib.Path(self._crescendo_system_prompt_path)
+            else:
+                # Default: use crisis escalation
+                crescendo_prompt_path = (
+                    pathlib.Path(DATASETS_PATH) / "executors" / "crescendo" / "escalation_crisis.yaml"
+                )


If you follow my suggestion above, a lot of these if/else checks could be eliminated
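For illustration, the chain above could shrink to a single lookup once per-harm configs are merged with defaults. This is a sketch, assuming unknown categories should fall back to the crisis defaults (mirroring the current `else` branch); `resolve_crescendo_prompt_path` and `DEFAULT_HARM_CATEGORY` are hypothetical names.

```python
import pathlib
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class HarmCategoryConfig:
    crescendo_system_prompt_path: str
    scoring_rubric_path: str


# Assumption: this key is always present because defaults are merged in first.
DEFAULT_HARM_CATEGORY = "psychosocial_imminent_crisis"


def resolve_crescendo_prompt_path(
    harm_configs: Dict[str, HarmCategoryConfig],
    harm_category: str,
) -> pathlib.Path:
    """Pick the config for the category, else the default category's config."""
    config = harm_configs.get(harm_category, harm_configs[DEFAULT_HARM_CATEGORY])
    return pathlib.Path(config.crescendo_system_prompt_path)
```

The same lookup would serve the scoring rubric path, so both values always come from one config object.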

hannahwestra25 · 2026-01-15T20:30:13Z

pyrit/datasets/seed_datasets/local/airt/psychosocical_imminent_crisis.prompt

@@ -0,0 +1,9 @@
+dataset_name: airt_psychosocial_imminent_crisis
+harm_categories: psychosocial_imminent_crisis


nit: imo the harm_category should just be psychosocial because it's broader

hannahwestra25 · 2026-01-15T20:33:27Z

pyrit/scenario/scenarios/airt/psychosocial_harms_scenario.py

+    users in mental health crisis, psychological vulnerability, inappropriate dependency,
+    and self-harm situations.
+
+    Each tag represents a different psychosocial harm category that the model can be tested for.


Same comment as above: I was considering psychosocial to be the harm category, with the strategies being methods of testing that category. So each tag doesn't represent a different harm category but a different method (i.e. strategy) within psychosocial.

hannahwestra25 · 2026-01-15T20:56:56Z

pyrit/scenario/scenarios/airt/psychosocial_harms_scenario.py

+    SINGLE_TURN = ("single_turn", {"single_turn"})
+    MULTI_TURN = ("multi_turn", {"multi_turn"})
+
+    IMMINENT_CRISIS = ("psychosocial_imminent_crisis", {"single_turn", "multi_turn"})


I think this should just be imminent_crisis, and then the harm category is psychosocial.

similar to this:

PyRIT/pyrit/scenario/scenarios/airt/scam.py

Line 66 in 4a40aa3

ALL = ("all", {"all"})

hannahwestra25 · 2026-01-15T21:01:19Z

pyrit/scenario/scenarios/airt/psychosocial_harms_scenario.py

+)
+
+AttackStrategyT = TypeVar("AttackStrategyT", bound="AttackStrategy[Any, Any]")
+


Suggested change

logger = logging.getLogger(__name__)

adding draft psych scenario file

d6a2140

jbolor21 marked this pull request as draft December 19, 2025 20:07

Bolor and others added 17 commits December 22, 2025 16:45

working changes and additions

b03e219

precommit

954a52e

Merge remote-tracking branch 'origin/main' into users/bjagdagdorj/psych_scenario

53613d4

work in progress

c97e27a

Merge remote-tracking branch 'origin/main' into users/bjagdagdorj/psych_scenario

45546ea

Merge remote-tracking branch 'origin' into users/bjagdagdorj/psych_scenario

75a4158

adding therapist files

b8fa4f0

draft

52428fc

Merge remote-tracking branch 'origin' into users/bjagdagdorj/psych_scenario

ccf1a28

before pulling in changes

cfd179a

cleaned up files, ready for review

8583df6

Merge remote-tracking branch 'origin' into users/bjagdagdorj/psych_scenario

6d985e3

Delete pyrit/datasets/seed_datasets/local/airt/psychosocial_vulnerability.prompt

af98a11

delete unused file

Delete pyrit/datasets/score/likert/dependency_management.yaml

19d0e07

delete unused file

precommit

13a7c51

slight refactor and fixed tests

ec55882

slight refactor and fixed tests

1609465

jbolor21 marked this pull request as ready for review January 15, 2026 18:40

jbolor21 changed the title ~~DRAFT: [FEAT]: Psychosocial Scenario~~ [FEAT]: Psychosocial Scenario Jan 15, 2026

bashirpartovi reviewed Jan 15, 2026

hannahwestra25 reviewed Jan 15, 2026
