Add Negative Bleedthrough obstacle #28

Open
juanmichelini wants to merge 2 commits into lexler:main from juanmichelini:add-negative-bleedthrough-obstacle

Conversation

@juanmichelini

Adds a new obstacle documenting how negative tokens bleed into LLM context.

Obstacle: Negative Bleedthrough
Problem: Telling an LLM what not to do activates the very tokens you want it to avoid.

Covers:

  • Token activation mechanics: why "don't mention the moon" puts the moon front and center
  • Research references (Kassner & Schütze, 2020) on how LLMs struggle with negation
  • Why workarounds like caps and repetition don't fix the underlying mechanism
  • Brief note on vision models showing the same behavior (the elephant example)
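To make the first bullet concrete, here is a minimal sketch of why a negated instruction still places the unwanted word in the context. It assumes a crude word-level split as a stand-in for a real subword tokenizer, and the prompts and the `prompt_words` helper are hypothetical illustrations, not taken from the obstacle text.

```python
import re

def prompt_words(prompt: str) -> set[str]:
    """Lowercased words in the prompt (a crude stand-in for model tokens)."""
    return set(re.findall(r"[a-z']+", prompt.lower()))

# Hypothetical prompts: one negated, one phrased around the desired target.
negative = "Describe the night sky. Don't mention the moon."
positive = "Describe the night sky, focusing on stars and constellations."

avoid = "moon"
print(avoid in prompt_words(negative))  # True: the negation still injects "moon"
print(avoid in prompt_words(positive))  # False: the word never enters the context
```

The check says nothing about how a model weighs the word once it is present; it only shows that the negated phrasing puts it in the context at all.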

Adds relationship: obstacles/negative-bleedthrough -> related -> obstacles/selective-hearing

Rewritten from #19. This documents the underlying obstacle that "Visualize the Target" solves.

Documents how negative tokens bleed into context and can be
counterproductive when instructing LLMs.

Co-authored-by: openhands <openhands@all-hands.dev>
@juanmichelini juanmichelini changed the title Add negative-bleedthrough obstacle Add Negative Bleedthrough obstacle Mar 14, 2026

2 participants