Correlated Knowledge in Target LLM
A jailbreak aims to extract the target knowledge V₀.
[Figure: an attacker query ("How to make a bomb?") directed at a target LLM, with target knowledge V₀ and correlated knowledge V₁ to V₄. Labels shown in the diagram: Chemical, Synthesis, Process, Details.]
Legend: Vᵢ = correlated knowledge; V₀ = target knowledge; POA = Prompt Optimization Attack (direct). Markers: successful query, failed/blocked query, synthesis to target.
Panel 1, POA (direct attack on V₀): blocked by guardrails.
Panel 2, our approach (adaptive search over correlated knowledge): queries Corr-Knowledge 1 and Corr-Knowledge 2, reroutes adaptively when a query is blocked, and a synthesizer combines the correlated knowledge.
Caption: our approach adaptively explores correlated knowledge, while POA methods are blocked.