Comprehensive evaluation of jailbreak attacks and defense effectiveness against CKA-Agent
We welcome submissions of new attack methods and defense mechanisms. Contact us to add your results to the leaderboard.

| Rank | Target Model | ASR |
|---|---|---|
| 🥇 | Claude-Haiku-4.5 | 0.960 |
| 🥈 | Gemini-2.5-Flash | 0.968 |
| 🥉 | Gemini-2.5-Pro | 0.968 |
| 4 | GPT-oss-120B | 0.976 |

| Rank | Defense Method | ASR* |
|---|---|---|
| 🥇 | Circuit Breakers | 0.873 |
| 🥈 | Perturbation | 0.917 |
| 🥉 | Rephrasing | 0.937 |
| 4 | No Defense | 0.968 |
| 5 | Llama Guard | 0.984 |

| Attack Method | No Defense | Llama Guard | Circuit Breakers | Rephrasing | Perturbation |
|---|---|---|---|---|---|
| AutoDAN | 0.767 | 0.365 | 0.254 | 0.318 | 0.389 |
| PAIR | 0.810 | 0.762 | 0.690 | 0.781 | 0.801 |
| PAP | 0.230 | 0.016 | 0.175 | 0.310 | 0.222 |
| ActorBreaker | 0.331 | 0.151 | 0.246 | 0.170 | 0.238 |
| X-Teaming | 0.595 | 0.603 | 0.579 | 0.611 | 0.635 |
| Multi-Agent Jailbreak | 0.794 | 0.786 | 0.786 | 0.809 | 0.786 |
| CKA-Agent (Ours) | 0.968 | 0.984 | 0.873 | 0.937 | 0.917 |

All values are attack success rates (ASR) for the given attack method under the given defense. A higher value means the defense was less effective against that attack.
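
The defense ranking can be reproduced from the CKA-Agent row of the table above. The following is a minimal sketch, not the leaderboard's own tooling: the function names `rank_defenses` and `change_vs_baseline` are hypothetical, and it assumes defenses are ordered by ascending ASR (lower ASR meaning a more effective defense), which is consistent with the ranks shown.

```python
# CKA-Agent ASR under each defense, copied from the "CKA-Agent (Ours)" table row.
cka_agent_asr = {
    "No Defense": 0.968,
    "Llama Guard": 0.984,
    "Circuit Breakers": 0.873,
    "Rephrasing": 0.937,
    "Perturbation": 0.917,
}


def rank_defenses(asr_by_defense):
    """Return (defense, asr) pairs sorted from lowest ASR to highest.

    Assumption: lower ASR means a more effective defense.
    """
    return sorted(asr_by_defense.items(), key=lambda item: item[1])


def change_vs_baseline(asr_by_defense, baseline="No Defense"):
    """Return each defense's ASR minus the baseline ASR, rounded to 3 places."""
    base = asr_by_defense[baseline]
    return {
        name: round(asr - base, 3)
        for name, asr in asr_by_defense.items()
        if name != baseline
    }


ranking = rank_defenses(cka_agent_asr)
changes = change_vs_baseline(cka_agent_asr)

for position, (name, asr) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {asr:.3f}")
```

In this sketch a negative entry in `changes` means the defense lowered CKA-Agent's ASR relative to running with no defense, and a positive entry means the ASR was higher with the defense in place.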
Contact: xinjie@gatech.edu