
📊 CKA-Agent Leaderboard

Comprehensive evaluation of jailbreak attacks and defense effectiveness against CKA-Agent

🎯 Welcome New Submissions!

We welcome submissions of new attack methods and defense mechanisms. Contact us to add your results to the leaderboard.

Model Vulnerability to CKA

Target model vulnerability ranking against CKA-Agent attack (Lower ASR is Better, ideally 0.0)
Rank  Target Model      ASR
🥇    Claude-Haiku-4.5  0.960
🥈    Gemini-2.5-Flash  0.968
🥉    Gemini-2.5-Pro    0.968
4     GPT-oss-120B      0.976

Defense Leaderboard

Effectiveness of defense mechanisms against CKA-Agent (Lower ASR is Better, ideally 0.0)
Rank  Defense Method    ASR*
🥇    Circuit Breakers  0.873
🥈    Perturbation      0.917
🥉    Rephrasing        0.937
4     No Defense        0.968
5     Llama Guard       0.984
Legend: 🥇 1st place (gold), 🥈 2nd place (silver), 🥉 3rd place (bronze); plain numbers mark other rankings.

📈 Defense Effectiveness Matrix

Attack Method          No Defense  Llama Guard  Circuit Breakers  Rephrasing  Perturbation
AutoDAN                0.767       0.365        0.254             0.318       0.389
PAIR                   0.810       0.762        0.690             0.781       0.801
PAP                    0.230       0.016        0.175             0.310       0.222
ActorBreaker           0.331       0.151        0.246             0.170       0.238
X-Teaming              0.595       0.603        0.579             0.611       0.635
Multi-Agent Jailbreak  0.794       0.786        0.786             0.809       0.786
CKA-Agent (Ours)       0.968       0.984        0.873             0.937       0.917

All values are attack success rates (ASR). A higher value means the defense was less effective against that attack.
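
As a rough illustration of how to read the matrix, the sketch below ranks the defenses against CKA-Agent by ASR and reports each defense's absolute change relative to the No Defense column. The ASR values are copied from the CKA-Agent row of the matrix; the function names (`rank_defenses`, `change_vs_baseline`) are hypothetical helpers introduced only for this example and are not part of any released tooling.

```python
# ASR values copied from the CKA-Agent (Ours) row of the matrix.
cka_agent_asr = {
    "No Defense": 0.968,
    "Llama Guard": 0.984,
    "Circuit Breakers": 0.873,
    "Rephrasing": 0.937,
    "Perturbation": 0.917,
}


def rank_defenses(asr_by_defense):
    """Return (defense, asr) pairs sorted from lowest ASR (best) to highest."""
    return sorted(asr_by_defense.items(), key=lambda item: item[1])


def change_vs_baseline(asr_by_defense, baseline="No Defense"):
    """Return each defense's absolute ASR change relative to the baseline.

    Negative values mean the defense lowered the ASR; positive values mean
    the ASR was higher with the defense than without it.
    """
    base = asr_by_defense[baseline]
    return {
        name: round(asr - base, 3)
        for name, asr in asr_by_defense.items()
        if name != baseline
    }


ranking = rank_defenses(cka_agent_asr)
deltas = change_vs_baseline(cka_agent_asr)

for position, (name, asr) in enumerate(ranking, start=1):
    # The baseline has no delta of its own, so report 0.0 for it.
    delta = deltas.get(name, 0.0)
    print(f"{position}. {name}: ASR {asr:.3f} (change vs. No Defense: {delta:+.3f})")
```

The resulting order matches the Defense Leaderboard: Circuit Breakers gives the largest reduction, while Llama Guard is the only entry whose ASR is higher than the No Defense baseline.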

Questions or Want to Submit?

Contact us to submit new attack or defense methods for evaluation

Contact: xinjie@gatech.edu