SimBench Leaderboard

Overall simulation ability, measured by the SimBench score averaged across the two main splits.

Total Models: 45
Top Score: 40.80
Average Score: 12.7
Columns: Rank, Model, Type, Release, Score (S ↑)

Note: Reasoning models are highlighted in italics.

Baseline: Models below the dotted line perform worse than a uniform baseline.

Score Range: SimBench scores range from -∞ to 100, with higher scores indicating better simulation ability.
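
As a rough illustration of how the overall number described above could be derived, the sketch below averages two per-split scores and checks the result against a baseline. The split names, the example numbers, and the baseline value passed in are all hypothetical placeholders; this page does not state them, and this is not the leaderboard's actual scoring code.

```python
def overall_score(split_scores):
    """Average per-split scores into one overall score.

    `split_scores` maps a split name to its score. The split names used
    below are hypothetical; the page only says there are two main splits.
    """
    if len(split_scores) != 2:
        raise ValueError("expected scores for exactly two splits")
    return sum(split_scores.values()) / len(split_scores)


def below_baseline(score, baseline_score):
    """Return True if a model scores worse than the given baseline.

    `baseline_score` is supplied by the caller; the uniform baseline's
    numeric value is an assumption here, not taken from the page.
    """
    return score < baseline_score


# Made-up numbers, purely for illustration.
example = {"split_a": 30.0, "split_b": 50.0}
overall = overall_score(example)
print(overall)
```

Because scores are unbounded below, a model can land far under the baseline on one split and pull its average down sharply, which is why the two splits are worth inspecting separately as well.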