SimBench Leaderboard

Overall simulation ability, measured by the SimBench score averaged across the two main splits.

Total Models

Top Score: 40.80

Average Score: 12.7

Rank	Model	Type	Release	Score (S ↑)

Note: Reasoning models are highlighted in italics.

Baseline: Models below the dotted line perform worse than a uniform baseline.

Score Range: SimBench scores range from -∞ to 100, with higher scores indicating better simulation ability.
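
The ranking described above (average the two main splits, sort by score, flag entries below the uniform baseline) can be sketched roughly as follows. This is only an illustration, not the leaderboard's actual code: the field names `split_a` and `split_b`, the `Entry` class, the `rank` helper, and the caller-supplied `baseline_score` are all hypothetical, and the baseline's score is not stated on this page.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    model: str
    split_a: float  # hypothetical name for the first main split's score
    split_b: float  # hypothetical name for the second main split's score

    @property
    def score(self) -> float:
        # Leaderboard score: plain average of the two main splits.
        return (self.split_a + self.split_b) / 2


def rank(entries, baseline_score):
    """Return (rank, model, score, below_baseline) tuples, best score first.

    baseline_score is supplied by the caller; it is an assumption here,
    since this page does not give the uniform baseline's score.
    """
    ordered = sorted(entries, key=lambda e: e.score, reverse=True)
    return [
        (i + 1, e.model, e.score, e.score < baseline_score)
        for i, e in enumerate(ordered)
    ]
```

Entries whose last field is `True` would sit below the dotted line in the table.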