Upload and explore model predictions from SimBench evaluations. Compare LLM response distributions against ground-truth human behavior.