culi 5 hours ago
See also:

* https://lmarena.ai/leaderboard - crowd-sourced head-to-head battles between models, ranked by Elo

* https://dashboard.safe.ai/ - CAIS' incredible dashboard (cited in OP)

* https://clocks.brianmoore.com/ - a visual comparison of how well models can draw a clock. A new clock is drawn every minute

* https://eqbench.com/ - emotional intelligence benchmarks for LLMs

* https://www.ocrarena.ai/battle - OCR battles, ranked by Elo