culi a day ago

Kimi K2 is the model that most consistently passes the clock test. I agree it's definitely got something unique going on.

https://clocks.brianmoore.com/

davej a day ago

Nice! I'm curious, what does this service cost to run? I notice that you don't have more expensive models like Opus, but querying the models every minute must add up over time (excuse the pun).

culi a day ago

(not my project)

eunos a day ago

Lol, why's GPT 5 broken on that test? DeepSeek is surprisingly crisp and robust.