Kimi K2 is the model that most consistently passes the clock test. I agree it's definitely got something unique going on.

https://clocks.brianmoore.com/

davej a day ago | parent | next [-]

Nice! I'm curious, what does this service cost to run? I notice that you don't have more expensive models like Opus, but querying the models every minute must add up over time (excuse the pun).

	culi a day ago | parent [-]
		(not my project)

eunos a day ago | parent | prev [-]

Lol, why's GPT 5 broken on that test? DeepSeek is surprisingly crisp and robust.