TheRealPomax 2 days ago

LLMs are, by definition, real time at any speed. 50,000 tokens per second? Real time. Only 0.0002 tokens per minute? Still real time.

Eight tokens per second is "real time" in that sense, but that's also the kind of speed we used to mock old video games for, when they would show "computers" slowly printing text to a screen, letter by letter or word by word.

kouteiheika 2 days ago

In this context, by "real time" people usually mean "as fast as I can read the reply", so 0.0002 tokens per minute would not be considered "real time".

rurban 2 days ago

Real time typically means a guaranteed reaction time below 30 ms, because slower reactions will make the body throw up.

baq 2 days ago

Real time is defined as ‘no slower than some critical speed’; in the case of conversation with humans this should be around 10 tok/s, including speech synthesis.
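
To make the "critical speed" framing concrete, here is a minimal sketch in Python. The function names, the tokens-per-word ratio, and the reading speed passed in are all assumptions for illustration, not measured values; only the 10 tok/s threshold and the three example rates come from the comments above.

```python
def reading_speed_to_tps(words_per_minute: float, tokens_per_word: float) -> float:
    """Convert a reading speed into the token rate needed to keep up with it."""
    return words_per_minute * tokens_per_word / 60.0


def is_real_time(tokens_per_second: float, critical_tps: float) -> bool:
    """True when generation is no slower than the chosen critical speed."""
    return tokens_per_second >= critical_tps


# Threshold taken from the ~10 tok/s conversational figure quoted above.
CONVERSATION_TPS = 10.0

# Rates mentioned in the thread, converted to tokens per second.
rates = [
    ("50,000 tok/s", 50_000.0),
    ("8 tok/s", 8.0),
    ("0.0002 tok/min", 0.0002 / 60.0),
]

for label, tps in rates:
    print(label, is_real_time(tps, CONVERSATION_TPS))
```

Under this definition the answer depends entirely on which critical speed you pick: a "fast enough to read" threshold derived from `reading_speed_to_tps` will generally differ from a conversational or hard-latency one, which is why the same generation rate can count as real time for one commenter and not for another.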