sipjca 2 hours ago
Wondering the same. It certainly can run beyond 30 seconds, but at some point I believe the output should degrade. Plus, you could do actual batch inference instead. Or, if you must carry the context forward, you could still do it linearly, but memory usage shouldn't just explode.
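The "carry it forward linearly without memory exploding" point can be sketched as a bounded sliding window over the context. This is a toy illustration under my own assumptions (the `BoundedContext` class and token counts are hypothetical, not from any particular inference stack):

```python
from collections import deque

class BoundedContext:
    """Carries context forward across sequential steps while keeping
    memory O(max_tokens) instead of growing with every step."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.tokens: deque[int] = deque()

    def extend(self, new_tokens: list[int]) -> None:
        self.tokens.extend(new_tokens)
        # Evict the oldest tokens once the window is full.
        while len(self.tokens) > self.max_tokens:
            self.tokens.popleft()

ctx = BoundedContext(max_tokens=8)
for step in range(100):
    ctx.extend([step] * 4)  # pretend each step emits 4 tokens
assert len(ctx.tokens) == 8  # bounded no matter how many steps ran
```

A real system would evict by turns or summaries rather than raw tokens, but the memory argument is the same: the window size, not the total run length, bounds usage.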