| ▲ | CjHuber 3 hours ago | |
My point was not that those video to text models are good like they are used for example in that case, but more generally I was referring to that list of indicators. Like surely when analysing a movie it is alright if some things are misunderstood by it, especially as the amount of misunderstanding can be decreased a lot. That AI body camera surely is optimized on speed and inference cost. but if you give an agent 10 1s images along with the transcript of that period and the full prior transcript, and give it reasoning capabilities, it would take almost endlessy for that movie to process but the result surely will be much better than the body cameras. After all the indicator talks about "AI" in general so judge a model not optimized for capability but something else to measure on that indicator | ||