voiper1 a day ago
In many ways the benchmarks look very similar to Claude 3.7 in most cases. That's nowhere near enough reason to think we've hit a plateau - the pace has been super fast, so give it a few more months before calling that...! I think the opposite about the features - they aren't gimmicks at all, though indeed they aren't part of the core AI. Rather, they're important "tooling" adjacent to the AI that we need in order to actually leverage it. The LLM field in popular usage is still in its infancy. Even if the models don't improve (but I expect they will), we have a TON of room with these features and with how we interact with the models, feed them information, make tool calls, etc. to greatly improve usability and capability.