rsanheim 3 days ago
`ETOOMANYMODELS`

Is there a reputable, non-blogspam site that offers a 'cheat sheet' of sorts for which models to use, in particular for development? Not just OpenAI, but across the main cloud offerings and feasible local models?

I know there are benchmarks and directories like Hugging Face, and you can get a 'feel' for things by scanning threads here or on other forums. I'm thinking more of something that provides use-case-tailored "top 3" choices by collecting and summarizing different data points. For example:

* agent & tool based dev (cloud) - [top 3 models]
* agent & tool based dev (local) - m1, m2, m3
* code review / high level analysis - ...
* general tech questions - ...
* technical writing (ADRs, needs assessments, etc) - ...

Part of the problem is how quickly the landscape changes every day. Relying on benchmarks alone also isn't enough: they ignore cost and, more importantly, actual user experience (which I realize is incredibly hard to aggregate & quantify).
departed 3 days ago
LMArena might have some of the information you are looking for. It ranks LLMs across the main cloud offerings, and I feel that its evaluation method, human prompting and voting, is closer to real-world use and less prone to data contamination than benchmarks.

In the "Leaderboard" > "Language" tab, it lists the top models in various categories such as overall, coding, math, and creative writing. In the "Leaderboard" > "Price Analysis" tab, it shows a chart comparing models by cost per million tokens. In the "Prompt-to-Leaderboard" tab, there is even an LLM to help you find LLMs: you enter a prompt, and it finds the top models for that particular prompt.
ac29 3 days ago
> Is there a reputable, non-blogspam site that offers a 'cheat sheet' of sorts for what models to use, in particular for development?

Below is a spreadsheet I bookmarked from a previous HN discussion. It's information-dense, but you can just look at the composite scores to get a quick idea of how things compare.

https://docs.google.com/spreadsheets/u/1/d/1foc98Jtbi0-GUsNy...
Carbonhell 3 days ago
I have been using this site: https://artificialanalysis.ai/. It's still about benchmarks, and it doesn't do deep dives into specific use cases, but it's helpful for comparing models on intelligence vs cost vs latency and other characteristics.