rcanand2025 3 hours ago
I'm working on a dashboard that ranks LLMs and then finds the best local (by size) and/or hosted (by price) variants of each model. It currently uses the Artificial Analysis leaderboard for ranking, the Ollama registry for local models, and OpenRouter for hosted models: https://ollamadash.up.railway.app

By default, the home page shows all models in the leaderboard, along with their local and hosted variants. Search for a model in the search box on the home page to see the top models by ranking, local (by size), and hosted (by price). You can also query, sort, and filter the models from each of these three sources in depth (see the other tabs at the top).

The next steps I'm working on (I'd love feedback on these or anything else):

Phase 1:
- Change clicks on a model tile in one column of the home page so they search for that model and show it filtered across Artificial Analysis, Ollama, and OpenRouter.
- The user specifies their system VRAM (unified or dedicated), and we automatically filter all three columns on the home page to models that would fit in that memory.
- The user specifies a price range (per MTok, using the max of the input and output price), and we similarly filter and rank models within that range across all columns.
- The user specifies both (VRAM and price range), and we filter by both: local models by VRAM, hosted models by price range, and the leaderboard shows the union of the two.

Phase 2: Once that works, add a local desktop client that automatically reads the user's system, infers the available VRAM, and renders the app in a webview. I'm considering PySide6 (Qt) for this.

Phase 3: In the desktop client, the user can download and chat with local models picked automatically from the leaderboard, optionally call hosted models, etc. It's meant primarily for evaluating and comparing local vs hosted models on the user's own use cases. I also have some interesting alternative experiences to host within the local, private app for users to interact with LLMs, agents, etc.

Do let me know whether this seems useful, or how I can make it more useful.
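The Phase 1 filtering rules above could be sketched roughly like this. It is a minimal, hypothetical Python sketch, not the dashboard's actual code: the dataclasses, field names, the single `score` per model, and the 1.2x `OVERHEAD` headroom applied to download size as a VRAM estimate are all assumptions. It encodes the cases from the post: VRAM alone filters every column to models with a local variant that fits; price alone does the same for hosted variants whose max(input, output) price per MTok is within budget; both together make the leaderboard the union of the two matches.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class LeaderboardEntry:
    name: str      # model name as it appears on the ranking leaderboard
    score: float   # hypothetical single ranking score (higher is better)


@dataclass(frozen=True)
class LocalVariant:
    model: str      # leaderboard model this local build maps to
    tag: str        # hypothetical variant label, e.g. a quantization tag
    size_gb: float  # download size, used here as a rough proxy for memory needed


@dataclass(frozen=True)
class HostedVariant:
    model: str
    provider: str
    input_usd_per_mtok: float
    output_usd_per_mtok: float

    @property
    def max_price(self) -> float:
        # The post's price criterion: max of input and output price per MTok.
        return max(self.input_usd_per_mtok, self.output_usd_per_mtok)


# Hypothetical headroom factor for KV cache and runtime overhead; not a measured value.
OVERHEAD = 1.2


def fits_in_vram(variants: Iterable[LocalVariant], vram_gb: float) -> List[LocalVariant]:
    """Local variants whose estimated footprint fits in vram_gb, smallest first."""
    ok = [v for v in variants if v.size_gb * OVERHEAD <= vram_gb]
    return sorted(ok, key=lambda v: v.size_gb)


def within_budget(variants: Iterable[HostedVariant], max_usd: float) -> List[HostedVariant]:
    """Hosted variants whose max(input, output) price is within budget, cheapest first."""
    ok = [v for v in variants if v.max_price <= max_usd]
    return sorted(ok, key=lambda v: v.max_price)


def apply_filters(
    board: Iterable[LeaderboardEntry],
    local: Iterable[LocalVariant],
    hosted: Iterable[HostedVariant],
    vram_gb: Optional[float] = None,
    max_usd: Optional[float] = None,
) -> Tuple[List[LeaderboardEntry], List[LocalVariant], List[HostedVariant]]:
    """Return the (leaderboard, local, hosted) columns for the home page."""
    local, hosted = list(local), list(hosted)
    fits = fits_in_vram(local, vram_gb) if vram_gb is not None else None
    cheap = within_budget(hosted, max_usd) if max_usd is not None else None

    # Models allowed on the page: union of models matched by whichever filters are set.
    allowed: Optional[Set[str]] = None
    if fits is not None or cheap is not None:
        allowed = {v.model for v in (fits or [])} | {v.model for v in (cheap or [])}

    def keep(model: str) -> bool:
        return allowed is None or model in allowed

    ranked = sorted((e for e in board if keep(e.name)), key=lambda e: e.score, reverse=True)
    # A column with its own filter shows its matches; otherwise it is restricted to allowed models.
    local_col = fits if fits is not None else sorted(
        (v for v in local if keep(v.model)), key=lambda v: v.size_gb)
    hosted_col = cheap if cheap is not None else sorted(
        (v for v in hosted if keep(v.model)), key=lambda v: v.max_price)
    return ranked, local_col, hosted_col
```

For example, with `vram_gb=16` and no budget, a 40 GB local variant is dropped (40 x 1.2 > 16), and the hosted column only lists providers for models that also have a local variant that fits. Sorting local variants smallest-first is one reading of "local (by size)"; ranking the largest model that fits first would be an equally reasonable choice.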
iugtmkbdfil834 3 hours ago
Kudos for trying; I think it is a great start. Part of the issue is still that individual models (especially local ones) differ greatly in what they can do, and what they do well. The problem is that you want more custom tags (ideally created by users who want to contribute to the tags' accuracy): 'can it generate CSV', 'can it follow a schema', 'can it offer a position on $conversy_Z'. None of these will be obvious, but they will relate to real use cases. We go back to the question of what 'best' actually means.