teleforce | 9 hours ago
> Why domain specific LLMs won't exist: an intuition

> We would have a healthcare model, economics model, mathematics model, coding model and so on.

It's not a question of whether specialized models will ever exist; it's a matter of when. This will democratize almost all work and professions, including those of programmers, architects, lawyers, engineers, medical doctors, etc. Glass-half-empty people will call this a catastrophe of machines replacing humans; glass-half-full people will say it is good for society and humanity, making work more efficient, faster, and much cheaper.

Imagine: instead of waiting a few months for your CVD diagnostic procedure due to the worldwide shortage of cardiologists (a fact), the diagnostics, with the help of AI/LLMs and an expert cardiologist in the loop, will probably take only a few days, provided the sensitivity is high enough. It's a win-win for patients, medical doctors, and hospitals. It will lead to earlier detection of CVDs, and hence less complication and suffering, whether acute or chronic.

Foundation models are generic by nature, trained on HPC clusters with GPUs/TPUs inside AI data centers. The other extreme is RAG, with vector databases and file systems for context prompting, as the sibling comments mentioned. The best trade-off, the Goldilocks option, is model fine-tuning; specifically, the promising self-distillation fine-tuning (SDFT) recently proposed by MIT and ETH Zurich [1][2]. Unlike conventional supervised fine-tuning (SFT), which suffers from catastrophic forgetting, SDFT is not forgetful, which makes fine-tuning practical rather than wasteful. SDFT used only 4 x H200 GPUs for the fine-tuning process. Apple reports the same with their simple self-distillation (SSD) for LLM coding specialization [3][4].
They used 8 x B200 GPUs for model fine-tuning, which any company can afford for local fine-tuning based on the open-weight LLM models available from Google, Meta, Nvidia, OpenAI, DeepSeek, etc.

[1] Self-Distillation Enables Continual Learning: https://arxiv.org/abs/2601.19897

[2] Self-Distillation Enables Continual Learning: https://self-distillation.github.io/SDFT.html

[3] Embarrassingly simple self-distillation improves code generation: https://arxiv.org/abs/2604.01193

[4] Embarrassingly simple self-distillation improves code generation (185 comments):
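To make the forgetting point concrete, here is a minimal numpy sketch of the general idea behind distillation-regularized fine-tuning: keep a frozen copy of the pre-fine-tuning model as the "self" teacher and add a KL term that anchors the student's output distribution to it while the task loss pulls it toward the new data. This is my own toy illustration of that generic objective, not the exact SDFT recipe from [1] (the function name `sdft_style_loss` and the weighting `lam` are made-up for illustration).

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sdft_style_loss(student_logits, teacher_logits, labels, lam=1.0):
    """Task cross-entropy on the new data plus a KL(teacher || student)
    term that penalizes drifting away from the frozen pre-fine-tuning
    model, discouraging catastrophic forgetting. Toy objective, not the
    published SDFT algorithm."""
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    n = len(labels)
    # Cross-entropy against the new-task labels
    ce = -np.log(p_s[np.arange(n), labels] + 1e-12).mean()
    # Forward KL anchoring the student to its frozen former self
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(-1).mean()
    return ce + lam * kl
```

When the student still matches the frozen teacher, the KL term is zero and only the task loss remains; as the student drifts, the KL term grows, which is the lever that trades plasticity against retention.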