rolisz 6 days ago
> I'd also pay for something in the 10-15b parameter range that used more limited training data focused almost entirely on programming documentation and books along with professional business writing.

Unfortunately, pretraining on a lot of data (~everything they can get their hands on) is needed to give current LLMs their "intelligence" (for whatever definition of intelligence). Using less training data doesn't work as well for now. There is definitely not enough programming and business writing to train a good model only on that.
hajile 5 days ago
If the LLM isn’t getting its data about coding projects from those projects and their surrounding documentation and tutorials, what is it going to train on? Maybe it also needs some amount of other training data for basic speech patterns, but I’d again point to IBM Granite as an example that professional, to-the-point LLMs are possible.