lumost 3 days ago
I am extremely skeptical of a 27M-parameter model being trained "from scratch" on 1000 datapoints. I am likewise skeptical of the lack of comparison with any other model trained "from scratch" using their data preparation. Instead they compare only against third-party LLMs, which are massively more general-purpose and may not have any of those 1000 examples in their training set. This smells like some kind of overfitting to me.
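To make the ask concrete, here's a rough sketch of the kind of baseline check I'd want to see: train any small model from scratch on a train/held-out split of the same data and report both losses. Everything below (dataset shape, architecture, hyperparameters) is a made-up placeholder, not anything from the paper:

    # Hypothetical overfitting check: a near-zero train loss alongside a
    # high held-out loss is the classic memorization signature.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Stand-in for the ~1000-example dataset: 64-dim inputs, 10 classes.
    X = torch.randn(1000, 64)
    y = torch.randint(0, 10, (1000,))
    X_train, y_train = X[:800], y[:800]
    X_test, y_test = X[800:], y[800:]

    model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 10))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # Full-batch training, deliberately long enough to memorize.
    for _ in range(500):
        opt.zero_grad()
        loss = loss_fn(model(X_train), y_train)
        loss.backward()
        opt.step()

    with torch.no_grad():
        train_loss = loss_fn(model(X_train), y_train).item()
        test_loss = loss_fn(model(X_test), y_test).item()
    print(f"train: {train_loss:.3f}  held-out: {test_loss:.3f}")

On random labels like these, train loss will head toward zero while held-out loss stays high, which is exactly the failure mode worth ruling out when the parameter count dwarfs the dataset.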
cs702 3 days ago | parent
Yeah, the results do look incredible. That's why I and many others here have decided to download, review, and test the code published by the authors.[a] If their code doesn't live up to their claims, we will all ignore their work and move on. If it does, no one can argue with it. In my experience, when authors publish working code, it's usually a good sign.