sdenton4 an hour ago
It's the statistics equivalent of 'no one needs more than 640KB of RAM'

voxelghost 3 hours ago
After a quick content browse, my understanding is that it's more like this: with a very compressed diff vector applied to a multi-billion-parameter model, the model can be 'retrained' to reason (score) better on a specific topic; math was the example used in the paper.
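The "compressed diff vector" idea described above resembles a low-rank (LoRA-style) update to frozen base weights. A minimal numpy sketch, where all names and sizes are illustrative stand-ins and not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny stand-ins for layers that would have billions of parameters in practice.
d_out, d_in, rank = 64, 64, 4
W = rng.standard_normal((d_out, d_in))  # frozen base weights

# The compressed "diff" is stored as two small factors instead of a full matrix.
A = rng.standard_normal((d_out, rank)) * 0.01
B = rng.standard_normal((rank, d_in)) * 0.01

# Applying the compact delta specializes the model without touching W itself.
W_adapted = W + A @ B

# Storage cost: rank*(d_out + d_in) numbers vs d_out*d_in for a full diff.
compressed_params = rank * (d_out + d_in)
full_params = d_out * d_in
print(compressed_params, full_params)  # 512 4096
```

Even at this toy scale the factored delta is 8x smaller than a full weight diff; at billion-parameter scale the ratio is what makes such "diff vectors" cheap to ship and apply.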

ekuck 3 hours ago
speak for yourself!

est 3 hours ago
reasoning capability might just be some specific combination of mirror neurons. even advanced math usually involves applying patterns found elsewhere to new topics

measurablefunc 3 hours ago
I agree; I don't think gradient descent is going to work in the long run for the kind of luxurious, automated communist utopia the technocrats are promising everyone.