PaulHoule 7 days ago

Personally I think foundation models are for the birds: the cost of developing one is immense, and the time involved is so great that you can't do many run-break-fix cycles, so you will get nowhere on a shoestring. (Though maybe you can get somewhere on simple tasks and synthetic data.)

Personally I am working on a reliable model trainer for classification and sequence labeling tasks that uses something like ModernBERT at the front end and some kind of LSTM on the back end.
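
Roughly, the shape of that stack looks like the sketch below. It's only a sketch, not the actual trainer; the encoder id, hidden sizes, and label count are placeholders, and any encoder that emits one embedding per token can be dropped in.

    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    # placeholder encoder id; any per-token-embedding encoder works here
    ENCODER_NAME = "answerdotai/ModernBERT-base"

    class EncoderLSTMTagger(nn.Module):
        def __init__(self, num_labels, lstm_hidden=256):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(ENCODER_NAME)
            for p in self.encoder.parameters():   # freeze the foundation model;
                p.requires_grad = False           # only the small head gets trained
            dim = self.encoder.config.hidden_size
            self.lstm = nn.LSTM(dim, lstm_hidden, batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * lstm_hidden, num_labels)

        def forward(self, input_ids, attention_mask):
            with torch.no_grad():                 # per-token embeddings from the encoder
                hidden = self.encoder(input_ids=input_ids,
                                      attention_mask=attention_mask).last_hidden_state
            out, _ = self.lstm(hidden)            # (batch, seq_len, 2 * lstm_hidden)
            return self.head(out)                 # per-token label logits

    tokenizer = AutoTokenizer.from_pretrained(ENCODER_NAME)
    batch = tokenizer(["Swap the encoder, keep the head."], return_tensors="pt")
    logits = EncoderLSTMTagger(num_labels=5)(batch["input_ids"], batch["attention_mask"])
    print(logits.shape)                           # torch.Size([1, seq_len, 5])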

People who hold court on machine learning forums will swear by fine-tuned BERT and similar things, but they are not at all interested in talking about the reliable bit. I've read a lot of arXiv papers where somebody tries to fine-tune a BERT for a classification task, runs it with some arbitrarily chosen hyperparameters they got out of another paper, and it sort of works some of the time.

It drives me up the wall that you can't use early stopping for BERT fine-tuning the way I've been using it on neural nets since 1990 or so, and if I believe what I'm seeing, the networks I've been using for BERT fine-tuning can't really benefit from training sets with more than a few thousand examples, emphasis on the "few".
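
For reference, the classic recipe I mean is just patience on a validation metric, roughly as below. This is a toy sketch with synthetic batches so it runs standalone; "bert-base-uncased" is only a stand-in for whatever encoder is actually being tuned.

    import torch
    import torch.nn as nn
    from transformers import AutoModelForSequenceClassification

    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    loss_fn = nn.CrossEntropyLoss()

    # toy stand-in batches of (input_ids, labels); real code would use a DataLoader
    def fake_batch(n=8, seq=16):
        return torch.randint(1000, 2000, (n, seq)), torch.randint(0, 2, (n,))

    train_batches = [fake_batch() for _ in range(4)]
    val_batches = [fake_batch() for _ in range(2)]

    best_val, patience, bad_epochs = float("inf"), 3, 0
    for epoch in range(30):
        model.train()
        for ids, labels in train_batches:
            optimizer.zero_grad()
            loss = loss_fn(model(input_ids=ids).logits, labels)
            loss.backward()
            optimizer.step()

        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(input_ids=ids).logits, labels).item()
                      for ids, labels in val_batches) / len(val_batches)

        if val < best_val - 1e-4:        # improvement: reset patience, keep weights
            best_val, bad_epochs = val, 0
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
        else:                            # no improvement: count toward patience
            bad_epochs += 1
            if bad_epochs >= patience:
                print(f"early stop at epoch {epoch}, best val loss {best_val:.4f}")
                break

    model.load_state_dict(best_state)    # restore the best checkpoint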

My assumption is that everybody else is going to be working on the flashy task of developing better foundation models and as long as they emit an embedding-per-token I can plug a better foundation model in and my models will perform better.

mindcrime 7 days ago | parent | next [-]

> Personally I think foundation models are for the birds,

I might not go quite that far, but I have publicly said (and will stand by the statement) that I think training progressively larger and more complex foundation models is a waste of resources. But my view of AI is rooted in a neuro-symbolic approach, with emphasis on the "symbolic". I envision neural networks not as the core essence of an AI, but mainly as adapters between the different representations used by different sub-systems. And possibly as "scaffolding", where one can use the "intelligence" baked into an LLM as a bridge to get the overall system to where it can learn, and then eventually kick the scaffold down once it isn't needed anymore.

tlb 7 days ago | parent | next [-]

We learned something pretty big and surprising from each new generation of LLM, for a small fraction of the time and cost of a new particle accelerator or space telescope. Compared to other big science projects, they're giving pretty good bang for the buck.

PaulHoule 7 days ago | parent | prev | next [-]

I can sure talk your ear off about that one as I went way too far into the semantic web rabbit hole.

Training LLMs to use 'tools' of various types is a great idea, as is running them inside frameworks that check that their output satisfies various constraints. Still, certain problems remain: the NP-complete nature of SAT solving (many intelligent-systems problems, including the sort of word problems you'd expect an A.I. to solve, boil down to SAT solving; see the toy example at the end of this comment), the halting problem, Gödel's theorem, and such. I understand Doug Hofstadter has softened his positions lately, but I think many of the problems set up in this book

https://en.wikipedia.org/wiki/G%C3%B6del,_Escher,_Bach

(particularly the Achilles & Tortoise dialog) still stand today, as cringey as that book seems to me in 2025.
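
To make the "word problems boil down to SAT" point concrete, here is a toy illustration (purely for flavor, and the puzzle is made up): a little logic puzzle reduced to boolean constraints and brute-forced, which is fine at three variables but blows up combinatorially in general.

    # "exactly one of A, B, C tells the truth; A says B lies; B says C lies"
    # encoded as boolean constraints and solved by exhaustive search
    from itertools import product

    # variables a, b, c mean "that person tells the truth"
    constraints = [
        lambda a, b, c: a == (not b),          # A says "B lies"
        lambda a, b, c: b == (not c),          # B says "C lies"
        lambda a, b, c: sum([a, b, c]) == 1,   # exactly one truth-teller
    ]

    for a, b, c in product([True, False], repeat=3):
        if all(check(a, b, c) for check in constraints):
            print(dict(A=a, B=b, C=c))         # -> {'A': False, 'B': True, 'C': False}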

throwawaymaths 7 days ago | parent | next [-]

i am hoping for a "turing tape" slm (small language model) where the tokens are instructions for a copycat engine

mindcrime 7 days ago | parent | prev [-]

As somebody who considers himself something of a Semantic Web enthusiast / advocate, and has also read GEB, I can totally relate. To me, this is really one of those "THE ISSUE" things: how can we use some notion of formal logic to solve problems, without being forced to give up hope due to incompleteness and/or the Halting Problem? Clearly you have to give up something as a tradeoff for making this stuff tractable, but I suppose it's an open question what you can trade off and how exactly that factors into the algorithm, as well as what guarantees (if any) remain...

PaulHoule 7 days ago | parent [-]

I would start with the fact that there is nothing consistent or complete about humans. Penrose's argument that he is a thetan because he can do math doesn't hold water.

dr_dshiv 7 days ago | parent | prev [-]

Good old fashioned AI, amirite

mindcrime 7 days ago | parent [-]

Well, to the extent that people equate GOFAI with purely symbolic / logic-based processing, then no, not for my money anyway. I think it's possible to construct systems that use elements of symbolic processing along with sub-symbolic approaches and get useful results. I think of it as (although this is something of an over-simplification) taking symbolic reasoning, relaxing some of the constraints that go along with the guarantees that method makes about its outputs, and accepting a (hopefully only slightly) less desirable output.

OR, flip the whole thing around: get an output from, say, an LLM where there might be hallucination(s), and then use a symbolic reasoning system to post-process the output to ensure veracity before sending it to the user. Amazon has done some work along those lines, for example. https://aws.amazon.com/blogs/machine-learning/reducing-hallu...
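
A cartoon of that second direction might look like the sketch below. This is purely my illustration, nothing to do with Amazon's actual system; the fact store, rule, and triple format are all made up for the example.

    # The LLM proposes claims as (subject, predicate, object) triples; a tiny
    # symbolic layer checks each claim against a known-facts store plus one
    # derivation rule before anything reaches the user.
    KNOWN_FACTS = {("water", "boils_at_c", "100"),
                   ("paris", "capital_of", "france")}
    RULES = {"capital_of": lambda s, o: (s, "located_in", o)}   # derive extra facts

    def derive(facts):
        closed = set(facts)
        for s, p, o in facts:
            if p in RULES:
                closed.add(RULES[p](s, o))
        return closed

    def verify(llm_triples):
        """Split the LLM's claims into (verified, rejected)."""
        closure = derive(KNOWN_FACTS)
        verified = [t for t in llm_triples if t in closure]
        rejected = [t for t in llm_triples if t not in closure]
        return verified, rejected

    # pretend the LLM emitted these claims
    claims = [("paris", "located_in", "france"),   # derivable -> passes
              ("water", "boils_at_c", "80")]       # not derivable -> flagged
    ok, flagged = verify(claims)
    print("send to user:", ok)
    print("needs regeneration or a caveat:", flagged)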

Anyway this is all somewhat speculative, and I don't want to overstate the "weight" of anything I seem to be claiming here. This is just the direction my interests and inclinations have taken me in.

dr_dshiv 7 days ago | parent | next [-]

Maybe gen AI coding is neurosymbolic AI, realized differently than expected

mindcrime 7 days ago | parent [-]

Never say never! I can't rule it out, for sure. :-)

Xcelerate 7 days ago | parent | prev [-]

I’ve never liked that term “sub-symbolic”. It implies that there is something at a deeper level than what a Turing machine can compute (i.e., via the manipulation of strings of symbols), and as far as we can tell, there’s no evidence for that. It might be true, but even a quantum computer can be simulated on a classical computer. And of course neural networks run on classical computers too.

Yeah, I know that’s not what “symbol” is really referring to in this context, but I just don’t like what the semantics of the word suggests about neural networks: that they are somehow halting oracles or capable of hypercomputation, which they’re obviously not.

dr_dshiv 7 days ago | parent | next [-]

Read Paul Smolensky’s paper on the harmonium. First restricted Boltzmann machine. The beginning helps justify subsymbolic in a pretty beautiful way.

mindcrime 7 days ago | parent | prev [-]

It's not the name I would have chosen either (probably) but I wasn't around when those decisions were being made and nobody asked me for my opinion. So I just roll with it. What can ya do?

Xcelerate 7 days ago | parent [-]

Oh for sure! Wasn’t critiquing your comment at all. I’ve seen the term a lot lately and it just made me wonder how much the industry is using it as a misleading hype factor. E.g., LLMs are “better” than Turing machines because they are operating at a level “below” Turing machines even though the comparison doesn’t make sense, as symbolic computation isn’t referring to the symbol-manipulating nature of Turing machines in the first place.

NewUser76312 7 days ago | parent | prev [-]

Yeah I've been wondering how one can contribute and build in the LLM and AI world without the resources to work on foundation models.

Because personally I'm not a product/GPT wrapper person - it just doesn't suit my interests.

So then what can one do that's meaningful and valuable? Probably something around finetuning?