aerhardt a day ago

> how difficult good application of AI is.

The only interesting application I've identified thus far in my domain, Enterprise IT (I don't do consumer-facing stuff like chatbots), is replacing tasks that previously would've been done with NLP: mainly extraction, synthesis and classification. I am currently working on a long-neglected dataset that needs a massive remodel. In the past, I think whipping it into shape would've taken a lot of manual intervention and a mix of different NLP models, but with LLMs we might be able to pull it off with far fewer resources.
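One way such an LLM-based classification step might look, as a minimal sketch: the label set, the prompt wording and the `complete` callable are all hypothetical placeholders, not any particular vendor's API. The idea shown is that the model's free-text reply is parsed and checked against a closed label set, so malformed or off-taxonomy outputs fall out for review instead of silently entering the remodeled dataset.

```python
import json
from typing import Callable, Optional

# Hypothetical label set for illustration; substitute your own taxonomy.
LABELS = {"invoice", "contract", "correspondence", "other"}

PROMPT_TEMPLATE = (
    "Classify the following record into exactly one of: {labels}.\n"
    'Reply with JSON only, e.g. {{"label": "other"}}.\n\n'
    "Record:\n{text}"
)


def classify(text: str, complete: Callable[[str], str]) -> Optional[str]:
    """Ask the model for one label; return None if the reply is unusable.

    `complete` is a hypothetical stand-in for whatever LLM client call you
    use: it takes a prompt string and returns the model's raw text reply.
    """
    prompt = PROMPT_TEMPLATE.format(labels=", ".join(sorted(LABELS)), text=text)
    reply = complete(prompt)
    try:
        label = json.loads(reply)["label"]
    except (json.JSONDecodeError, KeyError, TypeError):
        # Malformed reply: route the record to manual review instead.
        return None
    # Reject anything outside the closed label set (including non-strings).
    return label if isinstance(label, str) and label in LABELS else None
```

Passing the model call in as a plain function keeps the validation logic testable with a fake that returns canned replies, independent of which LLM ends up behind it.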

Mind you, at the scale of the customer I am currently working with, this task would never have been done in the first place, so it's not replacing anyone.

> This can start looking less like pure AI and more like a mix of traditional software with some AI capabilities

Yes, the other use case I'm seeing is peppering existing workflow integrations with a bit of LLM magic here and there. But why would I rework a workflow that's already implemented and well understood in Zapier, n8n or Python and runs with total reliability?

> Knowledge of specific workflows also requires really good product design. High empathy, ability to understand what's not being said, ability to understand how to create an overall process value stream from many different people's narrower viewpoints, etc. This is also hard.

> My experience is that this type of work is a narrow slice of the total amount of work to be done

Reading you, I get the sense we are on the same page on a lot of things, and I am pretty sure if we worked together we'd get along fine. I've been struggling a bit with the LLM delulu lately, so it's a breath of fresh air to read people out there who get it.

PaulHoule 21 hours ago

As I see it, three-letter organizations have been using frameworks like Apache UIMA to build information extraction pipelines that are manual at worst and hybrid at best. Before BERT, the models we had for this sucked: they were only useful for certain things and usually required training sets of 20,000 or so examples.

Today the range of tasks for which the models are anywhere from tolerable to "great" has expanded a lot. In arXiv papers you tend to see people getting tepid results with 500 examples; I get better results with 5,000 examples and diminishing returns past 15k.
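To find where returns diminish on your own task, one option is to train at several sizes and compare the score gained per extra 1,000 examples. A minimal sketch follows; the function names are hypothetical, the curve points would come from your own training runs, and the cutoff for "not worth it" is a judgment call you supply.

```python
from typing import List, Optional, Tuple

# One learning-curve point: (number of training examples, eval score, e.g. F1).
Curve = List[Tuple[int, float]]


def marginal_gains(curve: Curve) -> List[Tuple[int, float]]:
    """Score gained per extra 1,000 examples between consecutive sizes.

    Each result is (larger size of the interval, gain per 1,000 examples).
    """
    pts = sorted(curve)
    gains = []
    for (n0, s0), (n1, s1) in zip(pts, pts[1:]):
        if n1 == n0:
            raise ValueError("training sizes must be distinct")
        gains.append((n1, (s1 - s0) / (n1 - n0) * 1000))
    return gains


def plateau_size(curve: Curve, min_gain_per_1k: float) -> Optional[int]:
    """Size after which more labeled data stops paying off.

    Returns the starting size of the first interval whose gain per 1,000
    examples falls below `min_gain_per_1k`, or None if none does.
    """
    pts = sorted(curve)
    for (n0, _), (_, gain) in zip(pts, marginal_gains(pts)):
        if gain < min_gain_per_1k:
            return n0
    return None
```

Normalizing to gain per 1,000 examples matters because learning-curve sizes are usually spaced unevenly (e.g. 500, 5k, 15k), so raw score differences between points are not comparable.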

For a lot of people it begins and ends with "prompt engineering" of commercial decoder models, and evaluation isn't even an afterthought. For information extraction, classification and such, though, you often get good results with encoder models (e.g. BERT) combined with serious eval, calibration and model selection. Still, if your problem is hard and has to be done in a scalable way, the system ends up looking like the old systems; but sometimes you can make something that "just works" without trying too hard, keeping your train/eval data in a spreadsheet.
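As a rough illustration of the eval and calibration part, here is a minimal sketch for a binary classifier. It assumes the eval rows are exported from the spreadsheet as CSV and that the encoder model emits one raw logit per example; all function names and column names are hypothetical. Temperature scaling (a single scalar fitted on held-out data by minimizing log loss) is swapped in as one common calibration technique, not necessarily the one used above, and the ECE here is the binned gap between mean predicted P(y=1) and the observed positive rate.

```python
import csv
import io
import math
from typing import Dict, List, Sequence


def load_eval_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse eval data exported from a spreadsheet as CSV (header row first)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (no overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nll(logits: Sequence[float], labels: Sequence[int], temperature: float) -> float:
    """Mean binary log loss after dividing each logit by `temperature`."""
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z / temperature), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)


def fit_temperature(logits: Sequence[float], labels: Sequence[int]) -> float:
    """Temperature scaling: grid-search T over 0.05..5.0 minimizing log loss."""
    grid = [0.05 * k for k in range(1, 101)]
    return min(grid, key=lambda t: nll(logits, labels, t))


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binned gap between mean predicted P(y=1) and the observed positive rate."""
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    ece = 0.0
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            rate = sum(y for _, y in b) / len(b)
            ece += len(b) / len(probs) * abs(mean_p - rate)
    return ece
```

A temperature above 1 means the raw model was overconfident. Ideally you would fit T on one held-out split and report ECE on another, and the same eval harness can then compare candidate models during model selection.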