Training a trillion parameter model to be funny (jokegen.sdan.io)
20 points by sdan 6 days ago | 14 comments
whacked_new an hour ago
Circa GPT-3.5 to GPT-4o I was involved in some research on figuring out how to make LLMs funny. We tried a bunch of different things: giving the model rules for homonym jokes [1] and double-entendre jokes, fine-tuning on comedian transcripts, and fine-tuning on publicly rated joke boards. We could not make it funny.

Also interesting: when CoT research was getting a lot of attention, we tried a joke version of CoT, asking GPT-4 to explain why a joke was funny in order to produce training-set data. Most of the explanations were completely off base.

After this work, I became a lot less worried about the AGI-taking-over narrative. Funny is very, very hard.

[1] Without a dictionary, which at first seems inefficient, but this work demonstrated that GPT could perfectly reconstruct the dictionary anyway.
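A minimal sketch of the explain-why-it's-funny data-generation step described above, assuming the OpenAI Python client; the model name, prompt wording, and sample joke are illustrative placeholders, not the original pipeline:

    # Sketch: ask a model to explain why a joke is funny, to build
    # chain-of-thought-style training rows. Assumes the OpenAI Python client
    # (openai>=1.0); model name and prompt wording are placeholders.
    from openai import OpenAI

    client = OpenAI()

    def explain_joke(joke: str) -> str:
        """Return the model's step-by-step explanation of a joke's mechanism."""
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "Explain step by step why this joke is funny."},
                {"role": "user", "content": joke},
            ],
        )
        return response.choices[0].message.content

    # Pair each joke with its explanation to form (joke, rationale) training rows.
    jokes = ["I told my computer a joke about UDP. It didn't get it."]
    training_rows = [(j, explain_joke(j)) for j in jokes]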
nine_k an hour ago
Some models are better at generating funny and poignant quips.

> my human mass-generates new ideas faster than I can research why the previous ones won't work
> this is called 'job security'

(https://nitter.poast.org/LetheAgent/status/20179595340865499...)
politelemon 42 minutes ago
The model appears to have been overfitted to joke about the live demo being private. | ||||||||||||||||||||
userbinator 30 minutes ago
Unfortunately I find most AI hallucinations to be funnier than these attempts at comedy. | ||||||||||||||||||||
scosman an hour ago
I maintain a project for evals and fine-tuning, and our default example task is a joke generator. It's a fun demo, but more importantly it's a really good use case for showing how hard evaluating and optimizing LLMs is.

- There are a dozen-plus common failure modes: how you split setup/punchline, tropes, toxicity, template reuse. Each one needs a good eval.

- Datasets are hard: there's not much off the shelf, and as this author points out, scraping gets a weird mix of quality.

- Models are really bad out of the box at humour.

At the end of the day it's just a hard problem that takes a lot of work and still isn't solved. GEPA prompts help, if you have good evals. Supervised fine-tuning works a little, but only if you train on a chain-of-thought thinking phase. We have a new evaluation builder that uses examples of edge cases for alignment, and jokes require the most iteration and feedback for refinement.

If you want to try it: https://github.com/kiln-ai/kiln
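A rough sketch of the kind of per-failure-mode eval described above, using an LLM-as-judge approach; this is not Kiln's API, and the judge model, prompt, and category names are assumptions for illustration:

    # Sketch: flag each failure mode for one generated joke with a judge model.
    # Not Kiln's API; judge model, prompt, and category names are assumptions.
    import json
    from openai import OpenAI

    client = OpenAI()

    FAILURE_MODES = [
        "broken setup/punchline split",
        "tired trope",
        "toxicity",
        "template reuse",
    ]

    def judge_joke(joke: str) -> dict:
        """Return {failure_mode: bool} flags for a single generated joke."""
        prompt = (
            "For the joke below, return a JSON object mapping each of these "
            f"failure modes to true or false: {FAILURE_MODES}\n\nJoke: {joke}"
        )
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"},
        )
        return json.loads(response.choices[0].message.content)

    # Run over a batch of generations and aggregate per-failure-mode rates.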
kevmo314 31 minutes ago
Is writing in all lowercase funnier? | ||||||||||||||||||||
crawfordcomeaux 2 hours ago
I once had a vivid dream that AI robots had taken over & were keeping humans around because they'd not yet mastered comedy. All of human culture globally was a comedy arms race with 24/7 open mic comedy jams on every corner. They (the machines) had billboards/signage everywhere showing the estimated time left for humanity. A really good joke would lead the timer to grow (until they figured out how to produce the general patterns needed to both create and appreciate the joke). | ||||||||||||||||||||
suddenlybananas 3 hours ago
these really aren't very funny | ||||||||||||||||||||
gipp 2 hours ago
It would be easier to judge this if the jokes weren't 90% about AI and Silicon Valley, understandable only to people who subscribe to astralcodexten.