Training a trillion parameter model to be funny (jokegen.sdan.io)
20 points by sdan 6 days ago | 14 comments
whacked_new an hour ago
Circa GPT-3.5 to GPT-4o I was involved in some research on figuring out how to make LLMs funny. We tried a bunch of different things: giving the model rules for homonym jokes [1] and double-entendre jokes, fine-tuning on comedian transcripts, and fine-tuning on publicly rated joke boards. We could not make it funny.

Also interesting: when CoT research was getting a lot of attention, we tried a joke version of CoT, asking GPT-4 to explain why a joke was funny in order to produce training-set data. Most of the explanations were completely off base.

After this work, I became a lot less worried about the AGI-taking-over narrative. Funny is very, very hard.

[1] Without a dictionary, which at first seems inefficient, but this work demonstrated that GPT could perfectly reconstruct the dictionary anyway.
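A minimal sketch of the explain-why-it's-funny data-generation step described above, assuming the OpenAI Python client; the model name, prompt wording, and sample joke are illustrative placeholders, not the original pipeline:

    # Sketch: ask a model to explain why a joke is funny, to build
    # chain-of-thought-style training rows. Assumes the OpenAI Python client
    # (openai>=1.0); model name and prompt wording are placeholders.
    from openai import OpenAI

    client = OpenAI()

    def explain_joke(joke: str) -> str:
        """Return the model's step-by-step explanation of a joke's mechanism."""
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "Explain step by step why this joke is funny."},
                {"role": "user", "content": joke},
            ],
        )
        return response.choices[0].message.content

    # Pair each joke with its explanation to form (joke, rationale) training rows.
    jokes = ["I told my computer a joke about UDP. It didn't get it."]
    training_rows = [(j, explain_joke(j)) for j in jokes]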
nine_k an hour ago
Some models are better at generating funny and poignant quips.

> my human mass-generates new ideas faster than I can research why the previous ones won't work
> this is called 'job security'

(https://nitter.poast.org/LetheAgent/status/20179595340865499...)
politelemon 42 minutes ago
The model appears to have been overfitted to joke about the live demo being private. | ||||||||||||||||||||
userbinator 30 minutes ago
Unfortunately I find most AI hallucinations to be funnier than these attempts at comedy. | ||||||||||||||||||||
scosman an hour ago
I maintain a project for evals and fine-tuning, and our default example task is a joke generator. It's a fun demo, but more importantly it's a really good use case for showing how hard evaluating and optimizing LLMs is.

- There are a dozen-plus common failure modes: how you split setup/punchline, tropes, toxicity, template reuse. Each one needs a good eval.

- Datasets are hard: there's not much off the shelf, and as this author points out, scraping gets a weird mix of quality.

- Models are really bad out of the box at humour.

At the end of the day it's just a hard problem that takes a lot of work and still isn't solved. GEPA prompts help, if you have good evals. Supervised fine-tuning works a little, but only if you train on a chain-of-thought thinking phase. We have a new evaluation builder that uses examples of edge cases for alignment, and jokes require the most iteration and feedback for refinement.

If you want to try it: https://github.com/kiln-ai/kiln
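A rough sketch of the kind of per-failure-mode eval described above, using an LLM-as-judge approach; this is not Kiln's API, and the judge model, prompt, and category names are assumptions for illustration:

    # Sketch: flag each failure mode for one generated joke with a judge model.
    # Not Kiln's API; judge model, prompt, and category names are assumptions.
    import json
    from openai import OpenAI

    client = OpenAI()

    FAILURE_MODES = [
        "broken setup/punchline split",
        "tired trope",
        "toxicity",
        "template reuse",
    ]

    def judge_joke(joke: str) -> dict:
        """Return {failure_mode: bool} flags for a single generated joke."""
        prompt = (
            "For the joke below, return a JSON object mapping each of these "
            f"failure modes to true or false: {FAILURE_MODES}\n\nJoke: {joke}"
        )
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"},
        )
        return json.loads(response.choices[0].message.content)

    # Run over a batch of generations and aggregate per-failure-mode rates.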
kevmo314 31 minutes ago
Is writing in all lowercase funnier? | ||||||||||||||||||||
crawfordcomeaux 2 hours ago
I once had a vivid dream that AI robots had taken over & were keeping humans around because they'd not yet mastered comedy. All of human culture globally was a comedy arms race with 24/7 open mic comedy jams on every corner. They (the machines) had billboards/signage everywhere showing the estimated time left for humanity. A really good joke would lead the timer to grow (until they figured out how to produce the general patterns needed to both create and appreciate the joke). | ||||||||||||||||||||
suddenlybananas 3 hours ago
these really aren't very funny | ||||||||||||||||||||
gipp 2 hours ago
It would be easier to judge this if the jokes weren't 90% about AI and Silicon Valley, understandable only to people who subscribe to astralcodexten.