parpfish 4 days ago

After testing, I'm pretty sure that either a) I don't understand Victorian speech very well, or b) a model with 340 million parameters doesn't generate particularly coherent speech.

Animats 4 days ago | parent | next [-]

It's not you. It's clueless. Any relationship between input and output is only slight. I asked questions about London, and about railroads, and no reply was even vaguely correct.

Q: Where in London is the Serpentine?

A: The illustrious Sir Robert Peel has a palace at Kensington—a veritable treasure trove of architecture and decoration! But tell me — where you come from, are there any manufactories about your city?Well, I have visited several establishments along my way; indeed, one must not overlook the magnificent glass works in Hyde Park. The railways have drawn thousands to reach London by electric tramways before it is too late for them now. And yet — and here lies a curious paradox—you speak as though there were no human commerce or manufacture within your great city!

Q: Who is Sir Robert Peel?

A: Sir William Gladstone, Prime Minister of England! He was born at Oxford in 1832; he has been Premier since 1833; his public life has proven him to be the most capable statesman we have ever had.

Q: What is Jane Austen's "Pride and Prejudice" about?

A: The writings by Mr. Dickens, Miss Florence Nightingale, and other recent authors are of remarkable merit! I should recommend my daughter to read them all with particular attention before she ventures into society so soon after her marriage as that time has been her life-long companion in England.

The Serpentine is a lake in Hyde Park. Most any 19th century Londoner would know about it.

Sir Robert Peel invented modern policing. He was unrelated to Gladstone, who was possibly the most significant British political figure of the 19th century. Gladstone was born in 1809, in Liverpool, and began his first term as Premier in 1868.

If this thing has any area of expertise, I can't find it. What went wrong? It ought to at least be able to regurgitate widely known facts.

sbierwagen 16 hours ago | parent | next [-]

From the author's writeup:

>the final pre-trained model came out to about 340 million parameters, and had a final validation bpb of 0.973. The pretraining process took about five hours on-chip, and cost maybe $35. I had my pretrained model, trained in 6496 steps. Things were proceeding swiftly, and cheaply!

GPT-3 had 175,000 million parameters. The smallest of the Gemma 4 models released today clocks in at 5,000 million parameters, and I would bet that Google trained them for more than five hours. This one is just too small and wasn't trained for long enough. A fun art project, but not a functional LLM.
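
For reference, bits-per-byte is just cross-entropy normalized by the raw text size, which makes models with different tokenizers comparable. A minimal sketch of the conversion; the loss value and byte/token counts below are illustrative assumptions, not the author's numbers:

    import math

    # bpb = total bits of cross-entropy / total bytes of raw text.
    def bits_per_byte(mean_loss_nats, total_tokens, total_bytes):
        total_bits = mean_loss_nats * total_tokens / math.log(2)  # nats -> bits
        return total_bits / total_bytes

    # e.g. ~2.7 nats/token at ~4 bytes/token lands near the quoted 0.973
    print(bits_per_byte(2.7, total_tokens=1_000_000, total_bytes=4_000_000))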

TeMPOraL 3 days ago | parent | prev | next [-]

> If this thing has any area of expertise, I can't find it. What went wrong? It ought to at least be able to regurgitate widely known facts.

What better way to demonstrate the stranglehold the "intellectual property" framework has on our shared knowledge as a civilization.

parpfish 3 days ago | parent | prev | next [-]

The output reminds me of a really good version of pre-LLM text generation, like character-level LSTMs or Markov chains.

It seems to have the syntax down well enough to produce superficially good text, but the semantics just aren't there.
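
For anyone who hasn't seen one, a word-level Markov chain makes the point concretely: it learns only which word tends to follow which, so the output is locally plausible and globally meaningless. A minimal sketch (the toy corpus is made up):

    import random
    from collections import defaultdict

    # Learn first-order word transitions: syntax-ish output, no semantics.
    def build_chain(text):
        chain = defaultdict(list)
        words = text.split()
        for a, b in zip(words, words[1:]):
            chain[a].append(b)
        return chain

    def generate(chain, start, length=20):
        out = [start]
        for _ in range(length):
            followers = chain.get(out[-1])
            if not followers:
                break
            out.append(random.choice(followers))
        return " ".join(out)

    corpus = "the railways have drawn thousands to london and the railways of london are new"
    print(generate(build_chain(corpus), "the"))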

windowliker 3 days ago | parent | prev | next [-]

Amazing. It's like a drop-in replacement for our politicians.

PowerElectronix 3 days ago | parent | prev | next [-]

Well, lobotomies were all the rage back then...

inglor_cz 3 days ago | parent [-]

:) Good joke, but the lobotomy was only introduced by Egas Moniz in 1935, more than a generation after Queen Victoria died.

bcjdjsndon 3 days ago | parent | prev [-]

But AI is intelligent and going to change the world

ben_w 3 days ago | parent | prev | next [-]

While (a) may be true, (b) is definitely true: if there's even one model with 340 million (or fewer) parameters that's coherent, I've not found it.

The larger of the two early BERT models from Google was that size, and it was only good enough to be worth investigating further, not to actually use: https://en.wikipedia.org/wiki/BERT_(language_model)

starkparker 4 days ago | parent | prev | next [-]

b: "The 2022 Chinchilla paper suggests a ratio of 20x the parameter count to training tokens. For a 340m model that would suggest around 7 billion tokens, more than twice the British Library corpus used here. The smallest Qwen 3.5 model is 600m parameters and that model family starts to get interesting at 2b—so my hunch is we would need 4x or more the training data to get something that starts to feel like a useful conversational partner."
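
The arithmetic behind that estimate, as a quick sketch (the corpus size here is inferred from the "more than twice" remark, so treat it as an assumption):

    # Chinchilla-style rule of thumb: ~20 training tokens per parameter.
    params = 340e6
    tokens_needed = 20 * params        # 6.8e9 tokens
    corpus_tokens = 3e9                # assumed: "more than twice" implies ~3B
    print(f"need {tokens_needed:.1e} tokens, ~{tokens_needed / corpus_tokens:.1f}x the corpus")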

qwertytyyuu 4 days ago | parent | prev [-]

I also wonder if it might partially be the case that it hasn't gone through any RLHF for chat. I remember that GPT-3 before RLHF wasn't much for conversation.
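
For illustration: a pretrained-only model has no notion of "answering", it just continues text, which is why raw base models ramble past the question. A minimal sketch with Hugging Face transformers, using GPT-2 as a stand-in for any untuned base model:

    from transformers import pipeline

    # A base model continues whatever it is given; without RLHF or
    # instruction tuning there is no learned question-answer behavior.
    generator = pipeline("text-generation", model="gpt2")

    prompt = "Q: Where in London is the Serpentine?\nA:"
    print(generator(prompt, max_new_tokens=40)[0]["generated_text"])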