Just one run per model? That isn't backtesting. I mean technically it is, but "testing" implies producing meaningful measures.

Also just one time interval? Something as trivial as "buy AI" could do well in one interval, and given models are going to be pumped about AI, ...

100 independent runs on each model, over 10 time intervals with very different market behavior, would produce meaningful results. Like actually credible, meaningful means and standard deviations.

This experiment, as is, is a very expensive unbalanced uncharacterizable random number generator.
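For a sense of what "means and standard deviations" over 100 runs x 10 intervals would look like, here is a minimal sketch. The function `simulate_placeholder_run` is a hypothetical stand-in that draws a fake return from a normal distribution; it is not the article's backtest, and the 5% mean and 20% spread are made-up numbers for illustration only.

```python
import math
import random
import statistics

def summarize_runs(returns):
    """Return (mean, sample std dev, standard error) for one model on one interval."""
    mean = statistics.mean(returns)
    std = statistics.stdev(returns)  # sample standard deviation (n - 1)
    stderr = std / math.sqrt(len(returns))
    return mean, std, stderr

def simulate_placeholder_run(rng):
    """HYPOTHETICAL stand-in for one backtest run; returns a fake total return."""
    return rng.gauss(0.05, 0.20)  # assumed numbers, not from the article

rng = random.Random(0)  # fixed seed so the sketch is repeatable
N_RUNS, N_INTERVALS = 100, 10

results = {}
for interval in range(N_INTERVALS):
    runs = [simulate_placeholder_run(rng) for _ in range(N_RUNS)]
    results[interval] = summarize_runs(runs)

for interval, (mean, std, stderr) in results.items():
    print(f"interval {interval}: mean={mean:+.3f} std={std:.3f} stderr={stderr:.3f}")
```

With a single run per model there is no standard deviation to compute at all (`statistics.stdev` raises an error on fewer than two data points), which is roughly the point being made above.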

cheeseblubber 19 hours ago

Yes, definitely. We were using our own budget, out of our own pocket, and these model runs were getting expensive. Claude cost us around 200-300 dollars per 8-month run, for example. We want to scale it up and get more statistically significant results, but wanted to share something in the interim.

	Nevermark 18 hours ago

	Got it. It is an interesting thing to explore.

energy123 17 hours ago

To their credit, they say in the article that the results aren't statistically significant. It would be better if that disclaimer were more prominently displayed, though.

The tone of the article is focused on the results when it should be "we know the results are garbage noise, but here is an interesting idea".

zer0tonin 5 hours ago

Not only just one run per model, but no metrics other than total return. If you pick stocks at random you have a very high chance of beating the S&P 500, so you need a bit more than that to make a good benchmark.
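As a rough illustration of "metrics other than total return", here is a minimal sketch of two common risk-aware measures, annualized Sharpe ratio and maximum drawdown. The function names, the default of 252 trading periods per year, and the zero risk-free rate are assumptions made for the example, not anything the article reports.

```python
import math
import statistics

def sharpe_ratio(period_returns, risk_free_per_period=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period simple returns."""
    excess = [r - risk_free_per_period for r in period_returns]
    std = statistics.stdev(excess)  # sample standard deviation
    if std == 0:
        return float("nan")  # undefined when returns never vary
    return statistics.mean(excess) / std * math.sqrt(periods_per_year)

def max_drawdown(period_returns):
    """Largest peak-to-trough decline of the equity curve, as a positive fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in period_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst

# Made-up returns: +10%, then -50%, then +20%.
example = [0.10, -0.50, 0.20]
print(f"max drawdown: {max_drawdown(example):.2f}")
```

Two strategies with the same total return can look very different on these measures, which is why a benchmark that reports only total return says little about risk taken.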

Marsymars 15 hours ago

To take it to the absurd conclusion, you could backtest each LLM on "which single stock should I buy on Jan 1, 2010 to maximize my returns over the next 15 years?"

If your backtested LLM performed well, would you use the same strategy for the next 15 years? (I suppose there are people who would.)

hhutw 17 hours ago

Yeah... one run per model is just a random walk, in my opinion.

ipnon 18 hours ago

Yes, if these models, available for $200/month, are making 50% returns reliably, why isn’t Citadel having layoffs?

	lisbbb 16 hours ago

	In my experience, you get a few big winners, but since you have to keep placing new trades (i.e. bets), you eventually blow one and lose most of what you made. This is particularly true with options and futures trades. It's a stupid way to speculate; with or without AI help, it doesn't matter and will never matter.