> We were cautious to only run after each model’s training cutoff dates for the LLM models. That way we could be sure models couldn’t have memorized market outcomes.

stusmall 19 hours ago | parent | next [-]

Even if it is after the cutoff date, wouldn't the models be able to query external sources to get data that could positively impact them? If the returns were smaller I could reasonably believe it, but beating the S&P 500 returns by 4x+ strains credulity.

cheeseblubber 19 hours ago | parent [-]

We used the LLMs' APIs and provided custom tools, like a stock ticker tool that only gave the model stock price information up to the backtest date. We did the same for news APIs, technical indicator APIs, etc. It took quite a long time to make sure that there wasn't any data leakage. The whole process took us about a month or two to build out.
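
To illustrate the idea of a date-gated tool, here is a minimal sketch in Python. It is not the actual implementation: the class `PointInTimePrices`, its methods, and the in-memory `history` store are all hypothetical names chosen for illustration, and a real setup would sit in front of a market data API rather than a dict.

```python
from dataclasses import dataclass
from datetime import date


class LookaheadError(Exception):
    """Raised when a tool call asks for data after the simulated date."""


@dataclass
class PointInTimePrices:
    # Hypothetical in-memory store: {ticker: {date: closing price}}.
    history: dict[str, dict[date, float]]
    # The simulated "today" of the backtest; nothing later may be returned.
    as_of: date

    def get_price(self, ticker: str, on: date) -> float:
        """Return the close for `ticker` on `on`, refusing future dates."""
        if on > self.as_of:
            raise LookaheadError(f"{on} is after the backtest date {self.as_of}")
        return self.history[ticker][on]

    def get_history(self, ticker: str, start: date, end: date) -> dict[date, float]:
        """Return closes in [start, end], silently clamped to the backtest date."""
        end = min(end, self.as_of)
        return {
            d: p
            for d, p in sorted(self.history[ticker].items())
            if start <= d <= end
        }
```

The point of the sketch is that the gate lives in the tool, not in the prompt: the model can ask for any date it likes, but the harness only ever answers with data from on or before `as_of`, and the harness advances `as_of` one step at a time as the backtest runs.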

	alchemist1e9 18 hours ago | parent [-]
		I have a hunch that Grok's stated model cutoff is not accurate and that its weights have somehow been updated. They still call it the same Grok model because the params and size are unchanged, but they are incrementally training it in the background. Of course I don’t know this, but it’s what I would do in their situation, since ongoing incremental training could be a neat trick to improve their ongoing results against competitors, even if marginally. I also wouldn’t trust the models to honestly disclose their decision process. That said, this is a fascinating area of research and I do think LLM-driven fundamental investing and trading has a future.

plufz 19 hours ago | parent | prev [-]

I know very little about what the environment where they run these models looks like, but surely they have access to different tools, like vector embeddings with more current data on various topics?

endtime 19 hours ago | parent | next [-]

If they could "see" the future and exploit that they'd probably have much higher returns.

	plufz 10 hours ago | parent | next [-]
		I would say that if these models could independently create such high returns, all these companies would shut down external access to the models and just have their own money-making machine. :)
	alchemist1e9 18 hours ago | parent | prev [-]
		56% over 8 months with the constraints provided is a pretty good result for Grok.

disconcision 19 hours ago | parent | prev [-]

you can (via the API, or to a lesser degree through the settings in the web client) determine what tools, if any, a model can use

plufz 10 hours ago | parent | next [-]

But isn’t that more about which MCPs you can configure it to use? Do we have any idea what secret sauce they have? Surely it’s not just a raw model that they are executing?

disconcision 19 hours ago | parent | prev [-]

with the exception that it doesn't seem possible to fully disable this for grok 4

	alchemist1e9 18 hours ago | parent [-]
		which is curiously the best model …