spindump8930 3 days ago

This topic is interesting, but the repo and paper have so many inconsistencies that I suspect the work is hiding behind dense notation and language. For one, the repo states:

> This implementation follows the framework from the paper “Compression Failure in LLMs: Bayesian in Expectation, Not in Realization” (NeurIPS 2024 preprint) and related EDFL/ISR/B2T methodology.

There doesn't seem to be a paper by that title, either as a preprint or as an actual NeurIPS publication. There is https://arxiv.org/abs/2507.11768, which has a different title and contains a number of inconsistencies regarding the model. For example, from the appendix:

> All experiments used the OpenAI API with the following configuration:

> • Model: *text-davinci-002*

> • Temperature: 0 (deterministic)

> • Max tokens: 0 (only compute next-token probabilities)

> • Logprobs: 1 (return top token log probability)

> • Rate limiting: 10 concurrent requests maximum

> • Retry logic: Exponential backoff with maximum 3 retries

That model is not remotely appropriate for these experiments and was deprecated in 2023.
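For context, here is roughly what that quoted configuration would correspond to as a call against the legacy (pre-1.0 Python SDK) Completions endpoint. The prompt and API key below are placeholders of mine, not from the paper, and since text-davinci-002 has been retired the call can no longer actually be run against the OpenAI API:

```python
# Sketch only: reproduces the appendix configuration as a legacy Completions call.
# text-davinci-002 is no longer served, so this is illustrative, not runnable today.
import openai

openai.api_key = "sk-..."  # placeholder

response = openai.Completion.create(
    model="text-davinci-002",   # per the appendix
    prompt="Example prompt",    # placeholder; the paper doesn't show prompts
    temperature=0,              # "deterministic"
    max_tokens=0,               # per the appendix: no new tokens are generated
    logprobs=1,                 # top token log probability
)
```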

I'd suggest anyone excited by this try running the codebase from GitHub and take a close look at the paper.

MontyCarloHall 3 days ago

It's telling that neither the repo nor the linked paper has a single empirical demonstration of the ability to predict hallucination. Let's see a few prompts and responses! Instead, all I see is a lot of handwavy philosophical pseudo-math, like using Kolmogorov complexity and Solomonoff induction, two poster children for abstract concepts that are inherently uncomputable, as explicit algorithmic objectives.
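To make the computability point concrete: K(x) cannot be computed for any x, so any code that claims to use it can at best substitute a computable upper bound, typically the length of a compressed encoding. A minimal sketch of that standard stand-in (not taken from the repo):

```python
# Kolmogorov complexity K(x) is uncomputable; a compressor gives only an upper
# bound (up to an additive constant). Illustrative only, not from the repo.
import zlib

def compression_upper_bound(x: bytes) -> int:
    """Length in bits of a zlib-compressed encoding of x: an upper bound on K(x)."""
    return 8 * len(zlib.compress(x, 9))

print(compression_upper_bound(b"a" * 1000))        # highly compressible, small bound
print(compression_upper_bound(bytes(range(256))))  # less compressible, larger bound
```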

niklassheth 3 days ago

It seems like the repo is mostly, if not entirely, LLM-generated; not a great sign.