▲ spindump8930 | 3 days ago
This topic is interesting, but the repo and paper have a lot of inconsistencies that make me think this work is hiding behind dense notation and language. For one, the repo states:

> This implementation follows the framework from the paper “Compression Failure in LLMs: Bayesian in Expectation, Not in Realization” (NeurIPS 2024 preprint) and related EDFL/ISR/B2T methodology.

There doesn't seem to be a paper by that title, either as a preprint or as an actual NeurIPS publication. There is https://arxiv.org/abs/2507.11768, which has a different title and contains lots of inconsistencies regarding the model. For example, from the appendix:

> All experiments used the OpenAI API with the following configuration:
> • Model: *text-davinci-002*
> • Temperature: 0 (deterministic)
> • Max tokens: 0 (only compute next-token probabilities)
> • Logprobs: 1 (return top token log probability)
> • Rate limiting: 10 concurrent requests maximum
> • Retry logic: Exponential backoff with maximum 3 retries

That model is not remotely appropriate for these experiments and was deprecated in 2023. I'd suggest that anyone excited by this try running the codebase on GitHub and take a close look at the paper.
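For what it's worth, here is roughly what that configuration would have had to look like. text-davinci-002 was a completions-only model, so the quoted setup could only have been driven through the legacy pre-1.0 openai-python Completion endpoint. A minimal sketch, assuming the parameters listed in the appendix (the prompt and everything else is my guess); note this no longer runs against the live API, since the model was shut down:

    # Hypothetical reconstruction of the appendix config; model name and
    # listed parameters come from the quoted appendix, the rest is assumed.
    # Requires the legacy openai-python (<1.0) Completion endpoint.
    import openai

    resp = openai.Completion.create(
        model="text-davinci-002",
        prompt="The capital of France is",
        temperature=0,   # "deterministic"
        max_tokens=0,    # generate nothing ...
        echo=True,       # ... but echo the prompt so logprobs cover it
        logprobs=1,      # top-1 log probability per token
    )
    # Per-token log probabilities of the prompt tokens themselves:
    print(resp["choices"][0]["logprobs"]["token_logprobs"])

Note that max_tokens=0 only returns anything useful when combined with echo=True, which the appendix never mentions; even this charitable reading gives prompt-token logprobs, not "next-token probabilities" as claimed.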
▲ MontyCarloHall | 3 days ago
It's telling that neither the repo nor the linked paper has a single empirical demonstration of the ability to predict hallucination. Let's see a few prompts and responses! Instead, all I see is a lot of handwavy philosophical pseudo-math, like using Kolmogorov complexity and Solomonoff induction, two poster children for abstract concepts that are inherently not computable, as explicit algorithmic objectives.
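To spell the objection out: since K(x) is uncomputable, any runnable "Kolmogorov" objective is really optimizing some computable proxy, typically a compressor, which only upper-bounds K(x) with a gap you can never measure. A generic illustration (not code from the repo in question):

    # Compressed length upper-bounds Kolmogorov complexity (up to a
    # decompressor-dependent constant), but nothing tells you how loose
    # the bound is -- that looseness is exactly what's uncomputable.
    import os
    import zlib

    def compressed_length(data: bytes) -> int:
        """A computable stand-in for K(data): zlib output size."""
        return len(zlib.compress(data, 9))

    print(compressed_length(b"abab" * 100))    # highly regular: compresses well
    print(compressed_length(os.urandom(400)))  # incompressible: stays ~400 bytes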
▲ niklassheth | 3 days ago
It seems like the repo is mostly, if not entirely, LLM-generated; not a great sign.