I have no illusions about LLMs. I have been working with them since OG BERT, always with these same issues and more. I'm just stating what would be needed in the future to make them reliably useful outside of creative writing and (human-guided and checked) search.
If an LLM produces an incorrect or orthogonal response and there is no way to reliably fix or debug it, it's just not as useful as it theoretically could be given the data contained in its parameters.