TL;DR: OP used LLMs without search and reasoning, got bad results, and then concluded: "Don't believe the hype."