I have seen teams spend months fine-tuning retrieval algorithms when the real issue was that their ingestion pipeline was feeding HTML boilerplate into the vector store. Fix the input first.
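As a rough illustration of what fixing the input can look like, suppose your sources are raw HTML pages. You could strip the obvious boilerplate before anything is chunked or embedded. This is a minimal sketch that uses only Python's standard-library `html.parser`. The `SKIP_TAGS` set, the `ContentExtractor` class, and the `clean_html` helper are hypothetical names. Which tags count as boilerplate is an assumption you should check against your own pages.

```python
from html.parser import HTMLParser

# Hypothetical choice: tags whose contents we treat as boilerplate, not content.
# Adjust this set to match how your own source pages are structured.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}


class ContentExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # >0 while we are inside one or more skipped tags
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so a plain set lookup is enough.
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside every skipped region.
        if self.skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def clean_html(html: str) -> str:
    """Return the page's visible text with boilerplate regions removed."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Usage: only the main content should survive to the embedding step.
page = """
<html><head><style>body{color:red}</style></head>
<body>
<nav>Home | About | Login</nav>
<main><h1>Refund policy</h1><p>Refunds are issued within 14 days.</p></main>
<footer>Copyright Example Corp</footer>
<script>trackPageView();</script>
</body></html>
"""
print(clean_html(page))  # Refund policy Refunds are issued within 14 days.
```

A tag-based filter like this is deliberately crude. Pages that put navigation or cookie banners in plain `<div>`s will slip through. Spot-checking the cleaned text for a sample of documents before embedding is a cheap way to catch that.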