One thing I forgot: please take a look at the Isartor benchmarks and see the deflection rate for reducing LLM tokens: https://github.com/isartor-ai/Isartor/tree/main/benchmarks