So how much of a factor is it that safety guardrails may be keeping current models from achieving higher scores on whatever red-teaming benchmarks exist?