rancar2 3 days ago

Can you share more about the challenges you ran into with the benchmarking? According to the benchmark note, Claude Opus 4.5 and Gemini 3 Pro Preview exhibited elevated rejection rates and were dropped from TruthfulQA without further discussion. To me this raises two questions: does this indicate that frontier closed SOTA models will likely not allow this approach in the future (i.e., in the process of screening for potential attack vectors), and/or will this approach be limited to certain LLM architectures? If it's an architecture limitation, it's worth discussing chaining for easier policy enforcement.

cgorlla 3 days ago | parent [-]

I checked with the team, and it may have been a temporary rate-limiting issue. We've rectified the results; it appears to have been an isolated case.

https://www.ctgt.ai/benchmarks
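
For what it's worth, the usual way to keep transient rate limits from being scored as refusals is to retry with backoff before recording an answer. A minimal sketch of that idea, not our exact harness code (call_model and RateLimitError are hypothetical placeholders for the provider wrapper and its 429 exception):

    import time

    class RateLimitError(Exception):
        """Stand-in for whatever a provider SDK raises on HTTP 429."""

    def query_with_backoff(prompt, call_model, max_retries=5):
        # Retry transient rate limits so they are not miscounted as
        # model refusals when tallying benchmark results.
        delay = 1.0
        for _ in range(max_retries):
            try:
                return call_model(prompt)
            except RateLimitError:
                time.sleep(delay)
                delay *= 2  # exponential backoff between attempts
        raise RuntimeError("still rate-limited after retries")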

rancar2 3 days ago | parent | next [-]

Thanks for the thoroughness! I look forward to the next steps as you all apply this approach in other unique ways to achieve even better results.

SomaticPirate 3 days ago | parent | prev [-]

Are these benchmarks correct in showing that adding Anthropic's Constitutional AI system prompt lowered results across all the models?
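
For context, I'm imagining an A/B like this, purely a hypothetical sketch of such a comparison (model_fn is a placeholder for a provider wrapper, and CAI_PROMPT stands in for Anthropic's published constitution text):

    def run_truthfulqa(model_fn, questions, system_prompt=None):
        # model_fn(system, user) -> str is a hypothetical provider
        # wrapper; questions are multiple-choice items with a single
        # best answer. Returns the fraction answered correctly.
        correct = 0
        for q in questions:
            answer = model_fn(system_prompt, q["question"])
            correct += answer.strip() == q["best_answer"]
        return correct / len(questions)

    # baseline = run_truthfulqa(model_fn, questions)
    # with_cai = run_truthfulqa(model_fn, questions, system_prompt=CAI_PROMPT)
    # print(f"CAI delta: {with_cai - baseline:+.3f}")

i.e., the claim would be that the with_cai score comes out lower than baseline for every model tested.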