I've done it? I have a benchmark called scramblebench that will do rewriting to evaluate model performance degradation with symbol replacement and layers of indirection.