Ah, I see what you mean. Yeah, there was too much output from too many models at once (combined with not enough spare time) to do a useful qualitative analysis of every model's performance.