interesting idea! this benchmark maps fairly closely to the kinds of output I typically ask LLMs to generate for me day-to-day
ayy great to hear!