But this is just a detail, right? If we went and painstakingly catalogued millions of proteins, we'd be able to use the simple model without needing a complex model to generate the data, no?