hashta 8 hours ago

To people outside the field, the title/abstract can make it sound like folding is just inherently simple now, but this model wouldn’t exist without the large synthetic dataset produced by the more complex AF. The "simple" architecture is still using the complex model indirectly through distillation. We didn’t really extract new tricks to design a simpler model from scratch, we shifted the complexity from the model space into the data space (think GPT-5 => GPT-5-mini, there’s no GPT-5-mini without GPT-5)
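The teacher-student relationship described here can be sketched with a toy distillation example (purely illustrative, assuming nothing about the actual SimpleFold/AF pipeline; the teacher function and polynomial student are made up for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Complex" teacher: an arbitrary nonlinear function standing in
# for the expensive model (e.g., AlphaFold in the analogy above).
def teacher(x):
    return np.sin(3 * x) + 0.5 * x

# Step 1: the teacher labels a large synthetic dataset.
x_synth = rng.uniform(-1, 1, size=5000)
y_synth = teacher(x_synth)

# Step 2: a "simple" student (here, a degree-5 polynomial) is fit
# only to the teacher's outputs -- it never sees ground truth.
coeffs = np.polyfit(x_synth, y_synth, deg=5)
student = np.poly1d(coeffs)

# The student mimics the teacher closely on the training range,
# even though its own architecture is far simpler.
err = float(np.max(np.abs(student(x_synth) - y_synth)))
```

The point of the analogy: the student's simplicity is real, but it is only achievable because the teacher already encoded the hard part into the labels.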

stavros 7 hours ago | parent | next [-]

But this is just a detail, right? If we went and painstakingly catalogued millions of proteins, we'd be able to use the simple model without needing a complex model to generate data, no?

godelski 5 hours ago | parent | prev [-]

  > To people outside the field
So what?

It's a research paper, not a communication to a general audience. Just because the paper is publicly accessible doesn't mean you're the intended audience. Papers are how scientists communicate with other scientists, and more specifically with their peers. They shouldn't be writing even for the full set of machine learning researchers or the full set of biologists. Their intended audience is people researching computational systems that solve protein folding problems.

I'm sorry, but where do you want scientists to be able to talk directly to their peers? Behind closed doors? I just honestly don't understand these types of arguments.

Besides, anyone conflating "Simpler than You Think" with "Simple" is far from qualified to read such a paper. They'll misread whatever the authors say. Conflating those two is something we'd expect from an elementary-school-level reader who is unable to process comparative statements.

I don't think we should be making that the bar...

hashta 5 hours ago | parent [-]

It’s literally called "SimpleFold". But that’s not really my point. From your earlier comment (".. go through all the complexities first to find the generalized and simpler formulations"), I got the impression you thought the simplicity came purely from architectural insights. My point was just that, to compare apples to apples, a model claiming "simpler but just as good" should ideally train on the same kind of data as AF, or at least acknowledge very clearly that a substantial amount of its training data comes from AF.

I’m not trying to knock the work, I think it’s genuinely cool and a great engineering result. I just wanted to flag that nuance for readers who might not have the time or background to spot it, and I get that part of the "simple/simpler" messaging is also about attracting attention which clearly worked!

godelski 4 hours ago | parent [-]

  > I got the impression you thought the simplicity came purely from architectural insights.
I'm unsure where I indicated that, but I apologize for the confusion. I was initially pushing back against your original criticism that something like AlphaFold needed to be built first.

Like you suggest, simple can mean many things. I think it's clear that in this context they mean "simple" in terms of the architectural design, not in an absolute sense. I think the abstract is more than sufficient to convey this.

  > My point was just that to compare apples to apples
As an ML researcher who does a lot of work on architecture and efficiency, I think they are. Consider this from the end of the abstract:

  | SimpleFold shows efficiency in deployment and inference on consumer-level hardware. 
To me they are clearly stating that their goal isn't to get the top score on a benchmark. Their appendix shows that the 100M-param model is apples to apples with AlphaFold2 in size, but not in compute. Even their 3B model uses less compute than AlphaFold2.

So being someone in a neighboring niche, I don't understand your claim. There's no easy way to make these comparisons "apples to apples" because we shouldn't be evaluating on a single metric. Sure, AlphaFold2 gives better results on the benchmarks, but does that mean people wouldn't sacrifice performance for a 450x reduction in compute? (20x for their largest model; note that's compute, not memory.)

  >  messaging is also about attracting attention
Yeah, this is unfortunate, and I'm incredibly frustrated with it in academia, especially in ML. But it's also why I'm pushing back against you. The problem stems from needing to get people to read your paper. There's a perverse incentive: a groundbreaking paper can end up having little to no impact because it didn't get read, while less innovative papers using similar methods get orders of magnitude more citations by scaling up and beating benchmarks. So as long as we use citation metrics as a significant measure of research impact, marketing will be necessary, and a catchy title is a good way to get more eyeballs.

But I think you're being too nitpicky here, and there are far more egregious and problematic examples. I'm not going to pick a fight with a title when the abstract is sufficiently clear. Could it be clearer? Certainly. But if the title is all that's wrong, it's a pretty petty problem, especially if it only confuses people who are significantly outside the target audience.

Seriously, what's the alternative? That researchers write for the general public? For the general technical public? I'm sorry, I don't think that's a good solution. It's already difficult to communicate to people in the same domain (but not the same niche) within the page limit, and it's hard enough to get them to read everything as it is. I'd rather papers be written squarely for niche peers, with enough generality that domain experts can get through them with effort. For the general public, that's what science communicators are for.