> I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can it save you time the way a highly capable colleague would?

Well, I would say they have done precisely that in evaluating the model, no? For example, section 2.2.5.1:

> Uplift and feasibility results

> The median expert assessed the model as a force-multiplier that saves meaningful time (uplift level 2 of 4), with only two biology experts rating it comparable to consulting a knowledgeable specialist (level 3). No expert assigned the highest rating. Most experts were able to iterate with the model toward a plan they judged as having only narrow gaps, but feasibility scores reflected that substantial outside expertise remained necessary to close them.

There are other similar examples in the system card.

torginus 2 hours ago

This is the exact logic that was used to claim that GPT4 was a PhD-level intelligence.

redfloatplane 2 hours ago

You said: "I would like to reach out and talk to biologists - do you find these models to be useful and capable? Can it save you time the way a highly capable colleague would?" And they said, paraphrasing: "We reached out and talked to biologists and asked them to rate the model between 0 and 4, where 4 is a world expert, and the median expert said it was a 2, meaning it helped them save time in the way a capable colleague would." Specifically: "Specific, actionable info; saves expert meaningful time; fills gaps in adjacent domains".

So I'm just telling you they did the thing you said you wanted.

torginus 2 hours ago

Yes, that is correct. I would like a large body of experience and consensus to rely on, as opposed to the regular 'trust the experts' argument, which has been shown for decades to be deeply flawed and easy to manipulate.

bonsai_spool an hour ago

> Yes, that is correct. I would like a large body of experience and consensus to rely on, as opposed to the regular 'trust the experts' argument, which has been shown for decades to be deeply flawed and easy to manipulate.

Yes, it is far inferior to the 'Trust torginus and his ability to understand the large body of experience that other actual subject-matter experts have somehow not understood' strategy.

torginus 19 minutes ago

It's not my credibility I want to measure against Anthropic's. I just said to apply the same logic to biology that you would apply to software development. The parallels here are quite remarkable imo, but defer to your own judgement on what you make of them.