bigdict 2 days ago

What's the point of the relu in the loss function? Its inputs are nonnegative anyway.

Nevermark 2 days ago | parent | next [-]

Let's try to keep things positive.

GolDDranks 2 days ago | parent | prev | next [-]

I wondered the same. Seems like it would just make a V-shaped loss around zero, but abs has that property already!

fancyfredbot 2 days ago | parent [-]

RELU would have made it flat below zero ( _/ not \/). Adding the abs first just makes RELU do nothing.
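
A quick sketch of that point in plain Python. The `relu` helper and the sample values are illustrative assumptions, not taken from the code under discussion:

```python
def relu(x: float) -> float:
    """Rectifier: passes positive inputs through, clamps the rest to zero."""
    return max(0.0, x)

# A few sample points on both sides of zero.
xs = [-2.0, -0.5, 0.0, 0.5, 2.0]

relu_only = [relu(x) for x in xs]         # flat below zero: the _/ shape
abs_only = [abs(x) for x in xs]           # V-shaped around zero: the \/ shape
relu_of_abs = [relu(abs(x)) for x in xs]  # abs output is never negative, so relu changes nothing

print(relu_only)
print(abs_only)
print(relu_of_abs == abs_only)
```

Since `abs(x)` is already nonnegative, wrapping it in `relu` returns the same values on every input.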

andy_ppp 2 days ago | parent | prev | next [-]

In reality it’s probably not a RELU; modern LLMs use GeLU or something more advanced.

meindnoch 2 days ago | parent | prev | next [-]

Sometimes a cosmic ray might hit the sign bit of the register and flip it to a negative value. So it is useful to pass it through a rectifier to ensure it's never negative, even in this rare case.

lblume a day ago | parent [-]

Indeed, we should call all idempotent functions twice just in case the first incantation fails to succeed.

In all seriousness, this is not at all how resilience to cosmic interference works in practice, and the probability of any executed instruction or even any other bit being flipped is far greater than the one specific bit you are addressing.

fancyfredbot 2 days ago | parent | prev [-]

I thought the belt and braces approach was a valuable contribution to AI safety. Better safe than sorry with these troublesome negative numbers!

naniwaduni 2 days ago | parent [-]

Well, I guess it's helping to distinguish authors who are doing arithmetic they understand from ones who are copying received incantations around...