Well, all LLMs use nonlinear activation functions (any useful neural net needs them, since a stack of purely linear layers collapses into a single linear map), so I think you might be onto something.
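
For anyone wondering why the nonlinearity is required, here is a rough sketch in plain Python. The weights `W1`, `W2` and the input `x` are made-up values chosen only for illustration, and ReLU is just one common choice of activation, not necessarily the one any particular LLM uses. It shows that two linear layers with nothing in between behave exactly like one linear layer, while putting a ReLU between them breaks that equivalence.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrices A and B (each a list of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def relu(v):
    """Elementwise ReLU: negative entries become 0."""
    return [max(0.0, vi) for vi in v]


# Hypothetical weights and input, for illustration only.
W1 = [[1.0, -2.0], [3.0, 0.5]]
W2 = [[0.5, 1.0], [-1.0, 2.0]]
x = [1.0, 1.0]

# Two linear layers applied back to back, no activation in between.
two_linear = matvec(W2, matvec(W1, x))

# A single linear layer whose weight matrix is the product W2 * W1.
collapsed = matvec(matmul(W2, W1), x)

# The same two layers with a ReLU between them.
with_relu = matvec(W2, relu(matvec(W1, x)))

print(two_linear == collapsed)  # the purely linear stack collapses
print(with_relu != two_linear)  # the ReLU version is a different function
```

Depth only buys extra expressive power once something nonlinear sits between the layers, which is why every practical architecture includes it.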