I expected a “torch is smart enough to keep track of cases where it just initialized the C in C <= A*B+C to zero, avoiding the add” type situation but I was wrong.