I am trying to derive the derivative of the least-squares loss function. If I have this (I am using ' to denote the transpose, as in MATLAB)
(y-Xw)'(y-Xw)
and I expand it
=(y'- w'X')(y-Xw)
=y'y -y'Xw -w'X'y + w'X'Xw
=y'y -y'Xw -y'Xw + w'X'Xw  (w'X'y is a scalar, so it equals its own transpose y'Xw)
=y'y -2y'Xw + w'X'Xw
Now I take the gradient with respect to w, term by term:
=-2X'y + X'Xw + (w'X'X)'
=-2X'y + X'Xw + X'Xw
=-2X'y + 2X'Xw
And that is the intended result.
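Just to be sure that result is right before I worry about the chain-rule version, I checked it numerically against finite differences (a quick numpy sketch with arbitrary sizes, nothing from any particular data set):

```python
import numpy as np

# Quick sanity check of the gradient -2X'y + 2X'Xw against finite differences
rng = np.random.default_rng(0)
n, d = 6, 3
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
w = rng.standard_normal(d)

def loss(w):
    r = y - X @ w
    return r @ r                          # (y - Xw)'(y - Xw)

grad = -2 * X.T @ y + 2 * X.T @ X @ w     # the result derived above

# central finite differences, one coordinate at a time
eps = 1e-6
fd = np.array([
    (loss(w + eps * np.eye(d)[i]) - loss(w - eps * np.eye(d)[i])) / (2 * eps)
    for i in range(d)
])
print(np.allclose(grad, fd, atol=1e-4))   # prints True
```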
Now, I saw in the post Vector derivation of $x^Tx$ that the gradient of x'x is 2x, so I am trying to get the same result by applying that, together with the chain rule, to
(y-Xw)'(y-Xw)
So I think the gradient might be
=2(y-Xw)(-X)
=-2yX + 2XwX
The result looks similar, but the transposes are missing, so it would not work... What am I doing wrong? My mathematical background has almost disappeared and I have only just started to recover it, so please be patient if I did something terribly wrong...
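For what it's worth, here is the quick numpy shape check I tried: read literally, 2(y-Xw)(-X) multiplies an n-by-1 vector by an n-by-d matrix, so the product is not even defined, which is what I mean by "it would not work" (again just a sketch with arbitrary sizes):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 6, 3                      # arbitrary sizes, just for the shape check
X = rng.standard_normal((n, d))
y = rng.standard_normal((n, 1))
w = rng.standard_normal((d, 1))

r = y - X @ w                    # (y - Xw), an n-by-1 column
print(r.shape, X.shape)          # (6, 1) (6, 3)

try:
    bad = 2 * r @ (-X)           # literal 2(y - Xw)(-X): n-by-1 times n-by-d
except ValueError as err:
    print("shapes do not line up:", err)
```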