7

For $x \in \mathbb R^n$, is $\frac{\partial[x^Tx]}{\partial[x]}$ equal to $2x$ or $2x^T$? There is a lot of disagreement about this, for example in Vector derivation of $x^Tx$. Another example is Computing matrix-vector calculus derivatives, where Jonas claimed that the 4th line of Par's answer is incorrect; as a result, Par's answer was $a$ and Jonas' answer was $a^T$. Which answer to the question "what is the derivative of $x^T a$ with respect to $x$?" is correct?

ajfbiw.s
  • 375

2 Answers

4

This is a perfect example of the sort of mess you can get into by working in coordinates. Let's work abstractly: $V$ is a finite-dimensional real inner product space, with inner product $\langle -, - \rangle$. We'd like to know the derivative of the quadratic form $q(x) = \langle x, x \rangle : V \to \mathbb{R}$ at a point $a \in V$. The most important thing to know about this derivative is its type: what sort of mathematical object is it? The answer is that it is a linear functional $V \to \mathbb{R}$; that is, it's a covector. (We can then use the inner product to identify vectors and covectors if we want to, but this just confuses the issue.)

Okay, now let's actually compute it. We have

$$q(a + dx) = \langle a + dx, a + dx \rangle = \langle a, a \rangle + 2 \langle a, dx \rangle + O(dx^2)$$

from which it follows that the derivative at $a$ is the linear functional $dx \mapsto 2 \langle a, dx \rangle$. Using the inner product to identify vectors and covectors, the corresponding vector is $2a$.
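
(A quick numerical sanity check, not part of the original answer: the NumPy sketch below compares the exact increment $q(a+dx)-q(a)$ with the linear functional $2\langle a, dx\rangle$ for a small random perturbation; the vectors and step size are arbitrary choices made for the example.)

```python
import numpy as np

# Quadratic form q(x) = <x, x> with the standard inner product on R^n.
def q(x):
    return x @ x

rng = np.random.default_rng(0)
a = rng.standard_normal(5)           # arbitrary base point
dx = 1e-6 * rng.standard_normal(5)   # small perturbation

# Exact increment versus the candidate derivative, the linear map dx -> 2<a, dx>.
increment = q(a + dx) - q(a)
linear_part = 2 * (a @ dx)

# The difference is O(|dx|^2), so it is tiny compared to either term.
print(increment, linear_part, abs(increment - linear_part))
```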

Qiaochu Yuan
  • 419,620
  • 1
    Can you explain what a covector is, how it differs from a vector, and how you figured out that this is a covector, for people who have never heard of such a thing? They aren't taught in basic linear algebra courses, and it's entirely possible to do a lot of linear algebra without ever hearing about them. It's not even clear what you mean by "working in coordinates", and the reader doesn't necessarily know what other options could possibly exist... – user541686 Feb 21 '16 at 20:50
1

I don't like @Qiaochu's answer since it isn't exactly enlightening as to what the problem is.
I think the problem here stems from not defining what precisely $\partial \vec{f}/\partial \vec{x}$ means.

If you assume $\vec{x}$ and $\vec{f}$ are column vectors and you use this Jacobian matrix definition $$\frac{\partial \vec{f}}{\partial \vec{x}} = \begin{bmatrix}\vdots \\ \partial f_i/\partial \vec{x} \\ \vdots\end{bmatrix} = \begin{bmatrix}\vdots \\ \vec{\nabla} f_i \\ \vdots\end{bmatrix}$$ then you see that $\partial f_i/\partial \vec{x} = \vec{\nabla} f_i$ must be a row vector, not a column vector. So you get $$\frac{\partial}{\partial \vec{x}} \vec{x}^\top \vec{x} = 2 \vec{x}^\top,$$ whereas if you define the Jacobian to be the transpose of this, you get back the column vector $2\vec{x}$.
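
(As a concrete illustration of the convention issue, added here as my own sketch rather than part of the answer: the snippet below builds a numerator-layout Jacobian of $f(\vec{x}) = \vec{x}^\top\vec{x}$ by finite differences; the helper `jacobian` and the test point are made up for the example.)

```python
import numpy as np

def f(x):
    # Scalar-valued function f(x) = x^T x.
    return x @ x

def jacobian(func, x, eps=1e-6):
    # Numerator-layout Jacobian: row i is the gradient (a row vector) of output i.
    y = np.atleast_1d(func(x))
    J = np.zeros((y.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (np.atleast_1d(func(x + step)) - y) / eps
    return J

x = np.array([1.0, 2.0, 3.0])
J = jacobian(f, x)   # shape (1, 3): approximately the row vector 2 x^T
print(J)             # roughly [[2. 4. 6.]]
print(J.T)           # the transposed convention gives the column vector 2 x
```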

user541686
  • 13,772
  • I do like the other answer, but I think this is a useful complement. – quid Feb 21 '16 at 21:10
  • So, the reason for needing a transpose is to convert the derivative to a column vector. But if we are already writing the derivative as a column vector, there is no need to do so. – ajfbiw.s Feb 21 '16 at 21:15
  • @ajfbiw.s it is the other way round. You want the derivative to be a row vector so that you can multiply it with your column vectors. – quid Feb 21 '16 at 21:18
  • @ajfbiw.s. Well, one nice thing about letting $\vec{\nabla} f_i$ be a row vector is that you can e.g. linearize the function as $f_i(\vec{x}) \approx f_i(0) + \vec{\nabla} f_i(0)\, \vec{x}$ without having to transpose the gradient (see the sketch below). It's also nice to have every row of the Jacobian correspond to the derivative of the corresponding scalar component of the vector-valued function (visually it's easier to see what is happening). But otherwise it's just a matter of how you want it to be defined; as long as you're consistent it's not a problem. – user541686 Feb 21 '16 at 21:21
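
(A rough illustration of the linearization point in the last comment, added as a hypothetical sketch rather than part of the thread: with the gradient stored as a row vector, the first-order approximation around an arbitrary expansion point $\vec{x}_0$ is a plain matrix-vector product, no transpose required.)

```python
import numpy as np

def f(x):
    return x @ x                      # f(x) = x^T x

def grad_row(x):
    return (2 * x).reshape(1, -1)     # gradient kept as a row vector, 2 x^T

x0 = np.array([1.0, 2.0, 3.0])        # expansion point (arbitrary choice)
x = x0 + np.array([0.01, -0.02, 0.03])

# With a row-vector gradient the linearization is a plain matrix-vector product:
# f(x) ~ f(x0) + grad(x0) (x - x0), no transpose needed.
approx = f(x0) + grad_row(x0) @ (x - x0)
print(f(x), approx.item())
```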