$\DeclareMathOperator{\dist}{dist}
\DeclareMathOperator{\conv}{conv}
\DeclareMathOperator{\aff}{aff}$
There is a unique point $y_j \in \conv(X_j)$ such that
$\|z-y_j\| = \dist(z, \conv(X_j))$ because $\conv(X_j)$ is a compact and convex set.
As you mention, compactness means that the function $d(x) = \|x-z\|$ attains its minimum on $\conv(X_j)$. If two distinct points $y_1$ and $y_2$ were at the same minimal distance from $z$, then their midpoint $\frac{y_1 + y_2}{2}$ would be strictly closer to $z$ (consider the isosceles triangle formed by $z$, $y_1$, and $y_2$). But the line segment between any two points of a convex set lies in the set, so the midpoint would also belong to $\conv(X_j)$, contradicting minimality. Hence the minimum of $d$ is attained at a unique point.
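The midpoint argument is easy to check numerically; here is a minimal sketch (the points $z$, $y_1$, $y_2$ are arbitrary illustrative choices):

```python
import numpy as np

# Two distinct points equidistant from z; their midpoint is strictly
# closer to z (the isosceles-triangle argument from the text).
z = np.array([0.0, 2.0])
y1 = np.array([-1.0, 0.0])
y2 = np.array([1.0, 0.0])

d1 = np.linalg.norm(z - y1)
d2 = np.linalg.norm(z - y2)
mid = (y1 + y2) / 2                  # lies in any convex set containing y1, y2
dmid = np.linalg.norm(z - mid)

assert np.isclose(d1, d2)            # equidistant
assert dmid < d1                     # midpoint strictly closer
```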
Since each point $y_j$ is in $\conv(X_j)$, for any point $x$, $\|x - y_j\| \geq \dist(x, \conv(X_j))$ (because $\dist(x, \conv(X_j))$ is the minimum distance of $x$ to any point of $\conv(X_j)$). So if
$$
f(x) = \sum_{j=1}^r \dist^2(x,\conv(X_j))
$$
and
$$
g(x) = \sum_{j=1}^r \|x - y_j\|^2
$$
then $g(x) \geq f(x)$ for all $x$.
On the other hand, at $z$ we have $\|z - y_j\| = \dist(z, \conv(X_j))$ for every $j$ (by the definition of the points $y_j$), so $g(z) = f(z)$. Since $z$ minimizes $f$ and $g(x) \geq f(x) \geq f(z) = g(z)$ for all $x$, this common value is the minimum of both functions.
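A quick numerical check of the sandwich $g \geq f$ with equality at $z$: the helper `nearest_in_hull` below is written just for this sketch (it minimizes over convex-combination weights; it is not part of the original argument), and the point sets and $z$ are random.

```python
import numpy as np
from scipy.optimize import minimize

def nearest_in_hull(X, z):
    """Nearest point of conv(X) to z: minimize ||w @ X - z||^2 over the
    probability simplex (w >= 0, sum w = 1)."""
    n = len(X)
    res = minimize(lambda w: np.sum((w @ X - z) ** 2),
                   np.full(n, 1.0 / n),
                   bounds=[(0.0, 1.0)] * n,
                   constraints={'type': 'eq', 'fun': lambda w: w.sum() - 1.0})
    return res.x @ X

rng = np.random.default_rng(0)
d, r = 3, 4
parts = [rng.normal(size=(5, d)) + 3.0 * rng.normal(size=d) for _ in range(r)]
z = rng.normal(size=d)
ys = [nearest_in_hull(X, z) for X in parts]       # the points y_j

def f(x):   # sum of squared distances to the hulls
    return sum(np.sum((x - nearest_in_hull(X, x)) ** 2) for X in parts)

def g(x):   # sum of squared distances to the fixed points y_j
    return sum(np.sum((x - y) ** 2) for y in ys)

assert np.isclose(f(z), g(z), atol=1e-6)          # equality at z
for _ in range(10):
    x = rng.normal(size=d)
    assert f(x) <= g(x) + 1e-6                    # g dominates f everywhere
```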
- An important condition that was somewhat glossed over is that each $Y_j \subseteq X_j$ is chosen to be a minimal set such that $y_j$ lies in the relative interior of $\conv(Y_j)$.
Consider each flat $\aff(Y_j)$, and let $\omega_j$ be the point of the flat closest to $z$; the vector from $z$ to $\omega_j$ is orthogonal to the flat, i.e., $(\omega_j - z) \perp \aff(Y_j)$. If $\omega_j$ lies in $\conv(Y_j) \subset \aff(Y_j)$, then it must equal $y_j$. If it does not, then $y_j$ would have to lie on the part of the boundary of $\conv(Y_j)$ closest to $\omega_j$, contradicting the fact that $y_j$ is in the relative interior of $\conv(Y_j)$. (In that case $Y_j$ would have been chosen smaller, so that $\conv(Y_j)$ is exactly the boundary face in question and $\aff(Y_j)$ has smaller dimension.)
Thus, the vector $y_j - z$ is orthogonal to $\aff(Y_j)$ for every $j$, and for any $v \in \aff(Y_j)$ the scalar product $\langle v - z, y_j - z\rangle = \langle v - y_j, y_j - z\rangle + \|y_j - z\|^2 = \|y_j - z\|^2$ is positive (provided $z \neq y_j$).
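This orthogonality computation can be illustrated numerically: project $z$ onto a flat spanned by a few points and check that $\langle v - z, y - z\rangle = \|y - z\|^2$ for every $v$ in the flat (random data; `lstsq` does the projection):

```python
import numpy as np

rng = np.random.default_rng(1)

# A flat aff(Y) spanned by 3 affinely independent points in R^5, and an
# outside point z; y is the orthogonal projection of z onto the flat.
Y = rng.normal(size=(3, 5))
z = rng.normal(size=5)

base, A = Y[0], (Y[1:] - Y[0]).T        # flat = base + column space of A
coef, *_ = np.linalg.lstsq(A, z - base, rcond=None)
y = base + A @ coef                     # orthogonal projection of z

# y - z is orthogonal to every direction lying in the flat...
assert np.allclose(A.T @ (y - z), 0.0)
# ...so <v - z, y - z> = ||y - z||^2 > 0 for every v in aff(Y).
for _ in range(5):
    v = base + A @ rng.normal(size=2)   # arbitrary point of the flat
    assert np.isclose(np.dot(v - z, y - z), np.dot(y - z, y - z))
    assert np.dot(v - z, y - z) > 0
```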
- I guess the general position assumption includes the idea that a hyperplane spanned by one subset of $X$ is never parallel to another hyperplane spanned by a disjoint subset of $X$, nor to an intersection of such other hyperplanes, because otherwise the claim is false.
If you have points in this kind of general position, and the intersection of some affine subspaces spanned by subsets of those points is empty, then the sum of the codimensions of the affine subspaces must be at least $d + 1$ (see dimension of intersection of hyperplanes).
Since each $Y_j$ is a minimal set with $y_j$ in its relative interior, $\dim \aff(Y_j) = |Y_j| - 1$ (if $Y_j$ were affinely dependent, a proper subset would already contain $y_j$ in the relative interior of its hull). So
$$
\begin{align*}
d + 1 &\leq \sum_{j=1}^r \bigl(d - \dim \aff(Y_j)\bigr) \\
&= \sum_{j=1}^r (d + 1 - |Y_j|) \\
&= r(d+1) - \sum_{j=1}^r |Y_j|,
\end{align*}
$$
which rearranges to
$$
\sum_{j=1}^r |Y_j| \leq (r-1)(d+1).
$$
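The codimension count can be sanity-checked with generic linear algebra: two flats in $\mathbb{R}^4$ whose codimensions sum to $d = 4$ still meet in a point, whereas total codimension $d+1$ generically gives an overdetermined system with no solution. (Random generic data; this only illustrates the dimension-of-intersection fact, not the sets $Y_j$ themselves.)

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4

# Each flat is the solution set of a linear system; its codimension is
# the number of independent equations. Two codimension-2 flats: total 4 = d.
A1, b1 = rng.normal(size=(2, d)), rng.normal(size=2)
A2, b2 = rng.normal(size=(2, d)), rng.normal(size=2)

A = np.vstack([A1, A2])
b = np.concatenate([b1, b2])
x = np.linalg.solve(A, b)               # 4 generic equations, 4 unknowns
assert np.allclose(A @ x, b)            # the two flats intersect

# One more generic equation: total codimension d + 1 = 5.  The best
# least-squares point no longer satisfies all equations, i.e. the
# intersection is empty.
A3, b3 = rng.normal(size=(1, d)), rng.normal(size=1)
Abig = np.vstack([A, A3])
bbig = np.concatenate([b, b3])
xls, *_ = np.linalg.lstsq(Abig, bbig, rcond=None)
assert not np.allclose(Abig @ xls, bbig)
```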
The inner product $\langle x - y_j, z - y_j\rangle$ being positive means that the vector from $y_j$ to $x$ points in the same "direction" as the vector from $y_j$ to $z$ (they lie in the same half-space), so points on the line segment from $y_j$ toward $x$ get closer to $z$ than $y_j$ is. Then moving $x$ from whatever part $X_i$ it currently belongs to into the part $X_j$ results in a smaller value of $\mu$. Indeed, there is a point of $\conv(Y_j \cup \{x\})$ (a subset of the convex hull of the new $X_j$) that is closer to $z$ than $y_j$ was, while removing $x$ from $X_i$ does not increase $\dist(z, \conv(X_i))$: the closest point $y_i$ of $\conv(X_i)$ lies in $\conv(Y_i)$, which does not involve $x$.
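The "gets closer" claim is a one-line calculus fact: the derivative of $t \mapsto \|y_j + t(x - y_j) - z\|^2$ at $t = 0$ equals $-2\langle x - y_j, z - y_j\rangle < 0$. A numeric sketch (random points, retried until the inner product is positive):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4

# Find a random instance with <x - y, z - y> > 0.
while True:
    y, x, z = rng.normal(size=(3, d))
    ip = np.dot(x - y, z - y)
    if ip > 0:
        break

# The squared distance to z along the segment y -> x is a parabola in t
# with minimum at t* = ip / ||x - y||^2; any t in (0, min(1, 2 t*))
# strictly decreases the distance.  Step to the (clipped) minimizer:
t = min(1.0, ip / np.dot(x - y, x - y))
closer = y + t * (x - y)

assert np.linalg.norm(closer - z) < np.linalg.norm(y - z)
```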
- As I mentioned, the needed assumption is not what is usually called general position, but rather that the intersections of any hyperplanes spanned by points in $X$ are not parallel to each other. One way of interpreting such parallel subspaces is that they meet "at infinity". In Roudneff's paper, from which your survey takes the proof, it is posed as follows (I've slightly modified it to remove references to bases of positive cones, which are not relevant to us; for the classical Tverberg theorem, every set $L_i$ in Roudneff's paper is the empty set).
> We consider $\mathbb{R}^d$ as the affine space $\mathbb{P}^d \setminus H_\infty$, where $\mathbb{P}^d$ denotes the projective space of dimension $d$ and $H_\infty$ the hyperplane at infinity. We first observe that the convex dependences of our configuration of points $X$ are not modified by slightly moving $H_\infty$, i.e., the fact that a given point belongs to the convex hull of a subset $X_i$ of $X$ is preserved. In other words, the oriented matroid of affine dependences defined by $X$ remains the same (see Oriented Matroids for this notion). Thus, we may assume that $H_\infty$ contains no vertex of the projective arrangement of hyperplanes defined by $X$.
I take this to mean that applying a projective transformation can put the points in "general position" in the needed sense.