
I'm trying to follow a tutorial paper on generalizing Dirichlet distribution finite mixture models to Dirichlet process infinite mixture models.

Some context to clarify my questions (at the end of this post) follows:

They start by deriving a Gibbs-sampling algorithm for a finite Gaussian mixture model with a Dirichlet distribution prior (Section 2). In Section 3, they extend this to the infinite mixture case.


I get stuck following the derivation in Section 3.2 ("When $K$ approaches infinity"). They start from

$$ p(c_1, \dots, c_n ~|~ \alpha) = \frac{\Gamma(\alpha)}{\Gamma(n + \alpha)} \prod_{k=1}^K \frac{\Gamma(n_k + \alpha / K)}{\Gamma(\alpha/K)} $$

where $c_i$ is a variable such that $c_i = k$ indicates the $i$th observation $y_i$ belongs to cluster $k$, $\alpha$ is the symmetric Dirichlet distribution concentration parameter, $n$ is the total number of observations, $n_k$ is the number of observations in cluster $k$, $K$ is the total number of clusters, and $\Gamma(\cdot)$ is the gamma function.
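
For concreteness, here is a small numerical sketch (my own, not from the paper) that evaluates this joint prior in log space; the counts and $\alpha$ below are made-up example values, and `gammaln` is SciPy's log-gamma function:

```python
import numpy as np
from scipy.special import gammaln

def log_joint_prior(counts, alpha):
    """Log of p(c_1, ..., c_n | alpha) under a symmetric Dirichlet prior.

    counts : length-K array of n_k, the number of observations in each cluster
    alpha  : concentration parameter
    """
    counts = np.asarray(counts, dtype=float)
    K = len(counts)
    n = counts.sum()
    return (gammaln(alpha) - gammaln(n + alpha)
            + np.sum(gammaln(counts + alpha / K) - gammaln(alpha / K)))

# Example: n = 6 observations spread over K = 3 clusters
print(log_joint_prior([3, 2, 1], alpha=1.0))
```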

The derivations below then show that

$$ p(c_i = k ~|~\mathbf{c}_{-i}, \alpha) = \frac{n_{-i, k}+\alpha/K}{n-1+\alpha} $$

where $-i$ denotes all indicators except the $i$th, and "$n_{-i,k}$ represents the number of observations nested within the $k$th cluster excluding $y_i$".
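
As a sanity check (my own sketch, not from the paper), this conditional can be verified by brute force: hold $\mathbf{c}_{-i}$ fixed, evaluate the joint prior for each possible value of $c_i$ using the `log_joint_prior` helper sketched above, and normalize:

```python
import numpy as np

def conditional_prior(c, i, K, alpha):
    """p(c_i = k | c_{-i}, alpha) for k = 0..K-1, computed two ways."""
    n = len(c)

    # Closed form: (n_{-i,k} + alpha/K) / (n - 1 + alpha)
    counts_minus_i = np.bincount(np.delete(c, i), minlength=K)
    closed = (counts_minus_i + alpha / K) / (n - 1 + alpha)

    # Brute force: evaluate the joint for each choice of c_i, then normalize
    # (log_joint_prior is the helper from the snippet above)
    logs = np.empty(K)
    for k in range(K):
        c_k = c.copy()
        c_k[i] = k
        logs[k] = log_joint_prior(np.bincount(c_k, minlength=K), alpha)
    brute = np.exp(logs - logs.max())
    brute /= brute.sum()
    return closed, brute

c = np.array([0, 0, 1, 2, 1, 0])   # toy assignments, clusters indexed 0..2
closed, brute = conditional_prior(c, i=3, K=3, alpha=1.0)
print(np.allclose(closed, brute))  # True: the two computations agree
```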

The derivation (from Appendix A.4) is included below:

[Image: first part of derivation from Appendix A.4]

[Image: second part of derivation from Appendix A.4]

The questions I'm stuck on:

  • What does "$n_{-i,k}$ represents the number of observations nested within the $k$th cluster excluding $y_i$" mean? Wouldn't this number just be equal to $n_k - 1$? Why the special notation?
  • I can follow the first 4 lines ($=$ signs) of the derivation, but I get stuck at the jump from line 4 to line 5. For example, in the equation just above the text "a more intuitive explanation...", it seems that they have dropped the product operator. How did they accomplish this?

1 Answer

  1. Not all the time: $n_{-i,k}$ equals $n_k - 1$ only when observation $i$ was assigned to cluster $k$ at the previous iteration; otherwise $n_{-i,k}$ is just $n_k$. We write $n_{-i,k}$ instead of $n_k - 1$ in the Gibbs sampler because the algorithm iterates many times, updating the assignment of each observation $i$ on every sweep; writing $n_k - 1$ would suggest the number of observations assigned to cluster $k$ is fixed. (See the sketch after this list.)
  2. The product operator only appears in the joint distribution; once we condition on $c_i = k$, only the $k$th factor matters. Concretely, write the conditional as a ratio of joints,
$$ p(c_i = k ~|~ \mathbf{c}_{-i}, \alpha) = \frac{p(c_i = k, \mathbf{c}_{-i} ~|~ \alpha)}{p(\mathbf{c}_{-i} ~|~ \alpha)}, $$
where the denominator is the same formula applied to the $n - 1$ observations excluding $y_i$. Every factor with index $j \neq k$ is identical in the numerator and denominator and cancels, so the product disappears. The surviving terms then collapse via the recurrence $\Gamma(x+1) = x\,\Gamma(x)$:
$$ \frac{\Gamma(n-1+\alpha)}{\Gamma(n+\alpha)} \cdot \frac{\Gamma(n_{-i,k} + \alpha/K + 1)}{\Gamma(n_{-i,k} + \alpha/K)} = \frac{n_{-i,k} + \alpha/K}{n - 1 + \alpha}. $$
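
To make point 1 concrete, here is a minimal sketch of the counting step inside one Gibbs sweep (my own illustration; the variable names are made up, and a full mixture sampler would also multiply in the data likelihood, which is omitted here). The current assignment of $y_i$ is removed before counting, which is exactly what $n_{-i,k}$ encodes:

```python
import numpy as np

def gibbs_sweep(c, K, alpha, rng):
    """One Gibbs pass over the indicators, using only the prior term.

    For each i, the count of i's *current* cluster is decremented before
    sampling, giving n_{-i,k}: it equals n_k - 1 for that one cluster and
    n_k for every other cluster -- hence the dedicated notation.
    """
    n = len(c)
    counts = np.bincount(c, minlength=K)
    for i in range(n):
        counts[c[i]] -= 1                # remove y_i: counts are now n_{-i,k}
        p = (counts + alpha / K) / (n - 1 + alpha)
        c[i] = rng.choice(K, p=p)        # resample c_i from the prior conditional
        counts[c[i]] += 1                # add y_i back under its new cluster
    return c

c = np.array([0, 0, 1, 2, 1, 0])
print(gibbs_sweep(c, K=3, alpha=1.0, rng=np.random.default_rng(0)))
```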