I am trying to learn the various learning algorithms used in training neural networks. When plotting the mean-square error (MSE) in dB against the number of iterations, if the points hover around 0 dB, can we say that the curve has converged, or do the points need to go to negative dB values? If the curve trends toward negative dB as the iterations increase, what can be inferred? How good or bad is the technique?
1 Answer
0 dB corresponds to an MSE of 1. So, if your method's MSE approaches 0 dB as the number of iterations increases, then your method has an error floor: it cannot drive the MSE below 1 no matter how many iterations it runs (or how many data points you have). A good method should drive the MSE to 0 (corresponding to $-\infty$ dB); that is, the MSE should decrease continually as the number of iterations increases and not saturate at 1 (0 dB). We can then ask whether the approach to 0 MSE ($-\infty$ dB) is exponential (MSE $= O(e^{-n})$) or polynomial (MSE $= O(n^{-k})$), including as special cases quadratic (MSE $= O(n^{-2})$) and linear (MSE $= O(n^{-1})$), or sublinear (e.g., MSE $= O(n^{-1/2})$ or MSE $= O((\ln n)^{-1})$ or similar), where $n$ is the number of iterations (or number of data points). Naturally, the faster the decrease, the better (and usually the more expensive) the method.
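A minimal sketch of the distinction, assuming two hypothetical MSE curves (the specific decay rates are illustrative, not from any particular training method): one curve saturates at an error floor of 1, so its dB value flattens near 0 dB, while a linearly convergent curve (MSE $= O(n^{-1})$) keeps heading toward $-\infty$ dB.

```python
import math

def mse_to_db(mse):
    """Convert a mean-square error to decibels: MSE = 1 maps to 0 dB."""
    return 10.0 * math.log10(mse)

n_iters = range(1, 201)

# Hypothetical method with an error floor: MSE -> 1, so its dB curve
# flattens out near 0 dB no matter how long it runs.
floor_mse = [1.0 + 5.0 * math.exp(-0.1 * n) for n in n_iters]

# Hypothetical method with linear convergence, MSE = O(1/n): its dB
# curve keeps decreasing (toward -infinity dB) as iterations grow.
linear_mse = [1.0 / n for n in n_iters]

print(f"error-floor method at n=200: {mse_to_db(floor_mse[-1]):7.2f} dB")
print(f"linear method at n=200:      {mse_to_db(linear_mse[-1]):7.2f} dB")
```

Plotting `mse_to_db` of each curve against the iteration index reproduces the two behaviors described above: a plateau at 0 dB versus a steadily descending curve.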
On the other hand, if the MSE of the best currently known method is approaching an error floor of 0 dB, there is a fantastic research opportunity (and possibly a get-rich-quick opportunity too): devise a method that drives the MSE to 0 instead of 1 and gain instant fame (and if you patent the method and found a start-up company to further develop it, possibly fortune RSN as well).