
1) Are there any extensions of Newton's method for finding the minimum of a convex function when the Hessian is singular? (All eigenvalues of my Hessian are positive except one, which is zero.)

I found a note here which discusses this problem (page 9). It suggests using something like $H^TH+\lambda I$ in place of the Hessian, but it doesn't give details on how to choose $\lambda$. Are there any methods other than adding a multiple of the identity? It failed for the $\lambda$'s that I tried.
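To make concrete what I mean by adding a multiple of the identity, here is a minimal sketch of a damped Newton iteration (the toy objective and the `damped_newton` helper below are illustrative only, not my actual likelihood; the increase-$\lambda$-until-descent schedule is just the usual heuristic):

```python
import numpy as np

def damped_newton(f, grad, hess, x0, lam0=1e-4, tol=1e-8, max_iter=100):
    """Newton's method with a Levenberg-Marquardt-style damping term.

    At each step the (possibly singular) Hessian H is replaced by
    H + lam * I; lam is increased until the step actually decreases f.
    """
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        H = hess(x)
        lam = lam0
        while True:
            step = np.linalg.solve(H + lam * np.eye(len(x)), -g)
            if f(x + step) < f(x) or lam > 1e12:  # accept, or give up damping harder
                break
            lam *= 10.0
        x = x + step
    return x

# toy convex objective whose Hessian has a zero eigenvalue
# (it depends on x[0] and x[1] only through their sum)
f = lambda x: (x[0] + x[1] - 1.0) ** 2 + x[2] ** 2
grad = lambda x: np.array([2 * (x[0] + x[1] - 1.0),
                           2 * (x[0] + x[1] - 1.0),
                           2 * x[2]])
hess = lambda x: np.array([[2.0, 2.0, 0.0],
                           [2.0, 2.0, 0.0],
                           [0.0, 0.0, 2.0]])
print(damped_newton(f, grad, hess, np.array([3.0, -1.0, 2.0])))
```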

2) I tried the BFGS method in the scipy package, which wasn't successful. Can BFGS be used when the Hessian is not invertible?
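For reference, this is roughly how I am calling it (a sketch on the same toy objective as above, not my real model; as far as I understand, BFGS only needs gradients and keeps its own positive-definite inverse-Hessian approximation, so a singular true Hessian should not by itself break it):

```python
import numpy as np
from scipy.optimize import minimize

# same illustrative objective as above, with its analytic gradient
f = lambda x: (x[0] + x[1] - 1.0) ** 2 + x[2] ** 2
grad = lambda x: np.array([2 * (x[0] + x[1] - 1.0),
                           2 * (x[0] + x[1] - 1.0),
                           2 * x[2]])

res = minimize(f, x0=np.array([3.0, -1.0, 2.0]), jac=grad, method='BFGS')
print(res.x, res.success)
```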

My question is related to this one.

  • is the null space of the hessian fixed? Or can't you reparameterize your objective function and remove $1$ dimension? – user251257 Aug 28 '15 at 02:04
  • I believe that BFGS can be used in the case that the true Hessian is non-invertible, because BFGS will compute an approximate Hessian for the descent. Another method I usually use, assuming your function has a bit of noise (machine learning, etc.), is to perturb your Hessian a bit to make it invertible (i.e., add noise). –  Aug 28 '15 at 02:14
  • if the hessian is not positive definite, you can't guarantee that the bfgs update exists – user251257 Aug 28 '15 at 02:32
  • The article you linked to actually provides quite a bit of detail; section 4.1 for instance. – Michael Grant Aug 28 '15 at 02:41
  • @user251257 What do you mean by the null space being fixed? Is it that the null space has only one solution? Sorry about my poor linear algebra knowledge. I need to think about reparameterization. In fact, two rows in my Hessian add up to give another row. That is what generates the singularity. – user2939212 Aug 28 '15 at 02:44
  • @aidan.plenert.macdonald My application is indeed in machine learning. I am maximizing the log likelihood via EM. But I wonder whether adding noise will make any difference, as it is evident that the rows of the Hessian are dependent no matter what the observed variables are. – user2939212 Aug 28 '15 at 02:45
  • It is a big red flag for me that adding $\lambda I$ to the Hessian does not work. For any $\lambda>0$, this regularization guarantees that your function is strictly convex, and for sufficiently large $\lambda$, that it is self-concordant. If Newton's method isn't working, there is something else off about your problem formulation (insufficiently regular objective function, non-convex domain, no minimum exists, etc) or a bug in your implementation. – user7530 Aug 28 '15 at 02:49
  • @user2939212 I meant fixed like independent of the unknown. That would be the easiest situation. Anyway, post your objective function and Hessian. It would be easier for us to understand the problem :) – user251257 Aug 28 '15 at 02:50
  • @user2939212 My argument for adding noise to the Hessian would depend on the data set. If your features are noisy, then taking the Hessian would include some noise, so adding a small amount of noise shouldn't hurt (NOT mathematically rigorous, but neither are most of the features you work with). I don't know what EM stands for, so I can't speak to that. That said, BFGS is known to work "well" for machine learning optimizations, but I would look at Stochastic Batch Gradient Descent and Conjugate Gradient Descent. Computing a Hessian approximation can be expensive. –  Aug 28 '15 at 13:43
  • @user7530 Thank you for the information. Finally I found the error. As I was using probabilities, numerical underflow in my program was the cause of my error. Adding a multiple of identity matrix worked as expected. EM stands for expectation maximization. Thank you for all your time. – user2939212 Aug 28 '15 at 15:09
  • What do you mean by "It failed for the $\lambda$'s that I tried"? Adding $\frac{\lambda}{2}\|\cdot\|_2^2$ to your objective (assumed convex) is guaranteed to make it strongly convex and to make the Hessian invertible (it becomes the old Hessian $+\,\lambda I \succ 0$). Also it'd be nice to write down the problem you're trying to solve. – dohmatob Aug 29 '15 at 08:43
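To illustrate the point in the last comment: adding $\frac{\lambda}{2}\|x\|_2^2$ to the objective shifts every eigenvalue of the Hessian up by $\lambda$, so the regularized Hessian is invertible whenever the original one is positive semidefinite. A quick numerical check on a toy singular PSD matrix (not the actual Hessian from the question):

```python
import numpy as np

H = np.array([[2.0, 2.0, 0.0],   # toy PSD Hessian with a zero eigenvalue
              [2.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])
lam = 0.1

print(np.linalg.eigvalsh(H))                    # [0.  2.  4.]  -> singular
print(np.linalg.eigvalsh(H + lam * np.eye(3)))  # [0.1 2.1 4.1] -> invertible
```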

0 Answers