It looks deceptively simple: you place your DUT (device under test) in a chamber and crank up the heat. But there is a lot more going on in these tests than most people suspect.
Ideally you want a result that gives you some confidence that the product will last past its design lifetime. The best tools we have are the Arrhenius equation, FIT (Failures In Time) rates, and related techniques. These techniques do work and have been verified by running very long tests. However, as design complexity increases and you have components made from very different materials and processes, they become harder to apply. It is a matter of complexity, not veracity.
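To make the Arrhenius and FIT ideas concrete, here is a minimal sketch of how a stress test at elevated temperature is commonly translated into equivalent hours at use conditions. The function names, the 0.7 eV activation energy, the temperatures, and the sample size are all assumptions chosen for illustration, not values for any particular part. It also assumes a single dominant failure mechanism, which is exactly what breaks down in complex designs.

```python
import math

# Boltzmann constant in eV/K
BOLTZMANN_EV_PER_K = 8.617333262e-5


def acceleration_factor(ea_ev, t_use_c, t_stress_c):
    """Arrhenius acceleration factor between use and stress temperatures.

    ea_ev is the activation energy in eV; temperatures are in degrees Celsius
    and are converted to kelvin before use.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))


def fit_point_estimate(failures, devices, test_hours, af):
    """Failures per 1e9 equivalent device-hours at use conditions.

    This is a plain point estimate with no confidence bound.
    """
    equivalent_hours = devices * test_hours * af
    return failures / equivalent_hours * 1e9


# Illustrative numbers only (assumed, not measured data)
af = acceleration_factor(ea_ev=0.7, t_use_c=55.0, t_stress_c=125.0)
fit = fit_point_estimate(failures=1, devices=77, test_hours=1000.0, af=af)
print(f"AF = {af:.1f}, FIT = {fit:.0f}")
```

Note that a point estimate like this returns zero when the test sees no failures, so in practice a statistical upper bound (typically chi-squared based) is used instead. The result is also very sensitive to the assumed activation energy.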
The key to a proper long-term estimate is to determine the activation energies of the various failure modes within the DUT. Once these are understood, applying the time, temperature, etc. is straightforward. However, determining activation energies is never straightforward and can involve a whole series of tests at different forcing temperatures, followed by inspection and failure analysis.
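As a rough sketch of what "tests at different forcing temperatures" buys you: if one failure mode follows the Arrhenius model, then ln(time to failure) is linear in 1/(kT), and the slope of that line is the activation energy. The example below fits that line with ordinary least squares. The function name is hypothetical, and the data is synthetic, generated from an assumed 0.7 eV, so this only shows the mechanics. Real data is noisy, and failure analysis is still needed to confirm that every point comes from the same failure mode.

```python
import math

# Boltzmann constant in eV/K
BOLTZMANN_EV_PER_K = 8.617333262e-5


def estimate_activation_energy(temps_c, times_to_failure_h):
    """Least-squares fit of ln(ttf) = ln(A) + Ea / (k * T).

    Returns (Ea in eV, ln(A)). Assumes a single Arrhenius-type failure mode.
    """
    xs = [1.0 / (BOLTZMANN_EV_PER_K * (t + 273.15)) for t in temps_c]
    ys = [math.log(t) for t in times_to_failure_h]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope is Ea in eV
    intercept = y_mean - slope * x_mean    # intercept is ln(A)
    return slope, intercept


# Synthetic, noise-free data built from an assumed Ea (not measurements)
TRUE_EA_EV = 0.7
PREFACTOR_H = 1e-6
stress_temps_c = [125.0, 150.0, 175.0]
median_ttf_h = [
    PREFACTOR_H * math.exp(TRUE_EA_EV / (BOLTZMANN_EV_PER_K * (t + 273.15)))
    for t in stress_temps_c
]

ea_est, ln_a_est = estimate_activation_energy(stress_temps_c, median_ttf_h)
print(f"Estimated Ea = {ea_est:.3f} eV")
```

With several failure modes in one DUT, each mode needs its own fit, which is where the test matrix (and the cost) grows quickly.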
The fundamental limits are, as always: how much time do you have, and how much money do you have?
All of the above assumes that you are testing for long-term failure. There are also short-term stress tests and burn-in tests, which are meant to keep infant mortality failures from entering your sales channel. These tend to be very different failures from long-term failures.