
When we calculate the gradient with respect to each parameter, we treat all the other parameters as constant. But the moment any of those other parameters changes, shouldn't all the previously computed changes become invalid?
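To state what I mean more precisely (the notation is mine, just for illustration): for a loss L(θ₁, …, θₙ), every partial derivative is evaluated at the same current point, and only then are all the coordinates nudged together.

```latex
% Standard (simultaneous) gradient descent step:
% every partial is evaluated at the SAME current point theta^(t),
% then all coordinates are updated at once.
\theta_i^{(t+1)} = \theta_i^{(t)} - \eta \,
\frac{\partial L}{\partial \theta_i}\!\left(\theta_1^{(t)}, \dots, \theta_n^{(t)}\right),
\qquad i = 1, \dots, n
```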

Edit: In the usual gradient descent we treat all the weights and biases of all the layers as the input parameters of the loss function we are trying to minimize. In doing so, we inherently assume that these inputs are independent of each other, which in a multilayer NN is not true: the parameters of a layer are influenced by the layers before it. That is why the parameter updates should at least be done on a layer-by-layer basis, i.e. although inefficient, the correct method should be: if we make a nudge in the parameters of one layer, what would then be the best parameter values for the next layer?
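To make the contrast concrete, here is a minimal sketch (plain NumPy, a made-up two-layer network, and numerical gradients just for illustration) of the two update schemes I am comparing: the usual simultaneous step versus the layer-by-layer scheme I describe above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up tiny 2-layer network and data, only to illustrate the two schemes.
X = rng.normal(size=(16, 3))
y = rng.normal(size=(16, 1))
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(4, 1))

def loss(W1, W2):
    h = np.tanh(X @ W1)                  # layer 1
    return np.mean((h @ W2 - y) ** 2)    # layer 2 + MSE

def num_grad(f, W, eps=1e-5):
    """Numerical gradient of f with respect to W (illustration only)."""
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        g[idx] = (f(Wp) - f(Wm)) / (2 * eps)
    return g

lr = 0.1

# (a) Usual gradient descent: both gradients are taken at the SAME point,
#     then both layers are updated together.
g1 = num_grad(lambda W: loss(W, W2), W1)
g2 = num_grad(lambda W: loss(W1, W), W2)
W1_sim, W2_sim = W1 - lr * g1, W2 - lr * g2

# (b) The layer-by-layer scheme I describe: first nudge layer 1, then
#     compute layer 2's gradient at the ALREADY-UPDATED layer 1.
W1_seq = W1 - lr * num_grad(lambda W: loss(W, W2), W1)
W2_seq = W2 - lr * num_grad(lambda W: loss(W1_seq, W), W2)

print("simultaneous update loss:  ", loss(W1_sim, W2_sim))
print("layer-by-layer update loss:", loss(W1_seq, W2_seq))
```

My question is whether scheme (a) is in some sense wrong because it ignores that the best nudge for W2 depends on where W1 ends up, which scheme (b) at least partly accounts for.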

Am I missing or misinterpreting something? How is gradient descent able to take the interdependence of parameters across layers into account on its own, given that we never explicitly specify how the parameters of different layers depend on each other?
