Their major difference in terms of learning is stated in the Adaline reference you cited yourself:
The difference between Adaline and the standard perceptron is in how they learn. Adaline unit weights are adjusted to match a teacher signal, before applying the Heaviside function, but the standard perceptron unit weights are adjusted to match the correct output, after applying the Heaviside function.
The standard Rosenblatt perceptron's learning rule has no MSE cost function; if the prediction is incorrect for a training sample, it adjusts each weight to reduce the error. The weight adjustments are proportional to the difference between the predicted and true outputs and are scaled by the input values, which was partly inspired by Hebbian learning. It is a simple, linear learning algorithm that works well with a small learning rate for (binary) classification tasks, but it is only guaranteed to converge when the data are linearly separable.
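To make that concrete, here is a minimal sketch of the perceptron update (my own illustrative code, not taken from the reference; it assumes labels in {-1, +1} and a fixed learning rate `eta`). The thresholding happens *before* the error is computed, so the weights only change on misclassified samples:

    import numpy as np

    def perceptron_epoch(X, y, w, b, eta=0.1):
        """One pass of Rosenblatt's rule; y is assumed to be in {-1, +1}."""
        for xi, target in zip(X, y):
            # Apply the Heaviside-style threshold first ...
            prediction = 1 if np.dot(xi, w) + b >= 0.0 else -1
            # ... then compare thresholded output to the true label (0, +2, or -2).
            error = target - prediction
            # Update is proportional to the error and the input values;
            # nothing changes when the sample is classified correctly.
            w += eta * error * xi
            b += eta * error
        return w, b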
Adaline uses a comparatively more advanced learning rule: it converges asymptotically toward the minimum-MSE hypothesis, possibly requiring unbounded time, but it converges regardless of whether the training data are linearly separable. However, the weights it reaches by minimizing the MSE against the teacher signals will not necessarily minimize the number of training samples misclassified by the final thresholded outputs.
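A corresponding sketch of the Adaline (delta/LMS) rule, under the same illustrative assumptions: the error is measured against the linear activation *before* thresholding, so every sample contributes a gradient step toward the MSE minimum, and the class label is only produced afterwards by thresholding:

    import numpy as np

    def adaline_epoch(X, y, w, b, eta=0.01):
        """One full-batch gradient-descent step on the MSE; y in {-1, +1}."""
        # Linear activation (the "teacher signal" is compared to this, not to a class label).
        net_input = X @ w + b
        errors = y - net_input
        # Gradient descent on 0.5 * sum(errors**2): updates happen for every sample,
        # even ones that the thresholded output already classifies correctly.
        w += eta * X.T @ errors
        b += eta * errors.sum()
        # Class labels come only from thresholding afterwards:
        # predictions = np.where(net_input >= 0.0, 1, -1)
        return w, b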
Am I correct?
– will The J Nov 05 '23 at 14:57