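How can we learn the parameters of a feedforward neural network? Recall the gradient descent update for a simple neural network with weight $w$, bias $b$, loss $L$, and learning rate $\eta$:

$$w \leftarrow w - \eta \frac{\partial L}{\partial w}, \qquad b \leftarrow b - \eta \frac{\partial L}{\partial b}$$

Now, instead of $w$ and $b$, we have a weight matrix $W^{(l)}$ and a bias vector $b^{(l)}$ for each layer $l = 1, \dots, n$. We can put both of these, for every layer, into one vector called $\theta$, modifying the algorithm to:

$$\theta \leftarrow \theta - \eta \nabla_\theta L(\theta)$$

Where:

$$\nabla_\theta L(\theta) = \left[ \frac{\partial L}{\partial W^{(1)}}, \frac{\partial L}{\partial b^{(1)}}, \dots, \frac{\partial L}{\partial W^{(n)}}, \frac{\partial L}{\partial b^{(n)}} \right]$$

is composed of the gradients of the weight and bias of each layer in the network. So, how can we calculate the loss function and how can we calculate the gradient?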
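To make the single-vector view concrete, here is a minimal sketch in plain Python. It assumes each layer is stored as a (weights, biases) pair. The helper names `flatten_params` and `gradient_descent_step`, the example network, and the all-ones placeholder gradient are hypothetical and only illustrate the shape of the update, not how the gradient is actually obtained.

```python
def flatten_params(layers):
    """Concatenate every layer's weights and biases into one flat list (theta)."""
    theta = []
    for weights, biases in layers:
        for row in weights:
            theta.extend(row)
        theta.extend(biases)
    return theta


def gradient_descent_step(theta, grad, learning_rate):
    """Apply one update: theta <- theta - learning_rate * grad."""
    return [t - learning_rate * g for t, g in zip(theta, grad)]


# Hypothetical two-layer network, one (weights, biases) pair per layer.
layers = [
    ([[0.5, -0.25], [1.0, 0.0]], [0.0, 0.5]),  # layer 1: 2x2 weights, 2 biases
    ([[2.0, -1.0]], [0.25]),                   # layer 2: 1x2 weights, 1 bias
]
theta = flatten_params(layers)

# Placeholder gradient (all ones) standing in for the real gradient of the loss.
grad = [1.0] * len(theta)

theta_new = gradient_descent_step(theta, grad, learning_rate=0.5)
```

The update itself does not care which entry of `theta` came from which layer. All of the layer-specific work lies in computing the gradient, which is the question raised above.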