17 May 2020

Neural network explained with simple example with numpy Python

Neural networks are used everywhere: speech recognition, face recognition, marketing, healthcare and more. An artificial neural network mimics the behaviour of the human brain and tries to solve data-driven problems the way a human would. A neural network consists of multiple layers of perceptrons. When you feed input data to a neural network, that data passes through those layers of perceptrons to produce the desired output.

In this tutorial I will explain each step needed to train a neural network with a simple example, and we will write a neural network from scratch using numpy in Python. After reading this tutorial you will have answers to the following questions:
  • What is Neural Network
  • How Neural Network Works
  • Steps to build a Neural Network
  • How Forward propagation works
  • Error Calculation in a Neural Network
  • How Back Propagation works
  • Matrix calculation of neural network in Python

Before moving into each step of the neural network, let me give you an overview of the neural network architecture.

Architecture of Neural Network

A neural network consists of three layers:
1.     Input Layer: This layer receives the input data. The output of the input layer goes to the hidden layer.

2.     Hidden Layer: Located between the input and output layers. The input of the hidden layer is the output of the input layer. In real-world networks there can be multiple hidden layers; to keep the explanation simple, I am using one hidden layer in this article.

3.     Output Layer: The output of the hidden layer goes to the output layer. This layer generates the predicted output of the neural network. In the picture above, and for this article, I am considering a two-class neural network (Out y1, Out y2).


Neural Network Formation

Before listing all the equations of a simple neural network, let me make it clear that an artificial neural network equation consists of three things:
1.     Linear function
2.     Bias
3.     Activation function
The output of any layer is the combination of a linear function, a bias and an activation function.
For example
Input of H1 (i.e. h1) = x1w1 + x2w2 + b1
Here
x1w1 + x2w2 is the linear function
b1 is the bias (a constant)

An activation function is required to calculate the output of any layer.
Now let’s calculate the output of H1.
To calculate the output of H1 you need to apply an activation function to the input of H1. You can use any activation function, such as Sigmoid, Tanh or ReLU. For this tutorial I am using the sigmoid function as my activation function.
Let me show you the equation of the sigmoid function.
\[\mathbf{ Sigmoid\; function = \frac{1}{1+e^{-x}}}\]
So after applying the activation function to the input of H1, we get the Output of H1:
\[\mathbf{Output\;of\;H_{1} = \frac{1}{1+e^{-H_1}}}\]
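For reference, here is what the sigmoid function looks like in numpy; this is just a minimal sketch of the same helper that appears again in the full code at the end of this article.

import numpy as np

def sigmoid(x):
    # S(x) = 1 / (1 + e^(-x)), works on scalars and numpy arrays alike
    return 1 / (1 + np.exp(-x))

print(sigmoid(0.0))                          # 0.5
print(sigmoid(np.array([-2.0, 0.0, 2.0])))   # ~[0.1192, 0.5, 0.8808]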

Steps to train Neural Network

There are three steps to train a Neural Network
1.     Forward Propagation
2.     Error Calculation
3.     Back Propagation
Now let’s explore each step of the neural network in detail.
In this tutorial I am denoting:
·        h1 (after applying linear function and bias) as input of H1
·        h2 (after applying linear function and bias) as input of H2
·        Out h1 (after applying activation function) as output of H1
·        Out h2 (after applying activation function) as output of H2
·        y1 (after applying linear function and bias) as input of y1 layer
·        y2 (after applying linear function and bias) as input of y2 layer
·        Out y1 (after applying activation function) as output of y1 layer
·        Out y2 (after applying activation function) as output of y2 layer
·        ETotal as total error of the Neural Network model


Forward Propagation in Neural Network

Let’s assume we want to apply a neural network to the dataset below, where the output (T1 | T2) holds two class probabilities (for example, the probability to win and the probability to lose).

To explain how the neural network works, let’s assume we have only the first row of the dataset below.

X1      X2      T (T1 / T2)
0.03    0.09    (0.01 / 0.99)
0.04    0.10    (0.99 / 0.01)
0.05    0.11    (0.01 / 0.99)
0.06    0.12    (0.99 / 0.01)
In forward propagation of a neural network, the calculation flows from left to right:
1.     First: input data (x1, x2) is fed into the input layer
2.     Second: the hidden layer (H1, H2) calculation
3.     Third: the output layer (y1, y2) calculation produces the predicted output


To calculate forward propagation by hand, let’s take some numbers for the weights, biases and target values or actual output (first-row output T1 | T2) along with the input values (first row of our dataset).

Input Value    Bias         Weights 1st layer    Weights 2nd layer    Actual/Target output
x1 = 0.03      b1 = 0.39    w1 = 0.11            w5 = 0.44            T1 = 0.01
x2 = 0.09      b2 = 0.42    w2 = 0.27            w6 = 0.48            T2 = 0.99
                            w3 = 0.19            w7 = 0.23
                            w4 = 0.52            w8 = 0.29

Now let’s start the calculation for each step of forward propagation.

1. Hidden layer Calculation

In this tutorial I am using two hidden units (h1, h2) in the hidden layer. Let’s calculate the output of those hidden units.

h1 = x1w1 + x2w2 +b1 = 0.03*0.11 + 0.09*0.27 + 0.39 = 0.4176

Now,

\[\mathbf{Out\;h_1=\frac{1}{1+e^{-h_1}}}\] So, \[ Out\;h_1=\frac{1}{1+e^{-0.4176}}=0.60290881\]
In a similar way,

h2 = x1w3 + x2w4 +b1= 0.03*0.19 + 0.09*0.52 + 0.39 = 0.4425

So,

\[Out\;h_2=\frac{1}{1+e^{-0.4425}}=0.60885457\]
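These two hidden-unit calculations can be checked with a few lines of numpy; a small sketch using the first-row values from the table above:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x1, x2 = 0.03, 0.09
w1, w2, w3, w4 = 0.11, 0.27, 0.19, 0.52
b1 = 0.39

h1 = x1 * w1 + x2 * w2 + b1          # 0.4176
h2 = x1 * w3 + x2 * w4 + b1          # 0.4425
out_h1, out_h2 = sigmoid(h1), sigmoid(h2)
print(out_h1, out_h2)                # ~0.60290881, ~0.60885457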


Now that we are done with the hidden layer calculation, we will move on to the output layer.

2. Output layer Calculation

In a similar way to the hidden layer:

y1 = Out h1 * w5 + Out h2 * w6 + b2
= 0.60290881 * 0.44 + 0.60885457 * 0.48 + 0.42 = 0.97753007

So now,

\[Out\;y_1=\frac{1}{1+e^{-0.97753007}}=0.72661785\]
In a similar way we can calculate:
y2 = Out h1 * w7 + Out h2 * w8 + b2
= 0.60290881 * 0.23 + 0.60885457 * 0.29 + 0.42 = 0.73523685
And,

\[Out\;y_2=\frac{1}{1+e^{-0.73523685}}=0.67595341\]
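Again, a short numpy sketch to verify the output-layer numbers, continuing from the hidden-layer values above:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

out_h1, out_h2 = 0.60290881, 0.60885457
w5, w6, w7, w8 = 0.44, 0.48, 0.23, 0.29
b2 = 0.42

y1 = out_h1 * w5 + out_h2 * w6 + b2   # ~0.97753007
y2 = out_h1 * w7 + out_h2 * w8 + b2   # ~0.73523685
print(sigmoid(y1), sigmoid(y2))       # ~0.72661785, ~0.67595341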
Now that we have Out y1 and Out y2 (the predicted target values), let’s calculate the error to find out how accurately our neural network is predicting.

Error Calculation in Neural Network

To find out the accuracy of any algorithm, error calculation is essential. There are many techniques to calculate error; in this tutorial I am using the Mean Squared Error.
For example, the target value T1 is 0.01 but the neural network’s predicted output for y1 (Out y1) is 0.72661785, so there is an error.
So, calculating the mean squared error for y1:

\[\mathbf{E_1} = \frac{1}{2} (T_1-Out y_1 )^2= \frac{1}{2} (0.01-0.72661785)^2= 0.25677057\]
Similarly, calculating the mean squared error for y2:

\[\mathbf{E_2} = \frac{1}{2} (T_2-Out y_2 )^2= \frac{1}{2} (0.99-0.67595341)^2= 0.04931263\]
So the total error for our neural network (after one iteration) is the sum of these errors:
ETotal = E1 + E2 = 0.25677057 + 0.04931263 = 0.3060832
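The same error numbers can be reproduced in numpy; a small check using the predicted and target values above:

import numpy as np

T = np.array([0.01, 0.99])                  # target values T1, T2
out_y = np.array([0.72661785, 0.67595341])  # predicted values Out y1, Out y2

E = np.square(T - out_y) / 2    # [E1, E2] ~ [0.25677057, 0.04931263]
print(E, E.sum())               # E_Total ~ 0.3060832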

Back Propagation in Neural Network

Now that we know how much error our algorithm has (after one iteration), we need to improve its accuracy (decrease the error). One of the standard ways to do this is to update the weight values. The way weight values are updated in a neural network is called Back Propagation.
In back propagation our goal is to update each of the weights (w1, w2, ... w8) in the network so that they bring the actual output closer to the target output, minimizing the error for each output neuron (y1 and y2).
Now let’s see how back propagation works in a neural network.


1.Back Propagation for Output Layer

We will apply partial differentiation to get how much of a change is required to update w5.
Consider weight w5: we want to know how much a change in w5 affects the total error of our neural network (dETotal/dw5).

Note: dETotal/dw5 is the partial derivative (or gradient) of ETotal with respect to w5.
By applying the chain rule we get:

\[\mathbf{\frac{dE_{Total}}{dw_5}}=\frac{dE_{Total}}{dOut y_1}*\frac{dOut y_1}{dy_1}*\frac{dy_1}{dw_5}\;\;[Applying\;chain\;rule]\]
Now,
ETotal = E1 + E2

\[\mathbf{E_{Total} = \frac{1}{2} (T_1-Out y_1 )^2+\frac{1}{2} (T_2-Out y_2 )^2}\]
So,

\[\mathbf{\frac{dE_{Total}}{dOut y_1}}=2*\frac{1}{2}(T_1-Out y_1)^{2-1}*(0-1)+0 \;\; [As\;\frac{d}{dx}x^n=nx^{n-1}]\] \[=-(T_1-Out y_1 )=-(0.01-0.72661785)=0.71661785\]
So,

\[\mathbf{\frac{dE_{Total}}{dOut y_1}= 0.71661785}\]
Now,

\[Out\;y_1=\frac{1}{1+e^{-y_1}}\]
So,

\[\mathbf{\frac{d\;Out y_1}{dy_1}}=\frac{d}{dy_1}(1+e^{-y_1})^{-1}\] \[=-1*(1+e^{-y_1})^{-1-1}*[0+(-e^{-y_1})]\;\;[As \mathbf{\frac{d}{dx}e^{-x}=-e^{-x}}\;,\;derivative\;of\;sigmoid\;explained\;at\;the\;end\;of\;this\;tutorial]\] \[=-(1+e^{-y_1})^{-2}*(-e^{-y_1})\] \[=\frac{e^{-y_1}}{(1+e^{-y_1})^2}\] \[=\frac{1}{(1+e^{-y_1})}*\frac{e^{-y_1}}{(1+e^{-y_1})}\] \[=\frac{1}{(1+e^{-y_1})}*\frac{(1+e^{-y_1})-1}{(1+e^{-y_1})}\] \[=\frac{1}{(1+e^{-y_1})}*[1-\frac{1}{(1+e^{-y_1})}]\]
= Out y1 (1-Out y1)
= 0.72661785 * (1 - 0.72661785)
= 0.19864435
So,

\[\mathbf{\frac{d\;Out\;y_1}{dy_1}=0.19864435}\]
Again,
y1 = Out h1 *w5 +Out h2 * w6 +b2
So,

\[\mathbf{\frac{dy_1}{dw_5}=Out\;h_1=0.60290881}\]
So finally,

\[\mathbf{\frac{dE_{Total}}{dw_5}}=\frac{dE_{Total}}{dOut\;y_1}*\frac{dOut\;y_1}{dy_1}*\frac{dy_1}{dw_5}\] \[\mathbf{\frac{dE_{Total}}{dw_5}=0.71661785*0.19864435* 0.60290881=0.08582533}\]
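Putting the three factors together in a few lines of Python gives the same number; a quick sanity check of the chain-rule product above:

T1 = 0.01              # target value for y1
out_y1 = 0.72661785    # predicted Out y1
out_h1 = 0.60290881    # Out h1 from the forward pass

dE_dout_y1 = -(T1 - out_y1)            # 0.71661785
dout_y1_dy1 = out_y1 * (1 - out_y1)    # ~0.19864435
dy1_dw5 = out_h1                       # 0.60290881

print(dE_dout_y1 * dout_y1_dy1 * dy1_dw5)   # ~0.08582533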

Update w5 with change value

After calculating how much a change in weight w5 affects the total error of our neural network model, to decrease the error we will subtract this change value (scaled by the learning rate η) from the current weight (w5).

\[\mathbf{(new)w_5=w_5-\eta *\frac{dE_{Total}}{dw_5}}\;\;[\eta\;is\;learning\;rate]\]
=0.44 – 0.3 * 0.08582533  [Taking η = 0.3]
(new)w5 = 0.41425240
In a similar way we can calculate all the updated (new) weights of the output layer (w6, w7, w8), as in the sketch below.
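The remaining output-layer gradients follow the same chain rule, only with the matching Out h and Out y terms. A short sketch of all four updates (delta_y1 and delta_y2 are just shorthand introduced here for dE/dOut y * dOut y/dy):

out_h1, out_h2 = 0.60290881, 0.60885457
out_y1, out_y2 = 0.72661785, 0.67595341
T1, T2 = 0.01, 0.99
w5, w6, w7, w8 = 0.44, 0.48, 0.23, 0.29
eta = 0.3   # learning rate

# delta terms: dE/dOut y * dOut y/dy for each output unit
delta_y1 = -(T1 - out_y1) * out_y1 * (1 - out_y1)   # ~0.14235209
delta_y2 = -(T2 - out_y2) * out_y2 * (1 - out_y2)   # ~-0.06878889

# (new)w = w - eta * dE_Total/dw for each 2nd layer weight
w5_new = w5 - eta * delta_y1 * out_h1   # ~0.41425240
w6_new = w6 - eta * delta_y1 * out_h2
w7_new = w7 - eta * delta_y2 * out_h1
w8_new = w8 - eta * delta_y2 * out_h2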


2.Back Propagation for Hidden Layer

After calculating new values for the output layer weights, we will continue the backward pass to calculate new values for the hidden layer weights (w1, w2, w3, w4).
Consider weight w1: we want to know how much a change in w1 affects the total error of our neural network (dETotal/dw1).

Note: The output of each hidden layer neuron (Out h1, Out h2) contributes to the output of each output neuron (Out y1, Out y2) and therefore contributes to the error.

\[\mathbf{\frac{dE_{Total}}{dw_1}=\frac{dE_{Total}}{dOut\;h_1}*\frac{dOut\;h_1}{dh_1}*\frac{dh_1}{dw_1}\;\;[Applying\;chain\;rule]}\]
Now,

\[\mathbf{\frac{dE_{Total}}{dOut\;h_1}=\frac{dE_1}{dOut\;h_1}+\frac{dE_2}{dOut\;h_1}}\]
Now again,

\[\mathbf{\frac{dE_1}{dOut\;h_1}=\frac{dE_1}{dy_1}*\frac{dy_1}{dOut\;h_1}}\]
Now,

\[\mathbf{\frac{dE_1}{dy_1}}=\frac{dE_1}{dOut\;y_1}*\frac{dOut\;y_1}{dy_1}\] \[\frac{dE_1}{dy_1}=[2*\frac{1}{2}(T_1-Out\;y_1 )^{2-1}*(0-1)]*\frac{dOut\;y_1}{dy_1}\;\;\mathbf{[As\;E1 = \frac{1}{2}(T_1-Out\;y_1 )^2]}\] \[\frac{dE_1}{dy_1}=[0.71661785]*\frac{dOut\;y_1}{dy_1}\] \[\mathbf{\frac{dE_1}{dy_1}}=0.71661785*\mathbf{0.19864435}=\mathbf{0.14235209}\]
And,

\[\mathbf{\frac{dy_1}{dOut\;h_1}}=\frac{d}{dOut\;h_1}(Out\;h_1*w_5+Out\;h_2*w_6+b_2)=w_5+0+0=0.44\]
So finally,

\[\mathbf{\frac{dE_1}{dOut\;h_1}=\frac{dE_1}{dy_1}*\frac{dy_1}{dOut\;h_1}=0.14235209*0.44=0.06263492}\]
Similarly we can calculate,

\[\mathbf{\frac{dE_2}{dOut\;h_1}=\frac{dE_2}{dy_2}*\frac{dy_2}{dOut\;h_1}}\]
Now,

\[\mathbf{\frac{dE_2}{dy_2}}=\frac{dE_2}{dOut\;y_2}*\frac{dOut\;y_2}{dy_2}\] \[\frac{dE_2}{dy_2}=\frac{d}{dOut\;y_2}[\frac{1}{2}(T_2-Out\;y_2)^2]*\frac{dOut\;y_2}{dy_2}\] \[\frac{dE_2}{dy_2}=[2*\frac{1}{2}(T_2-Out\;y_2 )^{2-1}*(0-1)]*\frac{dOut\;y_2}{dy_2}\] \[\frac{dE_2}{dy_2}=[(0.99-0.67595341)*(0-1)]*\frac{dOut\;y_2}{dy_2}\] \[\frac{dE_2}{dy_2}=-0.31404659*\frac{dOut\;y_2}{dy_2}\] \[\frac{dE_2}{dy_2}=-0.31404659\;*\;[\mathbf{Out\;y_2 (1-Out\;y_2 )}]\;\;[Similar\;calculation\;as\;done\;before]\] \[\frac{dE_2}{dy_2}=-0.31404659*[0.67595341(1-0.67595341)]\] \[\frac{dE_2}{dy_2}=-0.31404659*0.21904039\] \[\mathbf{\frac{dE_2}{dy_2}=-0.06878889}\]
And,

\[\frac{dy_2}{dOut\;h_1}= \frac{d}{dOut\;h_1}(Out\;h_1*w_7+Out\;h_2*w_8+b_2)=w_7+0+0=0.23\]
So finally,

\[\mathbf{\frac{dE_2}{dOut\;h_1}=\frac{dE_2}{dy_2}*\frac{dy_2}{dOut\;h_1}=-0.06878889*0.23=-0.01582144}\]
So now putting it all together,

\[\mathbf{\frac{dE_{Total}}{dOut\;h_1}=\frac{dE_1}{dOut\;h_1}+\frac{dE_2}{dOut\;h_1}}= 0.06263492 - 0.01582144\\\\ \mathbf{\frac{dE_{Total}}{dOut\;h_1}=0.04681348}\]
Now coming back to the main equation,

\[\mathbf{\frac{dE_{Total}}{dw_1}=\frac{dE_{Total}}{dOut\;h_1}*\frac{dOut\;h_1}{dh_1}*\frac{dh_1}{dw_1}}\]
Now,

\[\mathbf{\frac{dOut\;h_1}{dh_1}=Outh_1 (1-Outh_1)}= 0.60290881(1 - 0.60290881)\] \[\mathbf{\frac{dOut\;h_1}{dh_1}=0.23940978}\]
Now,

\[\mathbf{\frac{dh_1}{dw_1}=\frac{d}{dw_1}(x_1 w_1+x_2 w_2+b_1 )=x_1=0.03}\]
So now finally putting it all together,

\[\mathbf{\frac{dE_{Total}}{dw_1}=\frac{dE_{Total}}{dOut\;h_1}*\frac{dOut\;h_1}{dh_1}*\frac{dh_1}{dw_1}}=\mathbf{0.04681348 * 0.23940978 * 0.03}\\\\ \mathbf{\frac{dE_{Total}}{dw_1}=0.00033623}\]

Update w1 with change value

After calculating how much a change in weight w1 affects the total error of our neural network model, to decrease the error we will subtract this change value (scaled by the learning rate η) from the current weight (w1).

\[\mathbf{(new)w_1=w_1-\eta*\frac{dE_{Total}}{dw_1}\;\;[\eta\;is\;learning\;rate]}\\\\ =0.11 - 0.3 * \mathbf{0.00033623} \;\;[Taking\;\eta\; = 0.3]\\\\ So,\\\\ \mathbf{(new)w_1 = 0.10989913}\]
In a similar way we can calculate all the new weights (w2, w3, w4) of the hidden layer, as in the sketch below.
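As with the output layer, the four hidden-layer updates can be written compactly. This sketch reuses the numbers above; delta_y1 and delta_y2 are the same shorthand as in the previous snippet:

x1, x2 = 0.03, 0.09
out_h1, out_h2 = 0.60290881, 0.60885457
w1, w2, w3, w4 = 0.11, 0.27, 0.19, 0.52
w5, w6, w7, w8 = 0.44, 0.48, 0.23, 0.29
delta_y1, delta_y2 = 0.14235209, -0.06878889   # dE1/dy1 and dE2/dy2 from above
eta = 0.3

# dE_Total/dOut h1 and dE_Total/dOut h2 (both output units contribute)
dE_dout_h1 = delta_y1 * w5 + delta_y2 * w7     # ~0.04681348
dE_dout_h2 = delta_y1 * w6 + delta_y2 * w8

# delta terms for the hidden units: dE_Total/dOut h * dOut h/dh
delta_h1 = dE_dout_h1 * out_h1 * (1 - out_h1)
delta_h2 = dE_dout_h2 * out_h2 * (1 - out_h2)

# (new)w = w - eta * dE_Total/dw for each 1st layer weight
w1_new = w1 - eta * delta_h1 * x1   # ~0.10989913
w2_new = w2 - eta * delta_h1 * x2
w3_new = w3 - eta * delta_h2 * x1
w4_new = w4 - eta * delta_h2 * x2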

******* This is the end of 1st iteration of our Neural Network model *******

It is important to note that the model is not trained properly yet, as we have only back-propagated one sample (the first row) of the training set. Doing all of this again for every sample (each row), over many iterations, yields a complete model.

Neural Network Matrix Calculation in Python


While applying a neural network you should not run each step (forward propagation, error calculation, back propagation) on the dataset sample by sample (row by row). That would be a time-consuming process, since you would have to repeat the same computation once for every row of the training dataset.

Instead, we will compute each step of the neural network for all rows at once by using matrix calculations.

Let me show you each step of the neural network from scratch using numpy in Python.

Hidden Layer Matrix Calculation



\[\begin{pmatrix}h_1 \\ h_2 \end{pmatrix} = \begin{pmatrix} w_1 & w_2\\ w_3 & w_4 \end{pmatrix} \begin{pmatrix}x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix}b_1 \\ b_1 \end{pmatrix} = \begin{pmatrix} x_1w_1 +x_2w_2 +b_1\\ x_1w_3 +x_2w_4 +b_1 \end{pmatrix}\]
Denoting Φ (phi) as the activation function (for this example, the sigmoid function):
\[\begin{pmatrix}Out\;h_1 \\ Out\;h_2 \end{pmatrix} = \begin{pmatrix}\phi (h_1) \\ \phi(h_2) \end{pmatrix} = \begin{pmatrix} \phi(x_1w_1 +x_2w_2 +b_1)\\ \phi(x_1w_3 +x_2w_4 +b_1) \end{pmatrix}\]

Output Layer Matrix Calculation


\[\begin{pmatrix} y_1\\ y_2 \end{pmatrix} = \begin{pmatrix} w_5 & w_6\\ w_7 & w_8 \end{pmatrix} \begin{pmatrix} Out\;h_1\\Out\;h_2 \end{pmatrix} + \begin{pmatrix}b_2 \\ b_2 \end{pmatrix} =\begin{pmatrix} Out\;h_1 w_5+Out\;h_2 w_6+b_2\\ Out\;h_1 w_7+Out\;h_2 w_8+b_2 \end{pmatrix}\] \[\begin{pmatrix} Out\;y_1\\ Out\;y_2 \end{pmatrix} = \begin{pmatrix} \phi(y_1)\\\phi(y_2) \end{pmatrix} =\begin{pmatrix} \phi(Out\;h_1 w_5+Out\;h_2 w_6+b_2)\\ \phi(Out\;h_1 w_7+Out\;h_2 w_8+b_2) \end{pmatrix}\]
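A short numpy sketch of these two matrix equations, checked against the hand calculation above (W1 and W2 here are just shorthand names for the two weight matrices, arranged exactly as in the equations):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([0.03, 0.09])        # (x1, x2)
W1 = np.array([[0.11, 0.27],      # [[w1, w2],
               [0.19, 0.52]])     #  [w3, w4]]
W2 = np.array([[0.44, 0.48],      # [[w5, w6],
               [0.23, 0.29]])     #  [w7, w8]]
b1, b2 = 0.39, 0.42

out_h = sigmoid(W1 @ x + b1)      # (Out h1, Out h2)
out_y = sigmoid(W2 @ out_h + b2)  # (Out y1, Out y2)
print(out_h)   # ~[0.60290881, 0.60885457]
print(out_y)   # ~[0.72661785, 0.67595341]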

Error Calculation
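The errors of the two output units are computed element-wise and then summed, exactly as in the hand calculation:

\[\begin{pmatrix} E_1\\ E_2 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} (T_1-Out\;y_1)^2\\ (T_2-Out\;y_2)^2 \end{pmatrix}\;,\;\;\mathbf{E_{Total}=E_1+E_2}\]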

Neural Network in Numpy Python



##########################################################################
# Neural Network from Scratch using numpy
##########################################################################

import numpy as np

# input data x variable
x_val = np.array([[0.03, 0.09],
                   [0.04, 0.10],
                   [0.05, 0.11],
                   [0.06, 0.12]])

# output data y variable
y_val = np.array([[0.01, 0.99], 
                    [0.99, 0.01], 
                    [0.01, 0.99], 
                    [0.99, 0.01]])

###############################################

# Initializing weights

# 1st layer Weights
w1 = 0.11
w2 = 0.27
w3 = 0.19
w4 = 0.52

# 2nd layer weights
w5 = 0.44
w6 = 0.48
w7 = 0.23
w8 = 0.29

# Bias
b1 = 0.39
b2 = 0.42

# Learning rate
eta = 0.3

# setting 100 iteration to tune our neural network algorithm
iteration = 100

# 1st layer weights matrix
weights_h1 = np.array([[w1], [w2]])
weights_h2 = np.array([[w3], [w4]])

# 2nd layer weights matrix
weights_y1 = np.array([[w5], [w6]])
weights_y2 = np.array([[w7], [w8]])

#####################  Forward Propagation ##########################

# Entire hidden layer weight matrix: [[w1, w2], [w3, w4]]
weights_h = np.vstack((weights_h1.T, weights_h2.T))

# Entire output layer weight matrix: [[w5, w6], [w7, w8]]
weights_y = np.vstack((weights_y1.T, weights_y2.T))

# Sigmoid Activation function ==> S(x) = 1/1+e^(-x)
def sigmoid(x, deriv=False):
    if deriv == True:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

h = np.dot(x_val, weights_h.T) + b1

# Entire 1st layer output matrix
out_h = sigmoid(h)

y = np.dot(out_h, weights_y.T) + b2

# Entire 2nd layer output matrix
out_y = sigmoid(y)

#####################  Error Calculation ##########################

# Squared error of each output unit; summing across outputs gives E_Total for each row
E_total = np.sum(np.square(y_val - out_y) / 2, axis=1)

#####################  Back Propagation ##########################

# For each iteration (epoch) re-run the forward pass with the current weights,
# then update both weight matrices row by row.
# Note: the forward pass is refreshed once per iteration; a stricter stochastic
# update would refresh it after every row.
for iter in range(iteration):

    # Forward pass with the current weights
    h = np.dot(x_val, weights_h.T) + b1
    out_h = sigmoid(h)
    y = np.dot(out_h, weights_y.T) + b2
    out_y = sigmoid(y)

    # (dE_Total)/(dOut y) = -(T - Out y)
    dE_total_dout_y = -(y_val - out_y)

    # (dOut y)/(dy) = Out y * (1 - Out y)
    dout_y_dy = out_y * (1 - out_y)

    # (dy)/(dw) = Out h
    dy_dw = out_h

    # For each row of input data update both weight matrices
    for row in range(len(x_val)):

        # 1. Gradients of the 2nd layer weights

        # (dE_Total)/(dw_5) = (dE_Total)/(dOut y_1)*(dOut y_1)/(dy_1)*(dy_1)/(dw_5)
        dE_Total_dw5 = dE_total_dout_y[row][0] * dout_y_dy[row][0] * dy_dw[row][0]

        # (dE_Total)/(dw_6) = (dE_Total)/(dOut y_1)*(dOut y_1)/(dy_1)*(dy_1)/(dw_6)
        dE_Total_dw6 = dE_total_dout_y[row][0] * dout_y_dy[row][0] * dy_dw[row][1]

        # (dE_Total)/(dw_7) = (dE_Total)/(dOut y_2)*(dOut y_2)/(dy_2)*(dy_2)/(dw_7)
        dE_Total_dw7 = dE_total_dout_y[row][1] * dout_y_dy[row][1] * dy_dw[row][0]

        # (dE_Total)/(dw_8) = (dE_Total)/(dOut y_2)*(dOut y_2)/(dy_2)*(dy_2)/(dw_8)
        dE_Total_dw8 = dE_total_dout_y[row][1] * dout_y_dy[row][1] * dy_dw[row][1]

        # Combine all differential weights of the 2nd layer
        dE_Total_dw_2nd_layer = np.array([[dE_Total_dw5, dE_Total_dw6],
                                          [dE_Total_dw7, dE_Total_dw8]])

        # 2. Gradients of the 1st layer weights

        # (dE)/(dy) = (dE)/(dOut y)*(dOut y)/(dy) for both output units
        dE_dy = dE_total_dout_y * dout_y_dy

        # (dE_Total)/(dOut h_1) = (dE_1)/(dy_1)*w_5 + (dE_2)/(dy_2)*w_7
        dE_Total_dOut_h1 = dE_dy[row][0] * weights_y[0][0] + dE_dy[row][1] * weights_y[1][0]

        # (dE_Total)/(dOut h_2) = (dE_1)/(dy_1)*w_6 + (dE_2)/(dy_2)*w_8
        dE_Total_dOut_h2 = dE_dy[row][0] * weights_y[0][1] + dE_dy[row][1] * weights_y[1][1]

        # (dOut h)/(dh) = Out h * (1 - Out h)
        dOut_h_dh = out_h * (1 - out_h)

        # (dh)/(dw) = x
        dh_dw = x_val

        # (dE_Total)/(dw_1) = (dE_Total)/(dOut h_1)*(dOut h_1)/(dh_1)*(dh_1)/(dw_1)
        dE_Total_dw1 = dE_Total_dOut_h1 * dOut_h_dh[row][0] * dh_dw[row][0]
        dE_Total_dw2 = dE_Total_dOut_h1 * dOut_h_dh[row][0] * dh_dw[row][1]
        dE_Total_dw3 = dE_Total_dOut_h2 * dOut_h_dh[row][1] * dh_dw[row][0]
        dE_Total_dw4 = dE_Total_dOut_h2 * dOut_h_dh[row][1] * dh_dw[row][1]

        # Combine all differential weights of the 1st layer
        dE_Total_dw_1st_layer = np.array([[dE_Total_dw1, dE_Total_dw2],
                                          [dE_Total_dw3, dE_Total_dw4]])

        # Updated weights for 2nd layer
        # (new)w_5 = w_5 - eta*(dE_Total)/(dw_5)   [eta is learning rate]
        weights_y = weights_y - (eta * dE_Total_dw_2nd_layer)

        # Updated weights for 1st layer
        # (new)w_1 = w_1 - eta*(dE_Total)/(dw_1)
        weights_h = weights_h - (eta * dE_Total_dw_1st_layer)

    print('iteration: ' + str(iter) + ' complete')

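After the loop finishes, a quick way to check the trained network is to run one more forward pass with the updated weight matrices; this small sketch simply continues from the code above:

# One more forward pass with the trained weights
h = np.dot(x_val, weights_h.T) + b1
out_h = sigmoid(h)
y = np.dot(out_h, weights_y.T) + b2
out_y = sigmoid(y)

print('Predicted output:')
print(out_y)
print('Total error per row:', np.sum(np.square(y_val - out_y) / 2, axis=1))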

To make everything easier to remember, let me list all the equations of the neural network.

Neural Network Equations

Forward Propagation

h1 = x1w1 + x2w2 +b1 

\[\mathbf{Out\;h_1=\frac{1}{1+e^{-h_1}}}\]
h2 = x1w3 + x2w4 +b1

\[\mathbf{Out\;h_2=\frac{1}{1+e^{-h_2}}}\]
y1 = Outh1*w5 + Outh2*w6 + b2
\[\mathbf{Out\;y_1=\frac{1}{1+e^{-y_1}}}\]
y2 = Outh1*w7 + Outh2*w8 + b2
\[\mathbf{Out\;y_2=\frac{1}{1+e^{-y_2}}}\]

Error Calculation


\[\mathbf{E_1 = \frac{1}{2}(T_1-Out\;y_1 )^2}\] \[\mathbf{E_2 = \frac{1}{2}(T_2-Out\;y_2 )^2}\]

Back Propagation

1.Update 2nd layer weights

\[\mathbf{\frac{dE_{Total}}{dw_5}=\frac{dE_{Total}}{dOut\;y_1}*\frac{dOut\;y_1}{dy_1}*\frac{dy_1}{dw_5}}\]
2.Update 1st layer weights

\[\mathbf{\frac{dE_{Total}}{dw_1}=\frac{dE_{Total}}{dOut\;h_1}*\frac{dOut\;h_1}{dh_1}*\frac{dh_1}{dw_1}}\]

Conclusion

In this tutorial I have shown just one epoch: one forward pass to get the predicted output values and one backward pass to update all the weights (w1, w2, ... w8). After updating, you need to run the forward pass again to calculate the total error.
After one iteration you may get better accuracy than at the initial stage of the neural network, but after repeating this process 1,000 or 10,000 times you can achieve a predicted output close to the actual target value.



Derivative of sigmoid function


\[\mathbf{\frac{d}{dx}e^{-x}=-e^{-x}}\]

Explanation


\[\mathbf{\frac{d}{dx}e^{-x}=\frac{de^{-x}}{d(-x)}*\frac{d}{dx}(-x)=e^{-x}*(-1)=-e^{-x}}\]
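Using this, the derivative of the sigmoid function itself (applied step by step in the back propagation section above) works out to:

\[\mathbf{\frac{d}{dx}\left(\frac{1}{1+e^{-x}}\right)=\frac{e^{-x}}{(1+e^{-x})^2}=\frac{1}{1+e^{-x}}\left(1-\frac{1}{1+e^{-x}}\right)}\]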
If you have any question or suggestion regarding this topic, see you in the comment section. I will try my best to answer.