
Coding the Recurrent Neural Network

A write-up that explains how an RNN works, from weight initialization through the forward path, backpropagation, and the update of the weight network.


While I was referring to the following articles to learn the basics, I realized that I could add a few more things so that the community and developers like me can benefit:

https://www.analyticsvidhya.com/blog/2017/12/introduction-to-recurrent-neural-networks/

https://towardsdatascience.com/recurrent-neural-networks-rnns-3f06d7653a85

The write-up below explains in detail, with code snippets, how an RNN works from the time of initialization of the weights to the forward path, backpropagation, and the update of the weight network.

Let us take this example of a single hidden layer with 3 neurons processing the word 'Hello' (example taken from the first link above).

For this example, I am keeping the computation simple by calculating the values up to H2 only. The same approach can be extended to H4 and beyond, and to multiple hidden layers.
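For reference, the one-hot input rows used below can be built with a short snippet like this (a minimal sketch; the vocabulary order ['h', 'e', 'l', 'o'] is an assumption carried over from the linked example):

Python

# Minimal sketch: one-hot encode the first two letters of 'hello'
# (assumes the vocabulary order ['h', 'e', 'l', 'o'] from the linked example)
vocab = ['h', 'e', 'l', 'o']

def one_hot(letter):
    vec = [0] * len(vocab)
    vec[vocab.index(letter)] = 1
    return vec

inputs = [one_hot(c) for c in 'he']
print(inputs)  # [[1, 0, 0, 0], [0, 1, 0, 0]]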

Step 1 – Weight Initialization

  • Considering the input layer to be [[1,0,0,0],[0,1,0,0]] - for the letters 'h' and 'e'
  • The same weight network will be repeated for all the states in the same hidden layer. When a new hidden layer is added, a new weight matrix will be added
  • The input weight network (Wxh) will be a 3*5 matrix: 3 rows for the 3 neurons, 4 columns for the 4 input values (one-hot encoded: 1,0,0,0 for 'h', 0,1,0,0 for 'e', and so on), and a 5th column for the bias weight
Python

# Wxh: one row per neuron (3), 4 input columns plus a bias column
weight_arrayTest=[[0.287027,0.84606,0.572392,0.486813,0.56700],
                  [0.902874,0.871522,0.691079,0.18998,0.56700],
                  [0.537524,0.09224,0.558159,0.491528,0.56700]]
weight_hidden_dictTest=dict()
weight_hidden_dictTest.update({'Weight':weight_arrayTest})
weight_network.append(weight_hidden_dictTest)

  • Recurrent weight network (Whh): [0.427043]. This is a 1*1 matrix for 1 hidden layer.
  • The output weight network (Wyh) will be a 4*3 matrix: 4 rows because the size of each input array is 4 (for each input element, e.g. x1 = [1,0,0,0], there has to be an output), and 3 columns because there are 3 neurons and each neuron will emit a state
Python

# Wyh: 4 output values per step, one column per neuron
output_weight_arrayTest=[[0.37168,0.974829459,0.830034886],
                         [0.39141,0.282585823,0.659835709],
                         [0.64985,0.09821557,0.334287084],
                         [0.91266,0.32581642,0.144630018]]
output_weight_hidden_dictTest=dict()
output_weight_hidden_dictTest.update({'Weight':output_weight_arrayTest})
output_weight_network.append(output_weight_hidden_dictTest)


Step 2 – Forward Path

The forward path is explained very well in the first link above through an Excel-based calculation. The code snippet below explains the logic.

I have broken it down into multiple small parts so that it is easier to understand. The code can be simplified through dot products and array arithmetic; a vectorized sketch is included after the states output below.

  • LOOP through the input weight network and multiply the weights with the input values (dot product)

('input weight Network', [{'Weight': [[0.287027, 0.84606, 0.572392, 0.486813, 0.567], [0.902874, 0.871522, 0.691079, 0.18998, 0.567], [0.537524, 0.09224, 0.558159, 0.491528, 0.567]]}])

inputs=[[1,0,0,0],
        [0,1,0,0]]

Python

# (body of the forward-path method; inputs, previous_states, indices, aggregate1_list,
#  and hidden_layer_output are set up by the enclosing method)
for k in weight_network:
    new_inputs=list()
    for j in k.values()[0]:
        aggregate = 0.0
        aggregate_list=list()
        for n in range(len(j[0:len(j)-1])):
            aggregate += inputs[n]*j[0:len(j)-1][n]
        aggregate += j[-1]*1 #for the bias
        if len(previous_states) != 0: #adding the previous states
            aggregate_sum = aggregate + aggregate1_list[indices]
            output = 1/(1+np.exp(-aggregate_sum)) #activation
            new_inputs.append(output)
        else: #activation
            output = 1/(1+np.exp(-aggregate))
            new_inputs.append(output)
        indices += 1
    #at each step, the recurrent network will also produce an output
    hidden_layer_output.append(new_inputs)
    inputs = new_inputs

return [inputs,hidden_layer_output]



  • If the previous state is not zero, we will need to add the previous states to the aggregate that we are calculating from the dot product. The aggregate1_list is calculated through:
Python

if len(previous_states) != 0:
    for i in range(len(previous_states[sample_row_index-1])):
        print("Values of i",previous_states[sample_row_index-1][i])
        print("Values of recurrent neuron weight",recurrrent_neuron_weight[0])
        aggregate1 = previous_states[sample_row_index-1][i]*recurrrent_neuron_weight[0]
        aggregate1_list.append(aggregate1)
print("Aggregate 1 list",aggregate1_list)



  • After the dot product and the previous states are added, we use the activation function (I used the sigmoid) to calculate the corresponding output (the hidden state value for each neuron). Since I have restricted the input layer/loop to 'h' and 'e' of the word 'hello', the hidden state output after 2 iterations of the input array will look like the 2*3 matrix below (2 rows for the 2 input rows and 3 columns for the 3 neurons in the single hidden layer)

 ('States output are', [[0.70141121474134871, 0.81303823389731422, 0.75110680687063036], [0.84717227340513512, 0.85640226112907092, 0.72710720520726202]])
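These state values can be reproduced with the dot-product form mentioned earlier. Below is a minimal vectorized sketch (it assumes NumPy and reuses the weights initialized in Step 1, but it is not the author's exact code):

Python

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Weights from Step 1: Wxh is 3*4 plus a separate bias column, Whh is the 1*1 recurrent weight
Wxh = np.array([[0.287027, 0.84606, 0.572392, 0.486813],
                [0.902874, 0.871522, 0.691079, 0.18998],
                [0.537524, 0.09224, 0.558159, 0.491528]])
bias = np.array([0.567, 0.567, 0.567])
Whh = 0.427043

h = np.zeros(3)                                   # h(0) is zero
states = []
for x in np.array([[1, 0, 0, 0], [0, 1, 0, 0]]):  # 'h', then 'e'
    h = sigmoid(Wxh.dot(x) + Whh * h + bias)      # h(t) = sigmoid(Wxh.x + Whh.h(t-1) + b)
    states.append(h)
print(states)  # ~ [[0.7014, 0.8130, 0.7511], [0.8472, 0.8564, 0.7271]]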

  • Calculate the output value (y) at each step with the states output and the output weight network
Python

def outputCalc(self,new_inputs):
    output_arr=list()
    for k in output_weight_network:
        for j in k.values()[0]:
            aggregate = 0.0
            for n in range(len(j[0:len(j)])):
                aggregate += new_inputs[n]*j[0:len(j)][n]
            output_arr.append(aggregate)
    return output_arr



The output data structure for 2 iterations of the input array loop will look like the 2*4 matrix below (2 rows for the 2 input rows and 4 columns for the 4 one-hot encoded input elements for each X(t)).

('Output Arr List', [[1.6767189948061862, 0.99989953446445878, 0.7867503957150177, 1.0136837569350066], [1.7532474896660379, 1.0533701355806655, 0.87770948548253291, 1.1573716940239656]])
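The first output row can be reproduced with a single dot product of the output weight network and the first hidden state (a small sketch using the values shown above):

Python

import numpy as np

Wyh = np.array([[0.37168, 0.974829459, 0.830034886],
                [0.39141, 0.282585823, 0.659835709],
                [0.64985, 0.09821557, 0.334287084],
                [0.91266, 0.32581642, 0.144630018]])
h1 = np.array([0.70141121474134871, 0.81303823389731422, 0.75110680687063036])
print(Wyh.dot(h1))  # ~ [1.67671899, 0.99989953, 0.78675040, 1.01368376]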

  • Predict through SoftMax
Python

def predictSoftMax(self,output_values): #softmax function
    print("Output values in predict method is",output_values)
    out = np.exp(output_values)
    print("Sigma of softmax is",out/np.sum(out))
    return out/np.sum(out) #the output of the array will add up to 1



('Predicted Softmax values', array([ 0.40578425,  0.20153121,  0.16906506,   0.22361947]))

The array size is 4 for the 4 outputs (one output value for each of the 4 one-hot encoded input elements).
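As a quick sanity check, the predicted softmax values add up to 1:

Python

import numpy as np

probs = np.array([0.40578425, 0.20153121, 0.16906506, 0.22361947])
print(probs.sum())  # ~ 1.0, as expected for a softmax output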

  • Calculate the Performance (Cost Function): -(1/2)(d-z)**2
Python

def performance(self,outputs,sample_row_index):
    performance_errors=0.0
    for i in range(len(outputs[sample_row_index])):
        print("Actual", input_arr[sample_row_index][i])
        print("Predicted", outputs[sample_row_index][i])
        performance_errors += -(0.5)*(input_arr[sample_row_index][i]-outputs[sample_row_index][i])**2 # -(1/2)(d-z)**2
    return performance_errors



We will see the error value decreasing with each loop/iteration.

Step 3 – Back Propagation 

The math behind the backpropagation is shown in the picture below
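As a rough textual summary of that math (not a reproduction of the picture), using d = desired output, z = predicted output, h(t) = hidden state, x(t) = one-hot input plus bias, and U/V/W as defined below: for each time step t, traversed in reverse order, the steps that follow compute approximately the quantities below (δy and δh are shorthand introduced here, not the author's notation).

LaTeX

\delta_y(t) = d(t) - z(t)
\delta_h(t) = \left(V^{\top}\delta_y(t) + W^{\top}\delta_h(t+1)\right)\odot h(t)\odot\left(1-h(t)\right)
dV = \sum_t \delta_y(t)\,h(t)^{\top},\qquad dU = \sum_t \delta_h(t)\,x(t)^{\top},\qquad dW = \sum_t \delta_h(t)^{\top}\,h(t-1)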

Initialize the delta arrays for the input weight matrix, output weight matrix, and recurrent weight matrix, using the standard notation U = input weight, V = output weight, W = recurrent weight.

Python

dV = np.zeros(shape=(4,3),dtype=float)
dU = np.zeros(shape=(no_of_neurons_hidden_layer,input_features+1),dtype=float)
dW = np.zeros(shape=(no_of_hidden_layers),dtype=float)



Traverse in the reverse order of the time steps processed for the single hidden layer (in this example, the reverse order starts from 2, as I have stopped after step 2 to keep it simple). A sketch of the loop structure is shown below.
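A minimal sketch of that outer loop structure (not the author's exact code; the real work is in the Step 1-7 snippets that follow):

Python

# Hedged sketch of the reverse traversal over the processed time steps
# (2 steps here: 'h' and 'e'); the per-step work is shown in Steps 1-7 below
for i in reversed(range(2)):    # i = 1 (state for 'e'), then i = 0 (state for 'h')
    # Steps 1-4: compute (d - z), multiply by the output weights,
    #            add the carry forward, multiply by h(t)(1 - h(t))
    # Steps 5-7: accumulate dU and dW, compute the carry for step i - 1
    pass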

  • Step 1: calculate the delta weight matrix for the output weight network

The (d-z) term is first calculated and then fed into the second code snippet (the dot product of (d-z) and h(t).T).


Python

def performance_derivative_output(self,outputs,sample_row_index):
    error_list=list()
    for i in range(len(outputs[sample_row_index])):
        print("Actual derivative function", input_arr[sample_row_index][i])
        print("Predicted derivative function", outputs[sample_row_index][i])
        print("Actual-Predicted Value derivative function",input_arr[sample_row_index][i]-outputs[sample_row_index][i])
        error_list.append(input_arr[sample_row_index][i]-outputs[sample_row_index][i]) #(d-z)
    return error_list


Python

dV += np.dot(np.array(output_array_list[i])[:,None],np.array(states_list[i])[None,:])



('Output Array List', [[-0.67671899480618625, -0.99989953446445878, -0.7867503957150177, -1.0136837569350066], [-1.7532474896660379, -0.053370135580665501, -0.87770948548253291, -1.1573716940239656]])

('States List', [[0.70141121474134871, 0.81303823389731422, 0.75110680687063036], [0.84717227340513512, 0.85640226112907092, 0.72710720520726202]])

After 2 loops, the final dV weight matrix will look like the one below (4*3 matrix)

('dV', array([[-1.95996095, -2.05168353, -1.78308713],
       [-0.74655445, -0.85866286, -0.78983716],
       [-1.29540669, -1.39133054, -1.22912247],
       [-1.69150236, -1.81533939, -1.60291807]]))

  • Step 2: calculate [y(t) desired – y(t) actual].w(yh). You can see the math in the picture above
Python

dList1 = np.dot(output_weight_array_Transpose,np.array(output_array_list[i])[:,None])



('Output weight array Transpose', array([[ 0.37168   ,  0.39141   ,  0.64985   ,  0.91266   ],
       [ 0.97482946,  0.28258582,  0.09821557,  0.32581642],
       [ 0.83003489,  0.65983571,  0.33428708,  0.14463002]]))

The output_array_list is the output of the performance_derivative_output() function from the previous step.

After 2 loops, the output of the above statement will generate the following array:

('dList 1 current', array([[-2.07931196],
       [-1.349789  ],
       [-1.63107939]]))

  • Step 3: Add the carry forward to dList1. The carry forward equation is given in the picture above. The code is given in Step 6
Python

dList1 = dList1 + delta_error[:,None]



The updated array after the second run of the loop (which is state 1)

('Updated Delta 1 list', array([[-2.52200365],
       [-1.77097217],
       [-2.00677954]]))

  • Step 4: calculate ([y(t) desired – y(t) actual].w(yh) + delta_error) * [h(t)(1-h(t))]
Python

updated_states_list = self.calculateSigmoidBackProp(states_list[i]) #[h(t)(1-h(t))]


Python

def calculateSigmoidBackProp(self,states_list): #[h(t)(1-h(t))]
    updated_states_list=list()
    for i in range(len(states_list)):
        print('states_list[i]',states_list[i])
        updated_states_list.append(states_list[i]*(1-states_list[i]))
    return updated_states_list


Python

dList2=list() #([y(t) desired – y(t) actual].w(yh)+delta_error)*h(t)(1-h(t))
for i1 in range(len(dList1)):
    sum=0
    for j in range(len(updated_states_list)):
        print("dList 1 elements",dList1[i1])
        print("updated_states_list[j] elements",updated_states_list[j])
        sum += dList1[i1][0]*updated_states_list[j]
    print("Individual sum",sum)
    dList2.append(sum)



  • Step 5: Take the output of Step 4 and do a dot product with the Input
Python

input_arr[i].append(1) #append 1 for the input bias
dUMain_list = list()
for m in np.array(dList2):
    sub_list = list()
    for m1 in input_arr[i]:
        sub_list.append(m*m1)
    dUMain_list.append(sub_list)
print("Input dUMain_list",dUMain_list)

dU += np.array(dUMain_list)



dU after 2 loops (matrix of 3*5)

('Du Main List', array([[-1.38303139, -1.03664432,  0.        ,  0.        , -2.41967571],
       [-0.97117627, -0.98627813,  0.        ,  0.        , -1.9574544 ],
       [-1.10048972, -0.87977125,  0.        ,  0.        , -1.98026097]]))

  • Step 6: Calculate the carry forward that was added in Step 3:
Python

delta_error = np.dot(np.array(dList2)[:,None], np.array(recurrrent_neuron_weight))



  • Step 7: calculate dW: Take the output of Step 4 and do a dot product with the h(t-1)
Python

if i != 0: #skip dW at the first time step, since h(0) is zero
    print("dList2",np.array(dList2)[None,:])
    print("states list[i-1]",np.array(states_list[i-1])[:,None])
    dW_Sum=0
    for w in range(len(dList2)):
        print("Multiplying {0} * {1}".format(dList2[w],states_list[i-1][w]))
        dW_Sum+=dList2[w]*states_list[i-1][w]

    print("dW_Sum",dW_Sum)
    dW += dW_Sum
print("DW Main List", dW)



dW after 2 loops (matrix of 1*1)

('DW Main List', array([-2.18979795]))

Step 4 – Update Weights

We now have the delta weight matrices for U, V, and W. While updating, we have to make sure that the dimensions of dV, dU, and dW are the same as the initialized matrices.

('dV before weight update', array([[-1.95996095, -2.05168353, -1.78308713],
       [-0.74655445, -0.85866286, -0.78983716],
       [-1.29540669, -1.39133054, -1.22912247],
       [-1.69150236, -1.81533939, -1.60291807]]))

('dU before weight update', array([[-1.38303139, -1.03664432,  0.        ,  0.        , -2.41967571],
       [-0.97117627, -0.98627813,  0.        ,  0.        , -1.9574544 ],
       [-1.10048972, -0.87977125,  0.        ,  0.        , -1.98026097]]))

('dW before weight update', array([-2.18979795]))
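A small sanity check for those dimensions (a sketch assuming dV, dU, and dW are the NumPy arrays printed above):

Python

import numpy as np

assert np.shape(dV) == (4, 3)   # matches the output weight network (Wyh)
assert np.shape(dU) == (3, 5)   # matches the input weight network (Wxh, including the bias column)
assert np.shape(dW) == (1,)     # matches the recurrent weight (Whh)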

The dimensions match the corresponding weight networks. The updated weight matrix values can be derived by simply adding the delta values to the existing weights.

For output_weight_network[0] and weight_network[0], index 0 is hardcoded as we have only 1 hidden layer in this example. With multiple hidden layers, the index would be fetched via a loop.

Python

new_output_weights = list(output_weight_network[0].get('Weight')) + dV

new_input_weights = list(weight_network[0].get('Weight')) + dU

new_recurrent_weights = recurrrent_neuron_weight + dW


