Intro and Downloading data

Welcome to this exciting journey aboard the Spaceship Titanic!

This tutorial is almost a complete copy of Jeremy Howard’s excellent Linear model and neural net from scratch Kaggle notebook, which was part of the 2022 version of the Deep Learning for Coders course by fast.ai.

It also borrows heavily from the Spaceship Titanic: A complete guide notebook by Samuel Cortinhas, which is based on the same dataset I’m going to use to build my neural network.

Lastly, and very importantly, the Neural Networks from Scratch series by Harrison and Daniel of the sentdex YouTube channel was instrumental in broadening my understanding. I have borrowed their way of coding neural network layers as classes. You can access their videos and the book here.

My explanations/comments are going to be minimal, because everything is explained quite well in the resources mentioned above. For neural net foundations (which are the primary aim of this blog/notebook), please refer to course.fast.ai by Jeremy & Co. and the nnfs.io series.

Let’s go conquer a neural network then!

Description of the features of the dataset, copied from the competition page:

import numpy as np
import pandas as pd
import torch
import kaggle
import os
from pathlib import Path

Downloading the data for the Spaceship Titanic competition via the Kaggle API:

path = Path('spaceship-titanic')
if not path.exists():
    import zipfile,kaggle
    kaggle.api.competition_download_cli(str(path))
    zipfile.ZipFile(f'{path}.zip').extractall(path)
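A quick sanity check (just a sketch) to confirm the archive extracted where we expect:

# list the extracted competition files (train.csv is the one we use below)
sorted(p.name for p in path.iterdir())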

Setting display options for NumPy, pandas and PyTorch to widen the printed output:

np.set_printoptions(linewidth=140)
torch.set_printoptions(linewidth=140, sci_mode=False, edgeitems=7)
pd.set_option('display.width', 140)

Cleaning the data

Looking at some samples from the dataset:

df = pd.read_csv(path/'train.csv')
df.head()
PassengerId HomePlanet CryoSleep Cabin Destination Age VIP RoomService FoodCourt ShoppingMall Spa VRDeck Name Transported
0 0001_01 Europa False B/0/P TRAPPIST-1e 39.0 False 0.0 0.0 0.0 0.0 0.0 Maham Ofracculy False
1 0002_01 Earth False F/0/S TRAPPIST-1e 24.0 False 109.0 9.0 25.0 549.0 44.0 Juanna Vines True
2 0003_01 Europa False A/0/S TRAPPIST-1e 58.0 True 43.0 3576.0 0.0 6715.0 49.0 Altark Susent False
3 0003_02 Europa False A/0/S TRAPPIST-1e 33.0 False 0.0 1283.0 371.0 3329.0 193.0 Solam Susent False
4 0004_01 Earth False F/1/S TRAPPIST-1e 16.0 False 303.0 70.0 151.0 565.0 2.0 Willy Santantines True

Exploring missing values and filling them with the Mode of the respective column:

df.isna().sum()
PassengerId       0
HomePlanet      201
CryoSleep       217
Cabin           199
Destination     182
Age             179
VIP             203
RoomService     181
FoodCourt       183
ShoppingMall    208
Spa             183
VRDeck          188
Name            200
Transported       0
dtype: int64
modes = df.mode().iloc[0]
modes
PassengerId                0001_01
HomePlanet                   Earth
CryoSleep                    False
Cabin                      G/734/S
Destination            TRAPPIST-1e
Age                           24.0
VIP                          False
RoomService                    0.0
FoodCourt                      0.0
ShoppingMall                   0.0
Spa                            0.0
VRDeck                         0.0
Name            Alraium Disivering
Transported                   True
Name: 0, dtype: object
df.fillna(modes, inplace=True)
df.isna().sum()
PassengerId     0
HomePlanet      0
CryoSleep       0
Cabin           0
Destination     0
Age             0
VIP             0
RoomService     0
FoodCourt       0
ShoppingMall    0
Spa             0
VRDeck          0
Name            0
Transported     0
dtype: int64
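Filling every column with its mode is the simplest possible imputation. A slightly more careful alternative (just a sketch on a fresh copy of the data, not what this notebook uses) fills numeric columns with their median and only the remaining ones with the mode:

# hypothetical alternative: median for numeric columns, mode for everything else
alt = pd.read_csv(path/'train.csv')
num_cols = alt.select_dtypes(include=np.number).columns
alt[num_cols] = alt[num_cols].fillna(alt[num_cols].median())
alt = alt.fillna(alt.mode().iloc[0])
alt.isna().sum().sum()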

Exploratory Data Analysis & Feature Engineering

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
plt.style.use('ggplot')
# Figure size
plt.figure(figsize=(6,6))

# Pie plot
df['Transported'].value_counts().plot.pie(explode=[0.03,0.03], autopct='%1.1f%%', shadow=True, textprops={'fontsize':16}).set_title("Target distribution")
Text(0.5, 1.0, 'Target distribution')

[Figure: pie chart of the Transported target distribution]

The target variable is quite balanced, hence we don’t need to perform over-/undersampling.

Now let’s describe categorical features:

df.describe(include=[object])
PassengerId HomePlanet Cabin Destination Name
count 8693 8693 8693 8693 8693
unique 8693 3 6560 3 8473
top 0001_01 Earth G/734/S TRAPPIST-1e Alraium Disivering
freq 1 4803 207 6097 202

Let’s now replace the strings in these categorical features with numbers. Pandas offers a get_dummies method to convert them to numbers so that we can multiply them with weights. It’s basically one-hot encoding, letting the model know the unique levels available in a particular column.

We only process HomePlanet and Destination via get_dummies because the others simply have too many unique values (aka levels).

df = pd.get_dummies(df, columns=["HomePlanet", "Destination"])
df.columns
Index(['PassengerId', 'CryoSleep', 'Cabin', 'Age', 'VIP', 'RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck', 'Name',
       'Transported', 'HomePlanet_Earth', 'HomePlanet_Europa', 'HomePlanet_Mars', 'Destination_55 Cancri e', 'Destination_PSO J318.5-22',
       'Destination_TRAPPIST-1e'],
      dtype='object')

Our dummy columns are visible at the end of the dataframe!

Looking at numerical features:

df.describe(include=(np.number))
Age RoomService FoodCourt ShoppingMall Spa VRDeck HomePlanet_Earth HomePlanet_Europa HomePlanet_Mars Destination_55 Cancri e Destination_PSO J318.5-22 Destination_TRAPPIST-1e
count 8693.000000 8693.000000 8693.000000 8693.000000 8693.000000 8693.000000 8693.000000 8693.000000 8693.000000 8693.000000 8693.000000 8693.000000
mean 28.728517 220.009318 448.434027 169.572300 304.588865 298.261820 0.552514 0.245140 0.202347 0.207063 0.091568 0.701369
std 14.355438 660.519050 1595.790627 598.007164 1125.562559 1134.126417 0.497263 0.430195 0.401772 0.405224 0.288432 0.457684
min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 20.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
50% 27.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000
75% 37.000000 41.000000 61.000000 22.000000 53.000000 40.000000 1.000000 0.000000 0.000000 0.000000 0.000000 1.000000
max 79.000000 14327.000000 29813.000000 23492.000000 22408.000000 24133.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000

Samuel’s notebook uncovered the following useful insight regarding Age. Let’s visualize the feature first:

plt.figure(figsize=(10,4))

# Histogram
sns.histplot(data=df, x='Age', hue='Transported', binwidth=1, kde=True)

# Aesthetics
plt.title('Age distribution')
plt.xlabel('Age (years)')
Text(0.5, 0, 'Age (years)')

[Figure: histogram of Age, split by Transported]

Samuel’s notes and insights on this plot (see his notebook for the full discussion) boil down to this: the transported/not-transported split clearly varies with age, so instead of leaving Age purely continuous it’s worth binning it into a few discrete groups, which is what we do next.

p_groups = ['child', 'young', 'adult']

df['age_group'] = np.nan
df.loc[df['Age'] < 18, 'age_group'] = p_groups[0]
df.loc[(df['Age'] >= 18) & (df['Age'] <= 25), 'age_group'] = p_groups[1]
df.loc[df['Age'] > 25, 'age_group'] = p_groups[2]
df.head()
PassengerId CryoSleep Cabin Age VIP RoomService FoodCourt ShoppingMall Spa VRDeck Name Transported HomePlanet_Earth HomePlanet_Europa HomePlanet_Mars Destination_55 Cancri e Destination_PSO J318.5-22 Destination_TRAPPIST-1e age_group
0 0001_01 False B/0/P 39.0 False 0.0 0.0 0.0 0.0 0.0 Maham Ofracculy False 0 1 0 0 0 1 adult
1 0002_01 False F/0/S 24.0 False 109.0 9.0 25.0 549.0 44.0 Juanna Vines True 1 0 0 0 0 1 young
2 0003_01 False A/0/S 58.0 True 43.0 3576.0 0.0 6715.0 49.0 Altark Susent False 0 1 0 0 0 1 adult
3 0003_02 False A/0/S 33.0 False 0.0 1283.0 371.0 3329.0 193.0 Solam Susent False 0 1 0 0 0 1 adult
4 0004_01 False F/1/S 16.0 False 303.0 70.0 151.0 565.0 2.0 Willy Santantines True 1 0 0 0 0 1 child
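As an aside, the same binning can be written more compactly with pandas’ cut (a sketch; it assumes ages are whole numbers, so the right-open bin [18, 26) captures 18–25):

# equivalent binning with pd.cut: under 18, 18-25, over 25
age_group_alt = pd.cut(df['Age'], bins=[-np.inf, 18, 26, np.inf], right=False, labels=p_groups)
(age_group_alt.astype(str) == df['age_group']).all()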

Now we need dummies for age_group as well:

df = pd.get_dummies(df, columns=["age_group"])
df.head()
PassengerId CryoSleep Cabin Age VIP RoomService FoodCourt ShoppingMall Spa VRDeck ... Transported HomePlanet_Earth HomePlanet_Europa HomePlanet_Mars Destination_55 Cancri e Destination_PSO J318.5-22 Destination_TRAPPIST-1e age_group_adult age_group_child age_group_young
0 0001_01 False B/0/P 39.0 False 0.0 0.0 0.0 0.0 0.0 ... False 0 1 0 0 0 1 1 0 0
1 0002_01 False F/0/S 24.0 False 109.0 9.0 25.0 549.0 44.0 ... True 1 0 0 0 0 1 0 0 1
2 0003_01 False A/0/S 58.0 True 43.0 3576.0 0.0 6715.0 49.0 ... False 0 1 0 0 0 1 1 0 0
3 0003_02 False A/0/S 33.0 False 0.0 1283.0 371.0 3329.0 193.0 ... False 0 1 0 0 0 1 1 0 0
4 0004_01 False F/1/S 16.0 False 303.0 70.0 151.0 565.0 2.0 ... True 1 0 0 0 0 1 0 1 0

5 rows × 21 columns

# Expenditure features
exp_feats=['RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck']

# Plot expenditure features
fig=plt.figure(figsize=(10,20))
for i, var_name in enumerate(exp_feats):
    # Left plot
    ax=fig.add_subplot(5,2,2*i+1)
    sns.histplot(data=df, x=var_name, ax=ax, bins=30, kde=False, hue='Transported')
    ax.set_title(var_name)
    
    # Right plot (truncated)
    ax=fig.add_subplot(5,2,2*i+2)
    sns.histplot(data=df, x=var_name, ax=ax, bins=30, kde=True, hue='Transported')
    plt.ylim([0,100])
    ax.set_title(var_name)
fig.tight_layout()  # Improves appearance a bit
plt.show()

[Figure: histograms of the five expenditure features, split by Transported; right-hand panels truncated at y=100]

Insight: spending is heavily right-skewed, with most passengers spending nothing at all and a long tail of big spenders, so we apply a log transform to compress the tails:

for f in exp_feats:
    df[f'log{f}'] = np.log(df[f]+1)
df.head()
PassengerId CryoSleep Cabin Age VIP RoomService FoodCourt ShoppingMall Spa VRDeck ... Destination_PSO J318.5-22 Destination_TRAPPIST-1e age_group_adult age_group_child age_group_young logRoomService logFoodCourt logShoppingMall logSpa logVRDeck
0 0001_01 False B/0/P 39.0 False 0.0 0.0 0.0 0.0 0.0 ... 0 1 1 0 0 0.000000 0.000000 0.000000 0.000000 0.000000
1 0002_01 False F/0/S 24.0 False 109.0 9.0 25.0 549.0 44.0 ... 0 1 0 0 1 4.700480 2.302585 3.258097 6.309918 3.806662
2 0003_01 False A/0/S 58.0 True 43.0 3576.0 0.0 6715.0 49.0 ... 0 1 1 0 0 3.784190 8.182280 0.000000 8.812248 3.912023
3 0003_02 False A/0/S 33.0 False 0.0 1283.0 371.0 3329.0 193.0 ... 0 1 1 0 0 0.000000 7.157735 5.918894 8.110728 5.267858
4 0004_01 False F/1/S 16.0 False 303.0 70.0 151.0 565.0 2.0 ... 0 1 0 1 0 5.717028 4.262680 5.023881 6.338594 1.098612

5 rows × 26 columns
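Equivalently, numpy’s log1p computes log(1 + x) directly and is a touch more careful numerically for tiny values:

# the log columns created above should match np.log1p
np.allclose(np.log1p(df['RoomService']), df['logRoomService'])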

Finally, we split PassengerId to extract the passenger’s group number, as per its data description:

# New feature - Group
df['Group'] = df['PassengerId'].apply(lambda x: x.split('_')[0]).astype(int)
df.columns
Index(['PassengerId', 'CryoSleep', 'Cabin', 'Age', 'VIP', 'RoomService', 'FoodCourt', 'ShoppingMall', 'Spa', 'VRDeck', 'Name',
       'Transported', 'HomePlanet_Earth', 'HomePlanet_Europa', 'HomePlanet_Mars', 'Destination_55 Cancri e', 'Destination_PSO J318.5-22',
       'Destination_TRAPPIST-1e', 'age_group_adult', 'age_group_child', 'age_group_young', 'logRoomService', 'logFoodCourt',
       'logShoppingMall', 'logSpa', 'logVRDeck', 'Group'],
      dtype='object')
added_cols = ['HomePlanet_Earth', 'HomePlanet_Europa','HomePlanet_Mars', 'Destination_55 Cancri e', 'Destination_PSO J318.5-22', 'Destination_TRAPPIST-1e','age_group_adult', 'age_group_child', 'age_group_young', 'logRoomService', 'logFoodCourt',
       'logShoppingMall', 'logSpa', 'logVRDeck', 'Group']
df[added_cols].head()
HomePlanet_Earth HomePlanet_Europa HomePlanet_Mars Destination_55 Cancri e Destination_PSO J318.5-22 Destination_TRAPPIST-1e age_group_adult age_group_child age_group_young logRoomService logFoodCourt logShoppingMall logSpa logVRDeck Group
0 0 1 0 0 0 1 1 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 1
1 1 0 0 0 0 1 0 0 1 4.700480 2.302585 3.258097 6.309918 3.806662 2
2 0 1 0 0 0 1 1 0 0 3.784190 8.182280 0.000000 8.812248 3.912023 3
3 0 1 0 0 0 1 1 0 0 0.000000 7.157735 5.918894 8.110728 5.267858 3
4 1 0 0 0 0 1 0 1 0 5.717028 4.262680 5.023881 6.338594 1.098612 4

What about CryoSleep?

df.CryoSleep.unique()
array([False,  True])

It’s boolean, so we can multiply it with weights directly. VIP looks the same.

indep_cols = ['Age', 'CryoSleep', 'VIP'] + added_cols

Setting up a linear model

Single layer neural network with one neuron:

np.random.seed(442)
weights = np.random.randn(len(indep_cols), 1)
bias = np.random.randn(1)
preds = np.dot(df[indep_cols].values, weights) + bias
preds.shape
(8693, 1)
trn_indep = np.array(df[indep_cols], dtype='float32')
trn_dep = np.array(df['Transported'], dtype='float32')
print(trn_indep.shape)
print(trn_dep.shape)
(8693, 18)
(8693,)
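These raw scores aren’t probabilities yet; to get a probability of being transported out of a single neuron you would typically squash its output with a sigmoid (a minimal sketch, not part of the cell above):

# sigmoid squashes each raw score into a probability between 0 and 1
scores = trn_indep @ weights + bias
probs = 1 / (1 + np.exp(-scores))
probs[:5]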

Single layer neural network with three neurons:

np.random.seed(442)
weights = np.random.randn(len(indep_cols), 3) #three neurons in this layer
bias = np.random.randn(1, 3)
bias
array([[ 0.01176998, -0.85427749, -0.99987562]])
preds = np.dot(df[indep_cols].values, weights) + bias
preds.shape
(8693, 3)
preds[:10]
array([[-48.06963390253141, -18.290557540354897, 25.89840533681499],
       [-37.91377002769904, -7.504076537856783, 19.929613327882404],
       [-77.52103870477566, -16.739210482170467, 48.30474706150288],
       [-49.956656061980745, -2.8342719146155777, 27.74823775227435],
       [-27.921019655098725, -0.8518280063674349, 7.996057055720733],
       [-58.52447309612352, -11.829816864262263, 30.01637869888577],
       [-43.57861207919503, -3.600338170392331, 14.143030534080323],
       [-42.93381590139605, -11.247805747916502, 13.576656755604825],
       [-53.425588340898884, -4.639261738916223, 17.59311661596172],
       [-25.380443498116176, -10.613520203166267, 1.9029395296299394]], dtype=object)

A deeper neural network with 2 hidden layers and an output layer:

trn_dep = torch.tensor(trn_dep, dtype=torch.long)
trn_indep = torch.tensor(trn_indep, dtype=torch.float)
trn_indep.shape
torch.Size([8693, 18])
trn_dep.shape
torch.Size([8693])
#collapse_output
trn_indep[:5]

tensor([[39.0000,  0.0000,  0.0000,  0.0000,  1.0000,  0.0000,  0.0000,  0.0000,  1.0000,  1.0000,  0.0000,  0.0000,  0.0000,  0.0000,
          0.0000,  0.0000,  0.0000,  1.0000],
        [24.0000,  0.0000,  0.0000,  1.0000,  0.0000,  0.0000,  0.0000,  0.0000,  1.0000,  0.0000,  0.0000,  1.0000,  4.7005,  2.3026,
          3.2581,  6.3099,  3.8067,  2.0000],
        [58.0000,  0.0000,  1.0000,  0.0000,  1.0000,  0.0000,  0.0000,  0.0000,  1.0000,  1.0000,  0.0000,  0.0000,  3.7842,  8.1823,
          0.0000,  8.8122,  3.9120,  3.0000],
        [33.0000,  0.0000,  0.0000,  0.0000,  1.0000,  0.0000,  0.0000,  0.0000,  1.0000,  1.0000,  0.0000,  0.0000,  0.0000,  7.1577,
          5.9189,  8.1107,  5.2679,  3.0000],
        [16.0000,  0.0000,  0.0000,  1.0000,  0.0000,  0.0000,  0.0000,  0.0000,  1.0000,  0.0000,  1.0000,  0.0000,  5.7170,  4.2627,
          5.0239,  6.3386,  1.0986,  4.0000]])
vals,indices = trn_indep.max(dim=0)
trn_indep = trn_indep / vals
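Dividing by each column’s maximum squeezes every feature into roughly the [0, 1] range, which makes life easier for the hand-rolled training loop later. A common alternative (just a sketch, not used here) is standardizing to zero mean and unit variance:

# hypothetical alternative: standardize instead of max-scaling
means, stds = trn_indep.mean(dim=0), trn_indep.std(dim=0)
trn_indep_std = (trn_indep - means) / (stds + 1e-8)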

Next, let’s define our layers. I’m going to define them as classes, so that we can reuse objects of each class when needed.

#linear layer class
class linearlayer:
    def __init__(self, n_inputs, n_neurons):
        # small random weights, roughly in [-1.2/n_neurons, 2.8/n_neurons], tracked by autograd
        self.weights = ((torch.rand(n_inputs, n_neurons)-0.3)/n_neurons)*4
        self.weights = self.weights.requires_grad_()
        # small random biases in [-0.05, 0.05]
        self.biases = ((torch.rand(1, n_neurons))-0.5)*0.1
        self.biases = self.biases.requires_grad_()
    def forward(self, inputs):
        self.output = inputs@self.weights + self.biases

#ReLU activation function: clamp negative values to zero
class ReLU_act:
    def forward(self, inputs):
        self.output = torch.clip(inputs, 0.)

#Softmax - for the last layer: turn raw scores into per-row probabilities
class Softmax_act:
    def forward(self, inputs):
        exp_values = torch.exp(inputs)
        probs = exp_values/torch.sum(exp_values, axis=1, keepdims=True)
        self.output = probs
n_inputs=len(indep_cols)
n_hidden=10
inputs=trn_indep
y_true=trn_dep
n_inputs
18
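One caveat about Softmax_act: exponentiating large logits can overflow. The nnfs series guards against this by subtracting each row's maximum logit before exponentiating, which leaves the resulting probabilities unchanged. A sketch of that variant (not used below):

#numerically stable softmax variant
class Softmax_stable:
    def forward(self, inputs):
        shifted = inputs - torch.max(inputs, dim=1, keepdim=True).values
        exp_values = torch.exp(shifted)
        self.output = exp_values / torch.sum(exp_values, dim=1, keepdim=True)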

Initializing the parameters of our network:

#collapse_output
layer1 = linearlayer(n_inputs, n_hidden)
relu1 = ReLU_act()
layer2 = linearlayer(n_hidden, n_hidden)
relu2 = ReLU_act()
layer3 = linearlayer(n_hidden, 2)
wandbs = layer1.weights, layer2.weights, layer3.weights, layer1.biases, layer2.biases, layer3.biases
print('These are our weights and biases:')
print(wandbs)
layer1.forward(inputs)
print("\n")
print('These are our l1 outputs:')
print(layer1.output)
relu1.forward(layer1.output)
print("\n")
print('These are our r1 outputs:')
print(relu1.output)
layer2.forward(relu1.output)
print("\n")
print('These are our l2 outputs:')
print(layer2.output)
relu2.forward(layer2.output)
print("\n")
print('These are our r2 outputs:')
print(relu2.output)
layer3.forward(relu2.output)
print("\n")
print('These are our l3 outputs:')
print(layer3.output)
softmax = Softmax_act()
softmax.forward(layer3.output)
print("\n")
print('These are our softmax outputs:')
print(softmax.output)

These are our weights and biases:
(tensor([[    -0.0002,      0.2331,      0.0117,      0.2600,      0.2774,      0.1307,      0.1950,      0.2057,      0.1399,     -0.0986],
        [     0.1946,     -0.0915,      0.0643,      0.0087,      0.2372,      0.2070,      0.2365,      0.2561,      0.2278,     -0.0841],
        [     0.0705,     -0.1123,     -0.0432,      0.2374,      0.1913,      0.1934,      0.0814,     -0.0839,     -0.1167,      0.2246],
        [     0.0705,      0.1724,      0.1248,      0.0990,      0.1931,     -0.0708,     -0.0022,      0.2181,      0.0299,      0.1734],
        [     0.1153,     -0.1066,     -0.0893,      0.1900,      0.2777,      0.1359,      0.1473,      0.1143,      0.2712,     -0.0556],
        [     0.1509,      0.1151,      0.0681,     -0.0028,      0.2353,      0.1948,      0.0870,      0.0006,     -0.0163,      0.0389],
        [    -0.0553,      0.2512,      0.0427,      0.2267,     -0.0093,      0.1136,     -0.1013,      0.1188,     -0.0013,      0.1935],
        [     0.1943,      0.2582,      0.1152,      0.2282,      0.1577,      0.0751,      0.0589,      0.0211,     -0.0506,      0.1875],
        [     0.0339,      0.1409,      0.0418,      0.1518,      0.0863,      0.0752,     -0.0287,      0.0064,     -0.0014,      0.0103],
        [    -0.1056,      0.2283,      0.1243,      0.1812,      0.1591,      0.1607,      0.1488,     -0.0419,     -0.0072,     -0.0633],
        [    -0.1127,     -0.0293,      0.1272,      0.0205,      0.0115,      0.2260,     -0.0251,      0.2198,     -0.0274,      0.0513],
        [     0.0951,      0.0732,      0.2631,      0.1341,     -0.0821,      0.0232,      0.1464,      0.0473,      0.1390,     -0.0458],
        [     0.1315,      0.0439,      0.1412,     -0.0664,      0.0940,      0.1332,     -0.0286,      0.0506,      0.0248,      0.0755],
        [     0.2530,      0.2420,      0.2765,      0.2048,     -0.0882,     -0.0035,      0.1577,      0.0605,      0.0726,      0.2733],
        [     0.2079,      0.0583,      0.0246,      0.1822,     -0.0031,      0.1719,     -0.0781,      0.2127,      0.2108,      0.0760],
        [    -0.0597,     -0.0759,      0.1393,      0.2165,      0.2028,      0.0627,     -0.0170,     -0.0651,      0.0579,      0.0773],
        [     0.1473,     -0.0495,      0.2317,      0.0157,     -0.1144,     -0.0419,      0.1788,      0.2388,      0.1428,      0.0085],
        [     0.0061,      0.2621,      0.2392,      0.0263,     -0.1017,     -0.0018,      0.1258,      0.1266,      0.1345,     -0.0650]],
       requires_grad=True), tensor([[-0.0621, -0.0691,  0.2171, -0.0623,  0.1145,  0.0525,  0.2408,  0.2170,  0.1916,  0.0629],
        [-0.0341, -0.0009, -0.0881,  0.1173,  0.0511, -0.0289,  0.1675,  0.2559,  0.0796, -0.0451],
        [ 0.1161,  0.0154, -0.0154, -0.0897,  0.1087, -0.0746, -0.0648, -0.0229,  0.1991,  0.2154],
        [ 0.0914,  0.0735, -0.0821, -0.0773,  0.0863,  0.0284, -0.0972,  0.1577,  0.1829, -0.0549],
        [ 0.0412,  0.1230,  0.2663,  0.0677,  0.2552, -0.0593,  0.0626,  0.1118, -0.0184,  0.1215],
        [-0.0723,  0.1951,  0.0814,  0.1907, -0.0819,  0.0893,  0.2131, -0.0612,  0.0357,  0.0940],
        [ 0.2410,  0.0710, -0.0211,  0.1043,  0.0940,  0.0236, -0.1108, -0.0611,  0.2035, -0.0306],
        [ 0.1431,  0.2022,  0.1448, -0.1078, -0.0829,  0.1917,  0.2769,  0.1859,  0.0531,  0.1962],
        [ 0.0767,  0.2175,  0.1521, -0.0540,  0.0504,  0.2353,  0.2760, -0.1082,  0.1134,  0.2455],
        [ 0.2477,  0.2034,  0.1353,  0.0091, -0.0367,  0.2536,  0.0904,  0.1440, -0.1072, -0.0786]], requires_grad=True), tensor([[ 0.6042,  0.9942],
        [ 1.1100,  0.6981],
        [ 0.2269, -0.0373],
        [ 0.8578,  0.3621],
        [ 1.1303,  0.1530],
        [ 1.2218,  0.6881],
        [ 0.4251,  0.7249],
        [-0.3872,  0.6244],
        [ 1.3207, -0.3660],
        [ 0.7619,  0.8244]], requires_grad=True), tensor([[ 0.0031,  0.0179, -0.0161, -0.0428, -0.0240, -0.0101, -0.0082,  0.0291,  0.0329, -0.0085]], requires_grad=True), tensor([[-0.0072,  0.0279, -0.0105, -0.0237, -0.0117,  0.0383,  0.0107, -0.0042,  0.0435,  0.0144]], requires_grad=True), tensor([[-0.0298, -0.0055]], requires_grad=True))


These are our l1 outputs:
tensor([[ 0.0465,  0.3955,  0.0665,  0.6085,  0.6360,  0.4262,  0.3555,  0.2094,  0.3645, -0.1657],
        [ 0.4090,  0.5032,  0.7315,  0.6355,  0.3676,  0.2011,  0.2193,  0.5196,  0.4299,  0.2742],
        [ 0.3745,  0.4630,  0.5139,  1.2414,  0.9952,  0.7398,  0.6521,  0.2783,  0.4553,  0.3533],
        [ 0.3732,  0.4930,  0.5060,  1.0217,  0.6563,  0.5438,  0.4839,  0.4328,  0.6498,  0.1433],
        [ 0.2600,  0.4513,  0.6046,  0.5558,  0.4570,  0.4457, -0.0068,  0.6613,  0.2647,  0.4525],
        [ 0.2802,  0.9090,  0.5997,  0.8559,  0.7023,  0.2610,  0.3909,  0.3403,  0.1594,  0.4420],
        [ 0.2624,  0.8340,  0.5346,  0.6196,  0.4794,  0.2714,  0.2643,  0.3717,  0.1908,  0.3143],
        ...,
        [ 0.2636,  0.7582,  0.6401,  0.8374,  0.4956,  0.4299,  0.6259,  0.4212,  0.6110,  0.0037],
        [ 0.3290,  0.7838,  0.6694,  0.7983,  0.3901,  0.3895,  0.6834,  0.4900,  0.6283,  0.0081],
        [ 0.2696,  0.7915,  0.7026,  1.2959,  0.6626,  0.6849,  0.6939,  0.4754,  0.5521,  0.4695],
        [ 0.5637,  0.7453,  0.7932,  0.5127,  0.4434,  0.2524,  0.6015,  0.7450,  0.5453,  0.1351],
        [ 0.1594,  0.9366,  0.5459,  0.6524,  0.4158,  0.3292,  0.2400,  0.5606,  0.3965,  0.0767],
        [ 0.2173,  0.8266,  0.7594,  0.9643,  0.3819,  0.4520,  0.6310,  0.6241,  0.6840,  0.1978],
        [ 0.3641,  0.8806,  0.6636,  0.7896,  0.4980,  0.4865,  0.6539,  0.4849,  0.6162,  0.0276]], grad_fn=<AddBackward0>)


These are our r1 outputs:
tensor([[0.0465, 0.3955, 0.0665, 0.6085, 0.6360, 0.4262, 0.3555, 0.2094, 0.3645, 0.0000],
        [0.4090, 0.5032, 0.7315, 0.6355, 0.3676, 0.2011, 0.2193, 0.5196, 0.4299, 0.2742],
        [0.3745, 0.4630, 0.5139, 1.2414, 0.9952, 0.7398, 0.6521, 0.2783, 0.4553, 0.3533],
        [0.3732, 0.4930, 0.5060, 1.0217, 0.6563, 0.5438, 0.4839, 0.4328, 0.6498, 0.1433],
        [0.2600, 0.4513, 0.6046, 0.5558, 0.4570, 0.4457, 0.0000, 0.6613, 0.2647, 0.4525],
        [0.2802, 0.9090, 0.5997, 0.8559, 0.7023, 0.2610, 0.3909, 0.3403, 0.1594, 0.4420],
        [0.2624, 0.8340, 0.5346, 0.6196, 0.4794, 0.2714, 0.2643, 0.3717, 0.1908, 0.3143],
        ...,
        [0.2636, 0.7582, 0.6401, 0.8374, 0.4956, 0.4299, 0.6259, 0.4212, 0.6110, 0.0037],
        [0.3290, 0.7838, 0.6694, 0.7983, 0.3901, 0.3895, 0.6834, 0.4900, 0.6283, 0.0081],
        [0.2696, 0.7915, 0.7026, 1.2959, 0.6626, 0.6849, 0.6939, 0.4754, 0.5521, 0.4695],
        [0.5637, 0.7453, 0.7932, 0.5127, 0.4434, 0.2524, 0.6015, 0.7450, 0.5453, 0.1351],
        [0.1594, 0.9366, 0.5459, 0.6524, 0.4158, 0.3292, 0.2400, 0.5606, 0.3965, 0.0767],
        [0.2173, 0.8266, 0.7594, 0.9643, 0.3819, 0.4520, 0.6310, 0.6241, 0.6840, 0.1978],
        [0.3641, 0.8806, 0.6636, 0.7896, 0.4980, 0.4865, 0.6539, 0.4849, 0.6162, 0.0276]], grad_fn=<ClampBackward1>)


These are our l2 outputs:
tensor([[ 0.1788,  0.3784,  0.1961,  0.0859,  0.2355,  0.1763,  0.2745,  0.2243,  0.3367,  0.2174],
        [ 0.3220,  0.4116,  0.2579, -0.0955,  0.2618,  0.2804,  0.4132,  0.4018,  0.5158,  0.3829],
        [ 0.4339,  0.6410,  0.3890,  0.0903,  0.4595,  0.3141,  0.4068,  0.4590,  0.6507,  0.3687],
        [ 0.3566,  0.5638,  0.3251,  0.0046,  0.3596,  0.3282,  0.4666,  0.3938,  0.6289,  0.4177],
        [ 0.2960,  0.4863,  0.3065, -0.0497,  0.1770,  0.3261,  0.4764,  0.4248,  0.3754,  0.3667],
        [ 0.3670,  0.4384,  0.2241,  0.0423,  0.3642,  0.2121,  0.3428,  0.5588,  0.5101,  0.2422],
        [ 0.2763,  0.3741,  0.1834,  0.0262,  0.2649,  0.2036,  0.3613,  0.4662,  0.4422,  0.2451],
        ...,
        [ 0.3498,  0.4886,  0.2095,  0.0291,  0.3451,  0.2555,  0.4172,  0.3680,  0.6598,  0.4025],
        [ 0.3693,  0.4834,  0.2048,  0.0112,  0.3305,  0.2773,  0.4418,  0.3806,  0.6904,  0.4129],
        [ 0.5209,  0.6908,  0.2949,  0.0601,  0.3974,  0.3924,  0.4738,  0.5311,  0.7260,  0.3913],
        [ 0.3983,  0.4815,  0.3269, -0.0498,  0.3313,  0.3176,  0.5540,  0.4679,  0.6727,  0.4875],
        [ 0.2548,  0.4201,  0.1639,  0.0116,  0.2426,  0.2281,  0.4484,  0.4445,  0.4960,  0.3393],
        [ 0.4534,  0.5900,  0.2193, -0.0076,  0.3166,  0.3599,  0.4884,  0.4361,  0.7055,  0.4458],
        [ 0.3559,  0.5106,  0.2420,  0.0457,  0.3540,  0.2792,  0.4954,  0.4229,  0.6938,  0.4276]], grad_fn=<AddBackward0>)


These are our r2 outputs:
tensor([[0.1788, 0.3784, 0.1961, 0.0859, 0.2355, 0.1763, 0.2745, 0.2243, 0.3367, 0.2174],
        [0.3220, 0.4116, 0.2579, 0.0000, 0.2618, 0.2804, 0.4132, 0.4018, 0.5158, 0.3829],
        [0.4339, 0.6410, 0.3890, 0.0903, 0.4595, 0.3141, 0.4068, 0.4590, 0.6507, 0.3687],
        [0.3566, 0.5638, 0.3251, 0.0046, 0.3596, 0.3282, 0.4666, 0.3938, 0.6289, 0.4177],
        [0.2960, 0.4863, 0.3065, 0.0000, 0.1770, 0.3261, 0.4764, 0.4248, 0.3754, 0.3667],
        [0.3670, 0.4384, 0.2241, 0.0423, 0.3642, 0.2121, 0.3428, 0.5588, 0.5101, 0.2422],
        [0.2763, 0.3741, 0.1834, 0.0262, 0.2649, 0.2036, 0.3613, 0.4662, 0.4422, 0.2451],
        ...,
        [0.3498, 0.4886, 0.2095, 0.0291, 0.3451, 0.2555, 0.4172, 0.3680, 0.6598, 0.4025],
        [0.3693, 0.4834, 0.2048, 0.0112, 0.3305, 0.2773, 0.4418, 0.3806, 0.6904, 0.4129],
        [0.5209, 0.6908, 0.2949, 0.0601, 0.3974, 0.3924, 0.4738, 0.5311, 0.7260, 0.3913],
        [0.3983, 0.4815, 0.3269, 0.0000, 0.3313, 0.3176, 0.5540, 0.4679, 0.6727, 0.4875],
        [0.2548, 0.4201, 0.1639, 0.0116, 0.2426, 0.2281, 0.4484, 0.4445, 0.4960, 0.3393],
        [0.4534, 0.5900, 0.2193, 0.0000, 0.3166, 0.3599, 0.4884, 0.4361, 0.7055, 0.4458],
        [0.3559, 0.5106, 0.2420, 0.0457, 0.3540, 0.2792, 0.4954, 0.4229, 0.6938, 0.4276]], grad_fn=<ClampBackward1>)


These are our l3 outputs:
tensor([[1.7382, 1.0127],
        [2.3116, 1.5027],
        [3.1483, 1.8254],
        [2.8914, 1.7114],
        [2.1701, 1.6438],
        [2.2242, 1.4845],
        [1.9086, 1.3069],
        ...,
        [2.7114, 1.5372],
        [2.7652, 1.5826],
        [3.3515, 2.0682],
        [2.8961, 1.8331],
        [2.1227, 1.4342],
        [3.0565, 1.8808],
        [2.8763, 1.6804]], grad_fn=<AddBackward0>)


These are our softmax outputs:
tensor([[0.6738, 0.3262],
        [0.6919, 0.3081],
        [0.7897, 0.2103],
        [0.7649, 0.2351],
        [0.6286, 0.3714],
        [0.6769, 0.3231],
        [0.6460, 0.3540],
        ...,
        [0.7639, 0.2361],
        [0.7654, 0.2346],
        [0.7830, 0.2170],
        [0.7433, 0.2567],
        [0.6656, 0.3344],
        [0.7642, 0.2358],
        [0.7678, 0.2322]], grad_fn=<DivBackward0>)
y_preds = softmax.output


Defining our loss function:

Negative log loss is going to be our loss function. We use our predictions from softmax to calculate the loss. I won’t bore you with all the explanations; that is done in the resources I mentioned above, better than I ever could.

class negative_log_loss:
    def calculate(self, y_preds, y_true):
        samples = len(y_preds)
        # clip predictions to avoid log(0)
        y_pred_clipped = torch.clip(y_preds, 1e-7, 1-1e-7)

        if len(y_true.shape) == 1:
            # targets are class indices: pick the predicted probability of the true class
            correct_confidences = y_pred_clipped[torch.tensor(range(samples)), y_true]
        elif len(y_true.shape) == 2:
            # targets are one-hot encoded
            correct_confidences = torch.sum(y_pred_clipped*y_true, axis=1)
        negative_log_likelihoods = -torch.log(correct_confidences)
        return torch.mean(negative_log_likelihoods)
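As a quick sanity check (a sketch; nothing here is reused later), the class above should agree with PyTorch’s built-in negative log likelihood applied to the log of our softmax output, up to the tiny clipping:

import torch.nn.functional as F
nll_check = negative_log_loss()
print(nll_check.calculate(y_preds, y_true))   # our implementation
print(F.nll_loss(torch.log(y_preds), y_true)) # built-in equivalent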

‘Training’ our neural network

First we’re going to write a function to update our weights and biases according to our loss (i.e. subtract the product of each gradient and the learning rate from the corresponding weight or bias).

def update_wandbs(wandbs, lr):
    for layer in wandbs:
        layer.sub_(layer.grad * lr)
        #print(layer.grad)
        layer.grad.zero_()
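For reference (a sketch, not used in this notebook), the same update could be delegated to torch.optim.SGD by registering our weight and bias tensors as its parameters:

# hypothetical equivalent using PyTorch's built-in optimizer
opt = torch.optim.SGD(wandbs, lr=0.5)
# after each loss.backward() you would then call:
# opt.step()
# opt.zero_grad()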

Then we write a training loop for our network, to train for one ‘epoch’.

def one_epoch(wandbs, lr, inputs):
    layer1.weights, layer2.weights, layer3.weights, layer1.biases, layer2.biases, layer3.biases = wandbs
    #print('These are our weights and biases:')
    #print(wandbs)
    layer1.forward(inputs)
    #print("\n")
    #print('These are our l1 outputs:')
    #print(layer1.output)
    relu1.forward(layer1.output)
    #print("\n")
    #print('These are our r1 outputs:')
    #print(relu1.output)
    layer2.forward(relu1.output)
    #print("\n")
    #print('These are our l2 outputs:')
    #print(layer2.output)
    relu2.forward(layer2.output)
    #print("\n")
    #print('These are our r2 outputs:')
    #print(relu2.output)
    layer3.forward(relu2.output)
    #print("\n")
    #print('These are our l3 outputs:')
    #print(layer3.output)
    softmax = Softmax_act()
    softmax.forward(layer3.output)
    y_preds = softmax.output
    #print("\n")
    #print('These are our softmax outputs:')
    #print(softmax.output)
    nll = negative_log_loss()
    loss = nll.calculate(y_preds, y_true)
    loss.backward()
    with torch.no_grad(): update_wandbs(wandbs, lr)
    print(f"{loss:.3f}", end="; ")

Then we define another function so that we can train the model easily for multiple epochs. The learning rate is an important hyperparameter here; refer to Jeremy’s notebooks/videos if you don’t already know about it.

def train_model(wandbs, lr, inputs, epochs=30):
    #torch.manual_seed(442)
    for i in range(epochs): one_epoch(wandbs, lr, inputs)
    return y_preds
y_preds = train_model(wandbs, 0.5, inputs, epochs=40)
0.801; 1.816; 0.827; 0.694; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 0.693; 
y_preds
tensor([[0.6738, 0.3262],
        [0.6919, 0.3081],
        [0.7897, 0.2103],
        [0.7649, 0.2351],
        [0.6286, 0.3714],
        [0.6769, 0.3231],
        [0.6460, 0.3540],
        ...,
        [0.7639, 0.2361],
        [0.7654, 0.2346],
        [0.7830, 0.2170],
        [0.7433, 0.2567],
        [0.6656, 0.3344],
        [0.7642, 0.2358],
        [0.7678, 0.2322]], grad_fn=<DivBackward0>)
def accuracy(y_preds, y_true):
    samples = len(y_preds)
    return print(f'Accuracy is {(y_true.bool()==(y_preds[torch.tensor(range(samples)), y_true]>0.5)).float().mean()*100 :.3f} percent')
accuracy(y_preds, y_true)
Accuracy is 0.000 percent
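Note that y_preds here is still the tensor we computed before training (one_epoch builds its own local softmax object, while train_model returns the module-level y_preds), and the accuracy function compares y_true.bool() against a thresholded probability instead of simply averaging correct predictions; together those explain the 0% figure. A corrected sketch that reruns the forward pass and scores predictions by argmax:

# sketch: rerun the forward pass with the current weights and score by argmax
def predict(inputs):
    layer1.forward(inputs); relu1.forward(layer1.output)
    layer2.forward(relu1.output); relu2.forward(layer2.output)
    layer3.forward(relu2.output)
    sm = Softmax_act(); sm.forward(layer3.output)
    return sm.output

def accuracy_v2(y_preds, y_true):
    return (y_preds.argmax(dim=1) == y_true).float().mean() * 100

print(f'Accuracy is {accuracy_v2(predict(inputs), y_true):.3f} percent')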

Well, my network is still pretty crappy. I tried a lot but couldn’t get it to train yet. I’m going to keep trying, but for now I’ll use a framework to make my life easier. After all, the purpose of this whole exercise was not to get an accurate model, but to understand the nuts and bolts of a neural network!