First, we need synthetic data to work with. The data should exhibit some nonlinear dependence, so let's define it piecewise: y = 2x + 5 for x < -2, y = 7.3·sin(x) for -2 ≤ x < 2, and y = -0.03x³ + 2 for x ≥ 2, with Gaussian noise added on top.
In Python it takes the following form:
import numpy as np

np.random.seed(42)

# 10,000 samples drawn from N(1, 4.5)
x = np.random.normal(1, 4.5, 10000)

# Piecewise nonlinear target plus Gaussian noise
y = np.piecewise(
    x,
    [x < -2, (x >= -2) & (x < 2), x >= 2],
    [lambda x: 2*x + 5, lambda x: 7.3*np.sin(x), lambda x: -0.03*x**3 + 2],
) + np.random.normal(0, 1, x.shape)
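Later on we'll iterate over train_loader and test_loader and call a loss function. If you don't have them set up yet, here's a minimal sketch; the 80/20 split, the batch size of 64, and the MSE loss are my assumptions rather than a prescribed setup:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Assumed setup: wrap the synthetic data into train/test loaders
x_t = torch.tensor(x, dtype=torch.float32)
y_t = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)

split = int(0.8 * len(x_t))  # assumed 80/20 train/test split
train_loader = DataLoader(TensorDataset(x_t[:split], y_t[:split]), batch_size=64, shuffle=True)
test_loader = DataLoader(TensorDataset(x_t[split:], y_t[split:]), batch_size=64)

loss = nn.MSELoss()  # assumed loss for this regression task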
Let's take a quick look at the data before moving on.
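The original post shows the resulting plot as an image; a quick matplotlib sketch (my addition) reproduces it:

import matplotlib.pyplot as plt

plt.scatter(x, y, s=2, alpha=0.3)  # 10,000 points, so keep them small and transparent
plt.xlabel('x')
plt.ylabel('y')
plt.show()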
Since we are visualizing a 3D space (two weights plus the loss), our neural network will have only 2 weights. This means the ANN consists of a single hidden neuron. Implementing this in PyTorch is quite intuitive:
import torch.nn as nn

class ANN(nn.Module):
    def __init__(self, input_size, N, output_size):
        super().__init__()
        self.net = nn.Sequential()
        self.net.add_module(name='Layer_1', module=nn.Linear(input_size, N, bias=False))
        self.net.add_module(name='Tanh', module=nn.Tanh())
        self.net.add_module(name='Layer_2', module=nn.Linear(N, output_size, bias=False))

    def forward(self, x):  # needed so that model(x) calls below work
        return self.net(x)
Important! Don't forget to turn off the biases in your layers; otherwise you'll end up with twice as many parameters.
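As a quick check, you can count the trainable parameters; with biases off, a 1-1-1 network has exactly two. This also gives us the model instance that the grid loop below relies on:

model = ANN(input_size=1, N=1, output_size=1)
print(sum(p.numel() for p in model.parameters()))  # prints 2: one weight per layer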
To construct the error surface, we first need to create a grid of possible values for W1 and W2. Then, for each combination of weights, we will update the network parameters and calculate the error:
W1, W2 = np.arange(-2, 2, 0.05), np.arange(-2, 2, 0.05)
LOSS = np.zeros((len(W1), len(W2)))

for i, w1 in enumerate(W1):
    # Overwrite the hidden-layer weight with the current grid value
    model.net._modules['Layer_1'].weight.data = torch.tensor([[w1]], dtype=torch.float32)
    for j, w2 in enumerate(W2):
        # Overwrite the output-layer weight
        model.net._modules['Layer_2'].weight.data = torch.tensor([[w2]], dtype=torch.float32)
        model.eval()
        total_loss = 0
        with torch.no_grad():
            for x, y in test_loader:
                preds = model(x.reshape(-1, 1))
                total_loss += loss(preds, y).item()
        LOSS[i, j] = total_loss / len(test_loader)
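A quick sanity check (my addition) is to locate the lowest point on the grid; it's a useful reference when you inspect the SGD trajectory later:

# Index of the smallest loss on the grid
i_min, j_min = np.unravel_index(np.argmin(LOSS), LOSS.shape)
print(f'min loss {LOSS[i_min, j_min]:.4f} at w1={W1[i_min]:.2f}, w2={W2[j_min]:.2f}')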
It may take some time. If you make the grid too coarse (that is, the step between possible weight values too large), local minima and maxima may be lost. Remember how the learning rate is often scheduled to decrease over time? When that happens, the absolute change in the weights can be as small as 1e-3 or less, and a grid with a step of 0.5 simply will not capture such fine details of the error surface! For reference, a step of 0.05 over [-2, 2) gives an 80 × 80 grid, i.e. 6,400 full passes over the test set.
At this point, we don't care about the quality of the trained model at all. However, we do want to pay attention to the learning rate, so let's keep it between 1e-2 and 1e-1. We will simply collect the weight values and errors during training and store them in separate lists:
import torch.optim as optim
from tqdm import tqdm

model = ANN(1, 1, 1)
epochs = 25
lr = 1e-2
optimizer = optim.SGD(model.parameters(), lr=lr)

# Fix the starting point so the trajectory is reproducible
model.net._modules['Layer_1'].weight.data = torch.tensor([[-1.0]], dtype=torch.float32)
model.net._modules['Layer_2'].weight.data = torch.tensor([[-1.0]], dtype=torch.float32)

errors, weights_1, weights_2 = [], [], []

# Record the starting point before the first optimization step
model.eval()
with torch.no_grad():
    total_loss = 0
    for x, y in test_loader:
        preds = model(x.reshape(-1, 1))
        error = loss(preds, y)
        total_loss += error.item()
weights_1.append(model.net._modules['Layer_1'].weight.data.item())
weights_2.append(model.net._modules['Layer_2'].weight.data.item())
errors.append(total_loss / len(test_loader))

for epoch in tqdm(range(epochs)):
    model.train()
    for x, y in train_loader:
        pred = model(x.reshape(-1, 1))
        error = loss(pred, y)
        optimizer.zero_grad()
        error.backward()
        optimizer.step()

    # Evaluate on the test set and log the new position on the surface
    model.eval()
    test_preds, true = [], []
    with torch.no_grad():
        total_loss = 0
        for x, y in test_loader:
            preds = model(x.reshape(-1, 1))
            error = loss(preds, y)
            test_preds.append(preds)
            true.append(y)
            total_loss += error.item()
    weights_1.append(model.net._modules['Layer_1'].weight.data.item())
    weights_2.append(model.net._modules['Layer_2'].weight.data.item())
    errors.append(total_loss / len(test_loader))
Finally, we can visualize the collected data using Plotly. The plot will have two traces: the loss surface and the SGD trajectory. One way to build the first is to create a figure whose data argument is a go.Surface trace, and then style it by updating the layout.
The second part is just as simple: use the Scatter3d trace and specify the three axes.
import plotly.graph_objects as go
import plotly.io as pio

pio.templates.default = 'plotly_dark'  # dark theme for the whole figure

# Note: in go.Surface, z[i][j] is drawn at y[i], x[j]. Since LOSS[i, j] was
# computed for w1 = W1[i] and w2 = W2[j], w1 belongs on the y axis, matching
# the trajectory trace below.
fig = go.Figure(data=[go.Surface(z=LOSS, x=W2, y=W1)])
fig.update_layout(
    title='Loss Surface',
    scene=dict(
        xaxis=dict(title='w2', showgrid=False),
        yaxis=dict(title='w1', showgrid=False),
        zaxis=dict(title='Loss', showgrid=False),
        aspectmode='manual',
        aspectratio=dict(x=1, y=1, z=0.5),
    ),
    width=800,
    height=800,
)

# SGD trajectory on top of the surface
fig.add_trace(go.Scatter3d(
    x=weights_2, y=weights_1, z=errors,
    mode='lines+markers',
    line=dict(color='red', width=2),
    marker=dict(size=4, color='yellow'),
))
fig.show()
Running it in Google Colab or locally in Jupyter Notebook will let you explore the error surface interactively and up close. Honestly, I spent a lot of time looking at this figure 🙂
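If you want to keep or share the interactive figure outside the notebook, Plotly can also export it as a standalone HTML file:

fig.write_html('loss_surface.html')  # opens in any browser with full interactivity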
I'd love to see your surfaces, so feel free to share them in the comments. I firmly believe that the more imperfect the surface, the more interesting it is to investigate!
===========================================================
All of my posts on Medium are free and open access, so I would really appreciate it if you would follow me here!
PS: I am passionate about (geo)data science, machine learning/artificial intelligence, and climate change. So if you want to work together on any projects, please contact me on LinkedIn and take a look at my website!
Follow for more