3.10. Concise Implementation of Multilayer Perceptron
Now that we have learned how multilayer perceptrons (MLPs) work in theory, let’s implement them. We begin, as always, by importing the required modules.
In [1]:
import sys
sys.path.insert(0, '..')
import d2l
from mxnet import gluon, init
from mxnet.gluon import loss as gloss, nn
3.10.1. The Model
The only difference from our softmax regression implementation is that we add a fully connected layer as a hidden layer. It has 256 hidden units and uses ReLU as its activation function.
In [2]:
net = nn.Sequential()
net.add(nn.Dense(256, activation='relu'))
net.add(nn.Dense(10))
net.initialize(init.Normal(sigma=0.01))
One minor detail is worth noting when invoking net.add(): it adds one or more layers to the network, so an equivalent to the lines above would be net.add(nn.Dense(256, activation='relu'), nn.Dense(10)). Also note that Gluon automagically infers the missing parameters, such as the fact that the second layer needs a weight matrix of size \(256 \times 10\). This happens the first time the network is invoked.
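To see this deferred shape inference in action, one could pass a dummy mini-batch through the network and then inspect the parameter shapes. The snippet below is a minimal sketch, not part of the original notebook; note that Gluon stores each Dense layer’s weight with shape (outputs, inputs).

from mxnet import nd

# Hypothetical check (not in the original notebook): parameters are only
# allocated once Gluon has seen real input and can infer the input dimension.
X = nd.random.uniform(shape=(2, 784))   # dummy mini-batch of two flattened images
net(X)                                  # the first invocation triggers shape inference
print(net[0].weight.data().shape)       # (256, 784) -- hidden layer
print(net[1].weight.data().shape)       # (10, 256)  -- output layer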
Training uses almost exactly the same steps as for softmax regression: we read the data, define the loss function and the optimizer, and then train the model.
In [3]:
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
loss = gloss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.5})
num_epochs = 10
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, None,
              None, trainer)
epoch 1, loss 0.8046, train acc 0.699, test acc 0.825
epoch 2, loss 0.4917, train acc 0.818, test acc 0.847
epoch 3, loss 0.4262, train acc 0.842, test acc 0.853
epoch 4, loss 0.3962, train acc 0.854, test acc 0.869
epoch 5, loss 0.3762, train acc 0.862, test acc 0.875
epoch 6, loss 0.3580, train acc 0.867, test acc 0.877
epoch 7, loss 0.3409, train acc 0.875, test acc 0.875
epoch 8, loss 0.3293, train acc 0.878, test acc 0.878
epoch 9, loss 0.3165, train acc 0.884, test acc 0.882
epoch 10, loss 0.3115, train acc 0.885, test acc 0.878
3.10.2. Problems
- Try adding a few more hidden layers to see how the result changes (a possible starting point is sketched after this list).
- Try out different activation functions. Which ones work best?
- Try out different initializations of the weights.
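As a starting point for these exercises, the sketch below (not part of the original notebook) shows how one might combine all three changes: a deeper network, a different activation function, and a different weight initialization. The layer sizes and hyperparameters here are illustrative assumptions, not recommendations.

# Illustrative variant (assumed hyperparameters): two hidden layers, sigmoid
# activations, and Xavier initialization instead of a small normal distribution.
net_variant = nn.Sequential()
net_variant.add(nn.Dense(256, activation='sigmoid'),
                nn.Dense(128, activation='sigmoid'),
                nn.Dense(10))
net_variant.initialize(init.Xavier())

trainer_variant = gluon.Trainer(net_variant.collect_params(), 'sgd',
                                {'learning_rate': 0.5})
d2l.train_ch3(net_variant, train_iter, test_iter, loss, num_epochs, batch_size,
              None, None, trainer_variant)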