Neural Network Architecture for Detecting Traffic Signs

Neural Network Architecture for Detecting Traffic Signs

The goal of this project is to design a convolutional neural network to classify traffic signs. Other architectures are explored such as GoogLeNet. I started with the LeNet architecture and then added inception modules and dropout.

The goals/steps of this project are the following:

  • Load the data set (Download Here)
  • Explore, summarize and visualize the data samples
  • Design, train and test the model architecture
  • Use the model to make predictions on new images
  • Analyze the softmax probabilities of the new images

Training, Validation and Testing Data Size

I used numpy to calculate summary statistics of the traffic signs data set:

  • Size of training set: 34799
  • Size of the validation set: 4410
  • Size of test set: 12630
  • Shape of a traffic sign image: 32x32 (RGB)
  • The number of unique classes/labels: 43

Distribution of Training Data

Here is an exploratory visualization of the data set. The bar chart shows the distribution of training samples by class label.

Convolutional Neural Network (CNN) Model Architecture

Preprocessing Data Sets

The first step is grayscaling the images to increase performance during training. However, there may be some cases where the color can be significant, although I haven’t explored this.

The image is also normalized to improve training performance, making the grayscale pixel numerical values between 0.1-0.9 instead of the range from 0-255.

Here is an example of a traffic sign image before and after grayscaling and normalization.

Model Architecture

My final model consisted of the following layers. I started with the LeNet architecture and then followed the paper on GoogLeNet by adding inception modules, average pooling followed by 2 fully connected layers with dropout in between both.

LayerDescription
Input32x32x3 RGB image
Preprocessing32x32x1 Grayscale normalized image
Convolution 5x51x1 stride, valid padding, outputs 28x28x6
RELU 1Activation
Convolution 5x51x1 stride, valid padding, outputs 24x24x16
RELU 2Activation
Average pooling 12x2 stride, outputs 12x12x16
Inception Module 1outputs 12x12x256
Average pooling 22x2 stride, outputs 6x6x256
Inception Module 2outputs 6x6x512
Average pooling 32x2 stride, same padding, outputs 6x6x512
Dropout 1Flatten the output and dropout 50% layer
Fully connected 1Outputs 9216x512
Dropout 2Dropout 50% layer
Fully connected 2Outputs 512x43
SoftmaxSoftmax probabilities

Training the Model

  • Preprocessed the images
  • One hot encoded the labels
  • Trained using Adam optimizer
  • Batch size: 128 images
  • Learning rate: 0.001
  • Variables initialize with truncated normal distributions
  • Hyperparameters, mu = 0, sigma = 0.1, dropout = 0.5
  • Trained for 20 epochs

Training was carried out on an AWS g2.2xlarge GPU instance. Each epoch was completed in around 24s, total training time of about 6-8 minutes.

Designing the Model

Final model results:

  • Validation set accuracy: 0.97
  • Test set accuracy: 0.95

The approach started with a review of LeNet and GoogLeNet architectures and those were used as a starting point to decide how to order and how many inception modules, pooling convolution and dropout layers to use.

  • First started with LeNet and with minor tweaks, achieved an accuracy of about 0.89
  • Reduced and switched max pooling to average pooling to get to 0.90
  • Added an inception module after the first two convolutions to get to 0.90-0.93
  • At this point, there were very deep inception modules (referencing GoogLeNet) and 4 fully connected layers and the model appeared to be overfitting.
  • Reducing the number of fully connected layers to 2 and adding dropout between them made the largest impact and achieved around 0.97 accuracy.

Once I noticed the accuracy starting high and surpassing the minimum 0.93, I tweaked the layers by trial and error but my approach was to try and keep as much information flowing through the model while balancing the dimensional reduction. This led me to have 2 convolution layers with small patch sizes, average pooling with small patch size and do more severe reductions with the inception modules.

Test a Model on New Images

Here are some other German traffic signs that I found on the web. The signs are generally easy to identify, I chose a few that are at angles and one that had a “1” spray painted onto a 30km/h sign.

New Image Test Results

Here are the results of the predictions on the new traffic signs:

ImagePrediction
YieldYield
Traffic SignalsTraffic Signals
General CautionGeneral Caution
Speed Limit 30Speed Limit 30
Right of WayRight of Way
Road WorkRoad Work
Bumpy RoadBicycles Crossing
No EntryNo Entry
Speed Limit 100Roundabout Mandatory
Speed Limit 30 (spray painted)Speed Limit 30

The model was able to correctly guess 8 of the 10 traffic signs, which gives an accuracy of 80%. I was impressed that the spray-painted sign still classified correctly even though the clear 100km/h sign was wrong.

The bicycles crossing and bumpy road signs do look very similar at low resolution, so I can understand that wrong prediction.

Model Certainty

The following shows the top 5 softmax probabilities for each sign.

Visualizing the Model Layers

It’s interesting to see the layers of the network as it progresses. The convolution layers look the most interesting, while the average pooling layers start to look overly simplified. However, the average pooling images only show one layer in a very deep set.

Conclusions and Discussion

Having LeNet and GoogLeNet as starting points was very helpful. There are still many areas for improvement by tweaking variables, moving/adding/removing layers and augmenting the data set. I noticed that when applying the GoogLeNet architecture directly to this data set, the training time was very high and accuracy plateaued quickly which could indicate over/underfitting. But in this case, I believe there were too many parameters, I found better performance by reducing the depth of the inception layers and adding dropout.

Check out the GitHub repo for the code and Jupyter Notebook.