
Artificial Intelligence and the Fine-Tuning of Convolutional Neural Networks


Fine-tuning convolutional neural networks comes with a range of peculiarities. In this article, we analyze the correlation between CNN fine-tuning accuracy and dataset size.


Fascination with artificial neural networks (ANNs) continues to grow in the technology sector. ANNs can perform a wide variety of tasks, including data classification and regression. Their differentiator is the ability to learn, which allows an ANN to generalize from input data, analyze new sets, and make decisions based on previous experience. But their use is still plagued by recurring challenges: selecting a network topology, choosing network specifications and understanding how they interact, testing learning results, and so on.

A separate issue is forming the training dataset, which has to contain thousands of items, because its volume and quality affect the success of ANN learning and thus the quality of any ANN endeavor. Another problem is assessing dataset quality in the first place, because a person and a computer perceive images in different ways. Pre-existing datasets are available, but they cannot cover all class features. Training an ANN from scratch is a time-consuming and resource-intensive proposition, so possibilities for optimizing it are still being widely researched. As a result, the concept of fine-tuning appeared, which allows programmers to retrain neural networks on the actual data features of a particular job using fewer resources.

The aim of the research conducted for this piece was to assess whether there is a correlation between fine-tuning accuracy and dataset size when starting from a pre-trained neural network model.

Convolutional neural networks (CNNs) are mostly used for image classification. If one attempted to use a multilayer perceptron on an image with a 100×100 pixel resolution and three channels (RGB), then each neuron in the first hidden layer would need 100×100×3 = 30,000 weights. That leads to difficulties in calculation, but CNNs are capable of resolving this issue. They reduce the number of parameters by sharing a small matrix of weights (a kernel) across the whole image. This concept is based on the idea that nearby pixels contribute more to a feature than distant ones. If you use a 3×3 kernel over the three input channels with eight output filters, you get 3×3×3×8 = 216 weights, which is significantly fewer than 30,000, thereby making calculations far faster and easier.
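As a quick sanity check on these numbers, here is a minimal Python sketch (not part of the original study) that compares the weight count of a single fully connected neuron with that of a small convolutional layer:

# Weights needed by one fully connected neuron looking at a 100x100 RGB image.
height, width, channels = 100, 100, 3
dense_weights_per_neuron = height * width * channels
print(dense_weights_per_neuron)  # 30000

# Weights in a convolutional layer with a 3x3 kernel,
# 3 input channels, and 8 output filters (biases ignored).
kernel_h, kernel_w, in_channels, out_filters = 3, 3, 3, 8
conv_weights = kernel_h * kernel_w * in_channels * out_filters
print(conv_weights)  # 216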

There is a whole host of CNN models; two of the best known are AlexNet and GoogleNet, which won the lauded ImageNet competition in 2012 and 2014, respectively. AlexNet trained in a meager two weeks on 2 GPUs. GoogleNet is a far deeper CNN and took 21 days on 1 GPU, or 23.4 hours on 32 GPUs, to train.

Models and Datasets

Model accuracy is presented in the table below. Top-1 is the rate at which the desired class matches the class with the maximum probability; Top-5 is the rate at which the desired class occurs among the five classes with the highest probability. Both models were trained on the ImageNet dataset.

Testing results, %:

Model        Top-1   Top-5
AlexNet      57.0    80.3
GoogleNet    68.7    88.9
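For reference, Top-k accuracy of this kind can be computed in a few lines of NumPy (a minimal sketch, not code from the original study):

import numpy as np

def top_k_accuracy(probabilities, true_labels, k=5):
    """Fraction of images whose true class is among the k highest-scoring classes."""
    # probabilities: (num_images, num_classes) array of class scores
    # true_labels:   (num_images,) array of correct class indices
    top_k = np.argsort(probabilities, axis=1)[:, -k:]
    hits = [label in row for row, label in zip(top_k, true_labels)]
    return sum(hits) / len(hits)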

Both models exhibit relatively strong accuracy, but fine-tuning could potentially improve them further. Predictably, the quality of a neural network's work mostly depends on the quality of the dataset. ImageNet subsets for specific objects, locations, or people often contain thousands of images, but that is frequently not enough to produce sufficiently accurate results, even though the full ImageNet dataset contains a staggering 14 million images. For example, the AlexNet model trained on ImageNet was tested on car recognition with a mere 1,000 car images, and it could not classify half of them correctly.

For instance, the network hilariously recognized an image of a car as:

  • 10.88%: Barbershop
  • 8.84%: Vending machine
  • 6.90%: Soda bottle
  • 5.99%: Chain-link fence
  • 3.41%: Mosquito net

The car category was not even present among the one hundred highest-probability classes.

Glare on the car body also affects recognition quality.

[Image: Radiator grille, Top-1 = 31.68%]

[Image: Sports car, Top-1 = 12.53%]

It seems that one source of confusion for the algorithm was glare on the car body and the variety of conditions in which the car appears across pictures. That raises a question regarding different methods of image pre-processing. Without going into detail, the three major pre-processing methods examined here are (a brief sketch of how the first two can be applied follows the list):

  1. Histogram equalization
  2. Gamma-correction
  3. Retinex (MSRCR)
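The article does not specify the tooling used for these filters; as a hedged illustration, the first two can be applied with OpenCV in a few lines, while MSRCR is more involved and usually implemented separately. The input file name here is a placeholder:

import cv2
import numpy as np

img = cv2.imread("car.jpg")  # hypothetical input image

# 1. Histogram equalization: equalize the luminance channel only,
#    so the colors are not distorted.
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
equalized = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

# 2. Gamma correction: out = 255 * (in / 255) ** (1 / gamma),
#    applied through a lookup table for speed.
gamma = 1.5
table = np.array([255 * (i / 255.0) ** (1.0 / gamma) for i in range(256)]).astype("uint8")
gamma_corrected = cv2.LUT(img, table)

# 3. Retinex (MSRCR) has no single-call OpenCV equivalent and is omitted here.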

Peculiarities of Fine-Tuning

Fine-tuning can be divided into four scenarios, depending on the size of the new dataset and how similar its content is to the original one:

  1. The new dataset is smaller than the initial one and its content is similar. The likely risk is overfitting, and the distinctive features will be similar to those of the initial set, so retraining the linear classifier is enough.
  2. The new dataset is smaller than the initial one but its content is different. Again, only a linear classifier should be retrained, but it should be attached to an earlier layer.
  3. The dataset sizes are comparable and the content is similar. Overfitting should not take place, and there is no reason to avoid fine-tuning the whole network.
  4. The dataset sizes are comparable but the content is different. Training can be started from scratch, but the early layers of a convolutional net are responsible for general features; therefore, in practice, it is more convenient to start from a pre-trained model and fine-tune it.

One more important peculiarity pertains to the reuse of the model's weights.

There are three modes:

  • Freeze: The weights are kept fixed.
  • Transposing: The weights are trainable, but their starting values come from the pre-trained model.
  • Randomizing: The pre-trained weights are replaced with random values close to zero.
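The study applies these modes through Caffe layer definitions (shown below), but as a framework-agnostic illustration, here is a minimal sketch of the same three modes in PyTorch; the model choice, layer indices, and single-output classifier are assumptions for illustration only, not the study's setup:

import torch.nn as nn
from torchvision import models

model = models.alexnet(pretrained=True)  # start from ImageNet weights

# Freeze: keep the convolutional weights fixed during fine-tuning.
for param in model.features.parameters():
    param.requires_grad = False

# Transposing: the fully connected layers stay trainable but start from
# the pre-trained values (nothing to do; they are already loaded).

# Randomizing: replace the last layer with a freshly initialized one
# sized for the new task (here, a hypothetical single-output "car" classifier).
model.classifier[6] = nn.Linear(4096, 1)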

Based on the conclusions above, further tests were performed. First, an application for gathering images was developed, and incorrect images were removed from the resulting set manually.

For testing, the Caffe framework was chosen after comparing it with TensorFlow, Torch, and Theano. The last layer was replaced for both AlexNet and GoogleNet, and the remaining layers were transposed.

New last layer for AlexNet:

layer {
  name: "fc8_car"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8_car"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

New last layer for GoogleNet:

layer {
  name: "loss3/car_classifier"
  type: "InnerProduct"
  bottom: "pool5/7x7_s1"
  top: "loss3/car_classifier"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
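With the renamed last layers in place, fine-tuning in Caffe amounts to starting a solver from the pre-trained weights: layers whose names match the original model have their weights copied, while the renamed classifier keeps its fresh random initialization. A minimal sketch using Caffe's Python interface (the file names are placeholders, not the ones from the study):

import caffe

caffe.set_mode_gpu()

# The solver prototxt points at the modified train/val network definition.
solver = caffe.SGDSolver("solver.prototxt")

# Copy weights for every layer whose name matches the pre-trained model;
# the renamed last layer (e.g. "fc8_car") is skipped and stays randomly initialized.
solver.net.copy_from("bvlc_alexnet.caffemodel")

solver.solve()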

In all, the following results were obtained (accuracy, %):

Model        Dataset size   Image filter             Top-1   Top-5
AlexNet      10,000         Histogram equalization   75.6    83.1
AlexNet      10,000         Gamma correction         88.6    92.4
AlexNet      10,000         MSRCR                    73.5    80.7
AlexNet      20,000         Histogram equalization   79.7    87.8
AlexNet      20,000         Gamma correction         92.3    95.3
AlexNet      20,000         MSRCR                    76.9    85.9
GoogleNet    10,000         Histogram equalization   82.5    91.7
GoogleNet    10,000         Gamma correction         89.1    93.8
GoogleNet    10,000         MSRCR                    77.4    90.5
GoogleNet    20,000         Histogram equalization   89.7    95.2
GoogleNet    20,000         Gamma correction         94.4    97.9
GoogleNet    20,000         MSRCR                    87.6    91.3

At the end of the study, the strongest results came from fine-tuning GoogleNet on a dataset of 20,000 images that contained standard-orientation images, rotated images, and images filtered with gamma correction. A Top-1 accuracy of 94.4% and a Top-5 accuracy of 97.9% were achieved. On the basis of these results, we can infer that filtering and expanding datasets allows for a dramatic increase in the quality of classification.
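The article does not show the augmentation code, but expanding a dataset with rotated and gamma-corrected copies of each image can look roughly like this (a hedged OpenCV sketch; the rotation angles and gamma value are assumptions, not the study's settings):

import cv2
import numpy as np

def augment(img, gamma=1.5):
    """Return the original image plus a gamma-corrected copy and rotated copies."""
    h, w = img.shape[:2]
    table = np.array([255 * (i / 255.0) ** (1.0 / gamma) for i in range(256)]).astype("uint8")
    copies = [img, cv2.LUT(img, table)]  # original + gamma-corrected copy
    # Rotated copies around the image center.
    for angle in (90, 180, 270):
        matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        copies.append(cv2.warpAffine(img, matrix, (w, h)))
    return copies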

As the recent mania over artificial intelligence continues, the origins of its development and the ongoing fine-tuning of its algorithms and datasets will open new possibilities for technology and quality of life that were previously unimaginable. Scientists and mathematicians will remain the sources of genius and experimentation that will quietly fuel another technological revolution taken to the mass market in the twenty-first century.


Topics:
neural networks ,ai ,tutorial ,image classification ,alexnet ,googlenet
