Five Powerful CNN Architectures
This article is a translated technical blog post, originally titled "Five Powerful CNN Architectures" by Faisal Shahbaz. Original article: https://medium.com/@faisalshahbaz/five-powerful-cnn-architectures-b939c9ddd57b
Let's take a look at some of the powerful convolutional neural networks that laid the foundation for today's achievements in computer vision with deep learning.
The first is LeNet-5, a 7-layer convolutional neural network used by many banks to recognize handwritten digits on checks.
Gradient-based learning applied to document recognition
The handwritten digits were digitized into images of size 32×32. At the time, the technique could not be applied to larger images because of limited computing power.
Let's understand the structure of this model. Not counting the input layer, the model has seven layers. Since the architecture is so small, we can examine it layer by layer.
I recommend using the cross-entropy loss with a softmax activation in the final layer; the details of this loss function and the reasons for using it are beyond the scope of this post. Try different training schedules and learning rates for your own training.
from keras import layers
from keras.models import Model

def lenet_5(in_shape=(32,32,1), n_classes=10, opt='sgd'):
    in_layer = layers.Input(in_shape)
    # Two convolution + max-pooling stages extract features.
    conv1 = layers.Conv2D(filters=20, kernel_size=5, padding='same', activation='relu')(in_layer)
    pool1 = layers.MaxPool2D()(conv1)
    conv2 = layers.Conv2D(filters=50, kernel_size=5, padding='same', activation='relu')(pool1)
    pool2 = layers.MaxPool2D()(conv2)
    # Flatten, then classify with a fully connected layer and softmax.
    flatten = layers.Flatten()(pool2)
    dense1 = layers.Dense(500, activation='relu')(flatten)
    preds = layers.Dense(n_classes, activation='softmax')(dense1)
    model = Model(in_layer, preds)
    model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
    return model

if __name__ == '__main__':
    model = lenet_5()
    print(model.summary())
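Following the advice above, here is a minimal usage sketch, assuming MNIST as the dataset, that trains the lenet_5 model with an Adam optimizer and an explicit learning rate (older Keras versions take lr, newer ones learning_rate); padding the 28×28 digits up to the 32×32 input is purely for illustration.

import numpy as np
from keras import optimizers
from keras.datasets import mnist
from keras.utils import to_categorical

# Load MNIST and pad the 28x28 digits to the 32x32 input expected by LeNet-5.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = np.pad(x_train, ((0,0), (2,2), (2,2)), 'constant')[..., np.newaxis] / 255.0
x_test = np.pad(x_test, ((0,0), (2,2), (2,2)), 'constant')[..., np.newaxis] / 255.0
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

# Swap in a different optimizer and learning rate than the default SGD.
model = lenet_5(opt=optimizers.Adam(lr=1e-3))
model.fit(x_train, y_train, batch_size=128, epochs=5,
          validation_data=(x_test, y_test))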
In 2012, the deep neural network from Hinton's group won ImageNet, the world's most important computer vision challenge, cutting the top-5 error rate from around 26% to 15.3%, a result that wowed the world.
This neural network is much like LeNet, but deeper, with about 60 million parameters.
ImageNet Classification with Deep Convolutional Neural Networks
This computation does look a bit intimidating, because the network is split into two halves, each trained on a separate GPU. Let's make it easier to follow with a condensed version of the diagram.
The architecture consists of 5 convolutional layers and 3 fully connected layers. These eight layers, combined with two concepts that were new at the time, max pooling and the ReLU activation, gave the model its advantage.
You can find the different layers and their corresponding configurations in the diagram above. Each layer is described in the following table.
Note: the ReLU activation function is applied after every convolutional and fully connected layer except the final layer, which uses softmax.
The authors also used several other techniques (not all of which can be covered in this post), such as dropout, data augmentation, and stochastic gradient descent with momentum.
from keras import layers
from keras.models import Model

def alexnet(in_shape=(227,227,3), n_classes=1000, opt='sgd'):
    in_layer = layers.Input(in_shape)
    # Five convolutional layers, with overlapping 3x3/stride-2 max pooling.
    conv1 = layers.Conv2D(96, 11, strides=4, activation='relu')(in_layer)
    pool1 = layers.MaxPool2D(3, 2)(conv1)
    conv2 = layers.Conv2D(256, 5, strides=1, padding='same', activation='relu')(pool1)
    pool2 = layers.MaxPool2D(3, 2)(conv2)
    conv3 = layers.Conv2D(384, 3, strides=1, padding='same', activation='relu')(pool2)
    conv4 = layers.Conv2D(384, 3, strides=1, padding='same', activation='relu')(conv3)
    conv5 = layers.Conv2D(256, 3, strides=1, padding='same', activation='relu')(conv4)
    pool3 = layers.MaxPool2D(3, 2)(conv5)
    flattened = layers.Flatten()(pool3)
    # Two 4096-unit fully connected layers with dropout for regularization.
    dense1 = layers.Dense(4096, activation='relu')(flattened)
    drop1 = layers.Dropout(0.5)(dense1)
    dense2 = layers.Dense(4096, activation='relu')(drop1)
    drop2 = layers.Dropout(0.5)(dense2)
    preds = layers.Dense(n_classes, activation='softmax')(drop2)
    model = Model(in_layer, preds)
    model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
    return model

if __name__ == '__main__':
    model = alexnet()
    print(model.summary())
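As a hedged sketch of two of those techniques, the snippet below compiles alexnet with momentum SGD and builds a simple augmentation pipeline with Keras's ImageDataGenerator; the hyperparameters and the x_train/y_train arrays are illustrative assumptions, not values from the paper.

from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator

# Stochastic gradient descent with momentum (illustrative hyperparameters).
sgd = optimizers.SGD(lr=0.01, momentum=0.9)
model = alexnet(opt=sgd)

# Simple data augmentation: random shifts and horizontal flips.
augmenter = ImageDataGenerator(width_shift_range=0.1,
                               height_shift_range=0.1,
                               horizontal_flip=True)

# Assuming x_train/y_train are 227x227x3 images and one-hot labels:
# model.fit_generator(augmenter.flow(x_train, y_train, batch_size=128),
#                     steps_per_epoch=len(x_train) // 128, epochs=10)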
VGGNet was the runner-up in the 2014 ImageNet Challenge. Because its uniform architecture is so simple, many newcomers use it as their first deep convolutional neural network.
In what follows, we will see how this, one of the most commonly used architectures, extracts features from images, that is, transforms an image into a low-dimensional array that retains its important information.
VGGNet follows two simple rules of thumb: every convolutional layer uses a 3×3 kernel with stride 1 and same padding, and every max-pooling layer uses a 2×2 window with stride 2.
The input is a 224×224 RGB image, so the input shape is 224×224×3.
The total number of parameters is about 138 million. Most of these come from the fully connected layers, which account for 123,645,952 parameters on their own.
from keras import layers
from keras.models import Model
from functools import partial

# Every convolution in VGG is 3x3, stride 1, with same padding.
conv3 = partial(layers.Conv2D, kernel_size=3, strides=1, padding='same', activation='relu')

def block(in_tensor, filters, n_convs):
    conv_block = in_tensor
    for _ in range(n_convs):
        conv_block = conv3(filters=filters)(conv_block)
    return conv_block

def _vgg(in_shape=(224,224,3), n_classes=1000, opt='sgd', n_stages_per_blocks=[2, 2, 3, 3, 3]):
    in_layer = layers.Input(in_shape)
    # Five blocks of stacked 3x3 convolutions, each followed by 2x2 max pooling.
    block1 = block(in_layer, 64, n_stages_per_blocks[0])
    pool1 = layers.MaxPool2D()(block1)
    block2 = block(pool1, 128, n_stages_per_blocks[1])
    pool2 = layers.MaxPool2D()(block2)
    block3 = block(pool2, 256, n_stages_per_blocks[2])
    pool3 = layers.MaxPool2D()(block3)
    block4 = block(pool3, 512, n_stages_per_blocks[3])
    pool4 = layers.MaxPool2D()(block4)
    block5 = block(pool4, 512, n_stages_per_blocks[4])
    pool5 = layers.MaxPool2D()(block5)
    flattened = layers.Flatten()(pool5)
    # The two 4096-unit dense layers hold most of the network's parameters.
    dense1 = layers.Dense(4096, activation='relu')(flattened)
    dense2 = layers.Dense(4096, activation='relu')(dense1)
    preds = layers.Dense(n_classes, activation='softmax')(dense2)
    model = Model(in_layer, preds)
    model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
    return model

def vgg16(in_shape=(224,224,3), n_classes=1000, opt='sgd'):
    return _vgg(in_shape, n_classes, opt)

def vgg19(in_shape=(224,224,3), n_classes=1000, opt='sgd'):
    return _vgg(in_shape, n_classes, opt, [2, 2, 4, 4, 4])

if __name__ == '__main__':
    model = vgg19()
    print(model.summary())
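To make the feature-extraction idea from the start of this section concrete, here is a minimal sketch that reuses the vgg16 model defined above and reads off the activations of the first 4096-unit dense layer as an image descriptor; the choice of that layer, and the random input standing in for a real image, are assumptions for illustration.

import numpy as np
from keras.models import Model

model = vgg16()
# Take the output of the first 4096-unit dense layer (third layer from the end)
# as a compact feature vector describing the image.
feature_extractor = Model(model.input, model.layers[-3].output)

# A random image stands in for real data here.
image = np.random.rand(1, 224, 224, 3)
features = feature_extractor.predict(image)
print(features.shape)  # (1, 4096)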
GoogLeNet, the winner of the 2014 ImageNet Challenge, uses Inception modules: a novel idea built from smaller convolutions that brings the number of parameters down to just 4 million (AlexNet, for comparison, has about 60 million).
Inception module
The reasons for using these Inception modules: each convolution size extracts different information (a 3×3 kernel gathers different features than a 5×5), so the module runs several sizes in parallel and lets the network decide which to rely on, while the 1×1 convolutions keep the parameter count and computation down.
GoogLeNet/Inception - Architecture
The complete Inception architecture.
Going Deeper with Convolutions
You may notice some "auxiliary classifiers" with softmax in this structure. Quoting from the paper: "By adding auxiliary classifiers connected to these intermediate layers, we would expect to encourage discrimination in the lower stages in the classifier, increase the gradient signal that gets propagated back, and provide additional regularization."
But what does that mean? In plain terms: training intermediate layers against the classification target encourages the lower layers to learn discriminative features, sends extra gradient signal back through the network to counter vanishing gradients, and acts as an additional form of regularization.
Auxiliary classifier structure.
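Here is a minimal sketch of such an auxiliary classifier head in Keras, following the layer sizes the paper describes (5×5 average pooling with stride 3, a 1×1 convolution with 128 filters, a 1024-unit dense layer, 70% dropout, and a softmax output); how and where it is attached to an intermediate tensor is left as an assumption.

from keras import layers

def aux_classifier(in_tensor, n_classes=1000):
    # 5x5 average pooling with stride 3, as described in the paper.
    pool = layers.AvgPool2D(5, strides=3)(in_tensor)
    # 1x1 convolution with 128 filters for dimension reduction.
    conv = layers.Conv2D(128, 1, activation='relu')(pool)
    flat = layers.Flatten()(conv)
    dense = layers.Dense(1024, activation='relu')(flat)
    drop = layers.Dropout(0.7)(dense)
    # Softmax output whose loss is added (with a small weight) to the main loss.
    return layers.Dense(n_classes, activation='softmax')(drop)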
Note: the column names in the GoogLeNet architecture table above mean the following.
- #1×1: the number of filters in the 1×1 convolution of the Inception module.
- #3×3 reduce: the number of filters in the 1×1 convolution placed before the 3×3 convolution.
- #3×3: the number of filters in the 3×3 convolution.
- #5×5 reduce: the number of filters in the 1×1 convolution placed before the 5×5 convolution.
- #5×5: the number of filters in the 5×5 convolution.
- pool proj: the number of filters in the 1×1 convolution placed after max pooling in the Inception module.
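Using the column naming above, here is a minimal sketch of an Inception module in Keras; the example filter counts in the comment are those of GoogLeNet's first Inception layer (3a).

from keras import layers

def inception_module(in_tensor, c1, c3_reduce, c3, c5_reduce, c5, pool_proj):
    # 1x1 convolution branch.
    conv1 = layers.Conv2D(c1, 1, activation='relu')(in_tensor)
    # 1x1 reduction followed by the 3x3 convolution.
    conv3_r = layers.Conv2D(c3_reduce, 1, activation='relu')(in_tensor)
    conv3 = layers.Conv2D(c3, 3, padding='same', activation='relu')(conv3_r)
    # 1x1 reduction followed by the 5x5 convolution.
    conv5_r = layers.Conv2D(c5_reduce, 1, activation='relu')(in_tensor)
    conv5 = layers.Conv2D(c5, 5, padding='same', activation='relu')(conv5_r)
    # Max pooling followed by the 1x1 pool projection.
    pool = layers.MaxPool2D(3, strides=1, padding='same')(in_tensor)
    proj = layers.Conv2D(pool_proj, 1, activation='relu')(pool)
    # Concatenate all branches along the channel axis.
    return layers.Concatenate(axis=-1)([conv1, conv3, conv5, proj])

# Filter counts of Inception (3a) in GoogLeNet:
# out = inception_module(x, 64, 96, 128, 16, 32, 32)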
GoogLeNet is one incarnation of the Inception architecture.
It uses batch normalization, image distortion, and RMSprop, which we will discuss in a future article.
In the 2015 ImageNet challenge, the top-5 error rate fell to around 3.57%, below the human top-5 error rate. This was thanks to Microsoft's ResNet (Residual Network), which introduced a completely new idea: skip connections.
Residual learning: a building block.
Residual networks address the following phenomenon: as we keep deepening a neural network, its performance eventually gets worse. Intuitively, this should not happen. If a network of depth K achieves performance y, a network of depth K+1 should perform at least as well, since its extra layer could simply learn the identity mapping.
This observation leads to the hypothesis that direct mappings are hard to learn. So instead of learning the mapping from a block's input to its output, the network learns the difference between them - the residual.
For example, let x be the input and H(x) the output we want to learn. Instead, we learn F(x) = H(x) - x: the layers in a block learn F(x), and we add x back to obtain H(x) = F(x) + x, which is passed on to the next block just as before. This is exactly the residual block shown earlier.
The results are stunning, because the vanishing-gradient problem that kept deep networks from learning is alleviated: a skip connection, or "shortcut", gives the gradient a path back to earlier layers, skipping the layers in between.
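Here is a minimal sketch of that idea in Keras, assuming F(x) is two 3×3 convolutions and that the input already has the right number of channels; the layers.Add call is what implements H(x) = F(x) + x.

from keras import layers

def residual_block(x, filters):
    # F(x): two 3x3 convolutions learn the residual.
    f = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    f = layers.Conv2D(filters, 3, padding='same')(f)
    # H(x) = F(x) + x: the skip connection adds the input back
    # (assumes x already has `filters` channels).
    h = layers.Add()([f, x])
    return layers.Activation('relu')(h)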
Now let's put the residual block to use.
The paper also proposes deeper ResNets (50/101/152) built with "bottleneck" blocks. Instead of the plain residual block described above, these use 1×1 convolutions to first reduce and then restore the channel dimensionality, so the 3×3 convolution in the middle operates on fewer channels.
from keras import layers
from keras.models import Model

def _after_conv(in_tensor):
    # Batch normalization followed by ReLU after every convolution.
    norm = layers.BatchNormalization()(in_tensor)
    return layers.Activation('relu')(norm)

def conv1(in_tensor, filters):
    conv = layers.Conv2D(filters, kernel_size=1, strides=1)(in_tensor)
    return _after_conv(conv)

def conv1_downsample(in_tensor, filters):
    conv = layers.Conv2D(filters, kernel_size=1, strides=2)(in_tensor)
    return _after_conv(conv)

def conv3(in_tensor, filters):
    conv = layers.Conv2D(filters, kernel_size=3, strides=1, padding='same')(in_tensor)
    return _after_conv(conv)

def conv3_downsample(in_tensor, filters):
    conv = layers.Conv2D(filters, kernel_size=3, strides=2, padding='same')(in_tensor)
    return _after_conv(conv)

def resnet_block_wo_bottleneck(in_tensor, filters, downsample=False):
    # Plain residual block: two 3x3 convolutions plus the skip connection.
    if downsample:
        conv1_rb = conv3_downsample(in_tensor, filters)
    else:
        conv1_rb = conv3(in_tensor, filters)
    conv2_rb = conv3(conv1_rb, filters)
    if downsample:
        in_tensor = conv1_downsample(in_tensor, filters)
    result = layers.Add()([conv2_rb, in_tensor])
    return layers.Activation('relu')(result)

def resnet_block_w_bottleneck(in_tensor, filters, downsample=False, change_channels=False):
    # Bottleneck block: 1x1 reduce, 3x3, then 1x1 to restore the channels.
    if downsample:
        conv1_rb = conv1_downsample(in_tensor, int(filters/4))
    else:
        conv1_rb = conv1(in_tensor, int(filters/4))
    conv2_rb = conv3(conv1_rb, int(filters/4))
    conv3_rb = conv1(conv2_rb, filters)
    # Project the shortcut when the spatial size or channel count changes.
    if downsample:
        in_tensor = conv1_downsample(in_tensor, filters)
    elif change_channels:
        in_tensor = conv1(in_tensor, filters)
    result = layers.Add()([conv3_rb, in_tensor])
    return layers.Activation('relu')(result)

def _pre_res_blocks(in_tensor):
    # Stem: 7x7/stride-2 convolution and 3x3/stride-2 max pooling.
    conv = layers.Conv2D(64, 7, strides=2, padding='same')(in_tensor)
    conv = _after_conv(conv)
    pool = layers.MaxPool2D(3, 2, padding='same')(conv)
    return pool

def _post_res_blocks(in_tensor, n_classes):
    pool = layers.GlobalAvgPool2D()(in_tensor)
    preds = layers.Dense(n_classes, activation='softmax')(pool)
    return preds

def convx_wo_bottleneck(in_tensor, filters, n_times, downsample_1=False):
    res = in_tensor
    for i in range(n_times):
        if i == 0:
            res = resnet_block_wo_bottleneck(res, filters, downsample_1)
        else:
            res = resnet_block_wo_bottleneck(res, filters)
    return res

def convx_w_bottleneck(in_tensor, filters, n_times, downsample_1=False):
    res = in_tensor
    for i in range(n_times):
        if i == 0:
            res = resnet_block_w_bottleneck(res, filters, downsample_1, not downsample_1)
        else:
            res = resnet_block_w_bottleneck(res, filters)
    return res

def _resnet(in_shape=(224,224,3), n_classes=1000, opt='sgd', convx=[64, 128, 256, 512],
            n_convx=[2, 2, 2, 2], convx_fn=convx_wo_bottleneck):
    in_layer = layers.Input(in_shape)
    downsampled = _pre_res_blocks(in_layer)
    conv2x = convx_fn(downsampled, convx[0], n_convx[0])
    conv3x = convx_fn(conv2x, convx[1], n_convx[1], True)
    conv4x = convx_fn(conv3x, convx[2], n_convx[2], True)
    conv5x = convx_fn(conv4x, convx[3], n_convx[3], True)
    preds = _post_res_blocks(conv5x, n_classes)
    model = Model(in_layer, preds)
    model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])
    return model

def resnet18(in_shape=(224,224,3), n_classes=1000, opt='sgd'):
    return _resnet(in_shape, n_classes, opt)

def resnet34(in_shape=(224,224,3), n_classes=1000, opt='sgd'):
    return _resnet(in_shape, n_classes, opt, n_convx=[3, 4, 6, 3])

def resnet50(in_shape=(224,224,3), n_classes=1000, opt='sgd'):
    return _resnet(in_shape, n_classes, opt, [256, 512, 1024, 2048], [3, 4, 6, 3], convx_w_bottleneck)

def resnet101(in_shape=(224,224,3), n_classes=1000, opt='sgd'):
    return _resnet(in_shape, n_classes, opt, [256, 512, 1024, 2048], [3, 4, 23, 3], convx_w_bottleneck)

def resnet152(in_shape=(224,224,3), n_classes=1000, opt='sgd'):
    return _resnet(in_shape, n_classes, opt, [256, 512, 1024, 2048], [3, 8, 36, 3], convx_w_bottleneck)

if __name__ == '__main__':
    model = resnet50()
    print(model.summary())