II. Basic CNN architecture
As shown in the figure below, we take an image as the input, perform a series of convolution + pooling operations, and then pass the result through a few fully connected layers. For multi-class classification, the output is a softmax. Every CNN has four basic building blocks: the convolutional layer, the activation function, the pooling layer, and the fully connected layer.
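The four building blocks above can be sketched in plain NumPy. This is a minimal toy forward pass, not a production implementation: the image, kernel, and fully connected weights are random placeholders, and the loop-based convolution is written for clarity rather than speed.

```python
import numpy as np

def conv2d(x, kernel):
    """Valid 2-D convolution (cross-correlation, as in CNN libraries)."""
    H, W = x.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Activation function: zero out negative responses."""
    return np.maximum(0, x)

def max_pool(x, size=2):
    """Pooling layer: keep the maximum of each size x size window."""
    H2, W2 = x.shape[0] // size, x.shape[1] // size
    return x[:H2 * size, :W2 * size].reshape(H2, size, W2, size).max(axis=(1, 3))

def softmax(z):
    """Turn class scores into probabilities for multi-class output."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy forward pass: conv -> ReLU -> pool -> fully connected -> softmax.
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))        # placeholder grayscale input
kernel = rng.standard_normal((3, 3))       # one learned filter (random here)
feat = max_pool(relu(conv2d(image, kernel)))   # 8x8 -> 6x6 -> 3x3
W_fc = rng.standard_normal((4, feat.size))     # fully connected layer, 4 classes
probs = softmax(W_fc @ feat.ravel())
```

Stacking several such conv + pool stages before the fully connected head gives the standard CNN pattern described above; `probs` is the softmax output over the (hypothetical) four classes.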
III. CNN Applications
CNN architectures continue to occupy a prominent position in computer vision, and their architectural advantages bring improvements in speed, accuracy, and training for many of the applications and tasks described below.
Object detection. CNNs are the main architecture behind the most popular detection models, such as R-CNN, Fast R-CNN, and Faster R-CNN. In these models, the network proposes candidate object regions, and a CNN on top of each of these region proposals then classifies them. This is now the dominant approach in many object detection models, with applications such as self-driving cars, intelligent video surveillance, and face detection.
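The propose-then-classify pipeline can be illustrated with toy stand-ins. Both functions below are hypothetical placeholders: real systems use a learned or heuristic proposal method (e.g., a region proposal network) and a trained CNN classifier, whereas here we sample random boxes and use a fixed linear head, purely to show the control flow.

```python
import numpy as np

def propose_regions(image, num=5, size=16, seed=0):
    """Toy stand-in for the region-proposal step:
    sample random fixed-size boxes (y, x, height, width)."""
    rng = np.random.default_rng(seed)
    H, W = image.shape
    return [(int(rng.integers(0, H - size + 1)),
             int(rng.integers(0, W - size + 1)),
             size, size) for _ in range(num)]

def classify_crop(crop, num_classes=3):
    """Toy stand-in for the per-region CNN classifier:
    mean-pool the crop, apply a fixed linear head, then softmax."""
    z = np.array([crop.mean() * k for k in range(1, num_classes + 1)])
    e = np.exp(z - z.max())
    return e / e.sum()

image = np.random.default_rng(1).random((64, 64))
detections = []
for (y, x, h, w) in propose_regions(image):
    probs = classify_crop(image[y:y + h, x:x + w])
    detections.append(((y, x, h, w), int(probs.argmax())))
```

Each entry in `detections` pairs a proposed box with its predicted class, mirroring how R-CNN-style models score every region proposal independently.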
Object tracking. CNNs have been widely used in visual tracking applications. For example, given a CNN pre-trained on a large offline image repository, an online visual tracking algorithm developed by a team in Pohang, Korea can learn discriminative representations to localize targets both spatially and locally. Another example is DeepTrack, a solution that automatically relearns the most useful feature representations during tracking, adjusting accurately to changes in appearance, pose, and scale while preventing drift and tracking failures.
Object recognition. A team from INRIA and MSR in France developed a weakly supervised CNN that relies only on image-level labels and can learn from cluttered scenes containing multiple objects. Another example is FV-CNN, a texture descriptor developed by researchers at Oxford to address the clutter problem in texture recognition.
Semantic segmentation. The Deep Parsing Network is a CNN-based network developed by a group of researchers in Hong Kong to integrate rich information into the image segmentation process. Researchers at UC Berkeley, meanwhile, built fully convolutional networks that surpassed the previous state of the art in semantic segmentation. More recently, SegNet is a deep fully convolutional neural network that is notably efficient in memory and computation time for semantic pixel-wise segmentation.
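The key property that makes a network "fully convolutional" is that every layer preserves a spatial map, so the model can emit one class score per pixel. The sketch below shows that idea with random toy kernels and three hypothetical classes; real fully convolutional networks use many learned layers plus upsampling, which this single-layer illustration omits.

```python
import numpy as np

def same_conv(x, kernel):
    """'Same' 2-D convolution via zero padding, so the output keeps the
    input's spatial size -- the property that enables dense, per-pixel
    prediction in a fully convolutional network."""
    kh, kw = kernel.shape
    pad = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(pad[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))
# One score map per class (3 hypothetical classes), then per-pixel argmax.
kernels = rng.standard_normal((3, 3, 3))
scores = np.stack([same_conv(image, k) for k in kernels])  # (3, 16, 16)
label_map = scores.argmax(axis=0)  # dense segmentation: one label per pixel
```

Because `label_map` has the same height and width as the input, every pixel receives a class, which is exactly what semantic segmentation requires.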
Video and image captioning. One of the most important developments is UC Berkeley's long-term recurrent convolutional networks, which combine CNNs and RNNs (recurrent neural networks) to handle large-scale visual understanding tasks, including activity recognition, image captioning, and video description. It has been heavily deployed by YouTube's data science team to make sense of the massive amount of video uploaded to the platform every day.