Illustrated CNN Series I: Introduction to Convolutional Neural Networks


Convolutional neural networks (CNNs) are a classical deep learning architecture that emerged in the late 1990s, and in recent years, with the broader development of deep learning, they have been achieving impressive results in computer vision.

Convolutional neural networks are very similar to other neural networks: they are formed by neurons with learnable parameters in the form of weights and biases. The differentiating feature of CNNs is that they explicitly assume the inputs are images, which allows us to encode certain properties into the architecture for identifying specific elements in an image.

I. Historical background of CNN

The development of CNNs is largely due to the efforts of Yann LeCun, who is now the director of AI research at Facebook. A typical end-to-end image recognition system for reading and recognizing handwritten digits was built in the early 1990s by LeCun at Bell Labs, then one of the world's most prestigious research laboratories. In 1998, LeCun, together with Léon Bottou, Patrick Haffner and Yoshua Bengio, published a paper introducing convolutional nets and the complete end-to-end system they had built. The first half of that paper describes convolutional networks and their implementation, and mentions everything else related to the technique (which I will cover in the CNN architecture section below). The second half shows how convolutional networks can be combined with language models. For example, when reading English text, you can build a system on top of English grammar to extract the most likely interpretations that belong to the language. Most importantly, you can build CNN systems, train them for both recognition and segmentation, and feed the correct input to the language model.

II. Basic CNN architecture

As shown in the figure below, we take an image as input and apply a series of convolution and pooling operations, followed by a number of fully connected layers. If we are performing multi-class classification, the output is a softmax. Every CNN has four basic building blocks: convolutional layers, activation functions, pooling layers, and fully connected layers.
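To make the four building blocks concrete, here is a minimal NumPy sketch of a forward pass: one convolution, a ReLU activation, 2x2 max pooling, and a fully connected layer ending in a softmax. The image, filter, and weight values are random illustrative stand-ins, not a trained model.

```python
import numpy as np

def conv2d(image, kernel):
    """Convolutional layer: valid cross-correlation of a 2-D image with a 2-D kernel."""
    h, w = kernel.shape
    out_h = image.shape[0] - h + 1
    out_w = image.shape[1] - w + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
    return out

def relu(x):
    """Activation function: element-wise max(x, 0)."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Pooling layer: non-overlapping size x size max pooling."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def softmax(z):
    """Turn class scores into probabilities."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.random((8, 8))             # toy 8x8 grayscale "image"
kernel = rng.standard_normal((3, 3))   # one filter (random here, learned in practice)
W = rng.standard_normal((3, 9))        # fully connected weights: 3 classes x 9 pooled features

features = max_pool(relu(conv2d(image, kernel)))   # conv -> activation -> pooling: 3x3 map
probs = softmax(W @ features.ravel())              # fully connected -> softmax: 3 class probabilities
print(features.shape, probs.shape)
```

In a real CNN there would be many filters per layer and several conv+pool stages stacked before the fully connected layers, but the data flow is exactly this pipeline.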

III. CNN Applications

CNN architectures continue to occupy a prominent position in computer vision, with architectural advances delivering improvements in speed, accuracy, and training for many of the applications and tasks described below.

Object detection. CNNs are the main architecture behind the most popular models, such as R-CNN, Fast R-CNN, and Faster R-CNN. In these models, the network hypothesizes object regions, and then a CNN run on top of each of these region proposals classifies them. This is now the primary approach for many object detection models, with applications in areas such as self-driving cars, intelligent video surveillance, and face detection.

Object tracking. CNNs have been widely used in visual tracking applications. For example, starting from a CNN pre-trained on a large offline image repository, an online visual tracking algorithm developed by a team in Pohang, Korea learns discriminative features that localize targets both spatially and locally. Another example is DeepTrack, a solution that automatically relearns the most useful feature representations during tracking in order to adjust accurately to changes in appearance, pose, and scale while preventing drift and tracking failures.

Object recognition. A team from INRIA in France and MSR has developed a weakly supervised CNN that does not rely on per-object annotations and can learn from cluttered scenes containing multiple objects. Another example is FV-CNN, a texture descriptor developed by researchers at Oxford to address the clutter problem in texture recognition.

Semantic segmentation. The Deep Parsing Network is a CNN-based network developed by a group of researchers in Hong Kong to integrate rich information into the image segmentation process. Meanwhile, researchers at UC Berkeley built fully convolutional networks that surpassed the previous state of the art in semantic segmentation. More recently, SegNet, a deep fully convolutional neural network, has proven highly efficient in both memory and computation time for semantic pixel-wise segmentation.

Video and image captioning. One of the most important inventions is UC Berkeley's long-term recurrent convolutional networks (LRCN), which combine CNNs with RNNs (recurrent neural networks) to handle large-scale visual understanding tasks, including activity recognition, image captioning, and video description. It has been heavily used by YouTube's data science team to make sense of the massive number of videos uploaded to the platform every day.

CNNs have also found many novel applications outside of vision, notably in natural language processing and speech recognition.

Natural language processing. In machine translation, Facebook's AI research team used CNNs to achieve state-of-the-art accuracy at nine times the speed of recurrent neural networks. In sentence classification, Yoon Kim of New York University tested CNNs that perform sentence-level classification on top of pre-trained word vectors, improving on the state of the art in 4 out of 7 tasks. In question answering, researchers from Waterloo and Maryland explored the effectiveness of CNNs for answer selection in end-to-end question answering, and found that CNNs outperformed the previous algorithms.
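The sentence-classification idea above can be sketched as follows: filters of several widths slide over a sequence of pre-trained word vectors, and max-over-time pooling produces a fixed-length feature vector regardless of sentence length, which a final layer turns into class probabilities. All sizes and the random vectors here are illustrative assumptions, not the published configuration.

```python
import numpy as np

rng = np.random.default_rng(1)
embed_dim, n_classes = 5, 2
# 7 words, each represented by a word vector (random stand-ins for pre-trained embeddings)
sentence = rng.random((7, embed_dim))

def conv_feature(sent, width, n_filters, rng):
    """Slide n_filters filters of the given width over the word sequence,
    then apply max-over-time pooling (one number per filter)."""
    filt = rng.standard_normal((n_filters, width * sent.shape[1]))
    windows = np.stack([sent[i:i + width].ravel()
                        for i in range(len(sent) - width + 1)])
    feature_maps = windows @ filt.T      # shape: (n_windows, n_filters)
    return feature_maps.max(axis=0)      # max over time -> shape: (n_filters,)

# concatenate features from filter widths 2, 3, and 4 (3 filters each -> 9 features)
features = np.concatenate([conv_feature(sentence, w, 3, rng) for w in (2, 3, 4)])

W = rng.standard_normal((n_classes, features.size))  # final fully connected layer
logits = W @ features
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(features.shape, probs.shape)
```

Because the pooling step keeps only the maximum response of each filter, sentences of any length map to the same 9-dimensional feature vector, which is what lets one classifier handle variable-length input.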

Speech recognition. CNNs are very effective models for reducing spectral variation and modeling spectral correlation in acoustic features for automatic speech recognition. Hybrid speech recognition systems combining CNNs with Hidden Markov Models/Gaussian Mixture Models have obtained state-of-the-art results on various benchmarks. By combining hierarchical CNNs with CTC (connectionist temporal classification), researchers at the University of Montreal proposed an end-to-end speech framework for sequence labeling that is competitive with existing baseline systems. Microsoft's team used CNNs to reduce the error rate of speech recognition, specifically by building CNN architectures with local connectivity, weight sharing, and pooling. Their models remain invariant to speaker and environment changes.


Convolutional neural networks have played an important role in the development of deep learning. Compared to most other neural networks, CNNs perform very well in commercial applications of deep learning (vision, language, speech), and many machine learning practitioners have used them to win academic and industry competitions. Research into CNN architectures is evolving at a fast pace: fewer weights and parameters, automatic learning and generalization of features from input objects, invariance to object position and to distortions in images, text, and speech, and so on. There is no doubt that CNNs are among the most popular neural network techniques and a must-know for anyone wanting to enter the field of deep learning.
