In recent years, deep learning can solve many computer vision problems better than traditional models. This is mainly due to the millions of image databases like ImageNet that greatly facilitate the training of deep neural networks. On such a large database, the artificially designed features in the traditional model are too shallow and do not necessarily classify the entire database effectively. However, deep neural networks can learn features in very different styles based on the class of the image against this database, which can be used for visual tasks such as classification. To better learn to have targeted features, we need to focus our attention on what interests us about the database. Don't overestimate my ability, I certainly can't browse every image in the world and mark it. But we can get the deep learning model to actively focus on places of interest. That's the attention mechanism I'm going to cover today.
The attention mechanism is actually quite simple, let's do a test, like the following picture :