RefineDet is a paper from CVPR 2018. Before it, mainstream object detection algorithms were largely divided into single-stage and two-stage: single-stage models are faster, while two-stage models are more accurate. Much contemporary work either improves the accuracy of single-stage detectors (e.g., focal loss, discussed previously on this account, which tackles the sample-imbalance problem of single-stage detection) or reduces the time cost of two-stage detectors (e.g., Light-Head R-CNN). RefineDet can be seen as a fusion of these two approaches: it incorporates two-stage ideas on top of a single-stage model, to the point that there is no longer a clear dividing line between single-stage and two-stage.

RefineDet's network is divided into three main modules: the anchor refinement module (ARM), the object detection module (ODM), and the transfer connection block (TCB) that connects the two.

ARM: Essentially an SSD-style model that functions much like an RPN. It performs binary classification of foreground vs. background, predicting a negative (background) confidence score and a positive (foreground) confidence score for each anchor; anchors whose negative confidence exceeds a preset threshold are discarded, leaving the hard negative and positive refined anchors. This eases the sample-imbalance problem for the multi-class classification in the subsequent network. The ARM also predicts four coordinate offsets for each anchor, coarsely adjusting its position and size to provide better initialization for the later regression.
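The ARM's filtering rule can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name is made up here, and the 0.99 default threshold follows the value commonly cited for RefineDet.

```python
import numpy as np

def filter_refined_anchors(neg_scores, theta=0.99):
    """Discard anchors whose background (negative) confidence exceeds theta.

    neg_scores: shape (N,), the ARM's predicted background probability per
    anchor. Anchors the ARM is highly confident are background are dropped,
    easing class imbalance for the ODM's multi-class classification.
    Returns the indices of the surviving anchors.
    """
    keep = neg_scores < theta  # True for anchors passed on to the ODM
    return np.flatnonzero(keep)

scores = np.array([0.999, 0.42, 0.995, 0.10])
survivors = filter_refined_anchors(scores)  # keeps anchors 1 and 3
```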

ODM: The network is similar to DSSD, with the difference that its input is the refined anchors output by the ARM, which solves the training-sample screening problem; it then performs a second regression on top of the ARM's coarse regression. This cascade regression makes the bounding-box regression more accurate.
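The cascade (two-step) regression can be illustrated with the standard SSD-style box parameterization: the ARM's offsets refine the hand-designed anchors, and the ODM's offsets are then applied starting from those refined anchors. A minimal sketch (variance scaling omitted; the offset values are made-up examples):

```python
import numpy as np

def decode(boxes, deltas):
    """Apply (tx, ty, tw, th) offsets to boxes given as (cx, cy, w, h),
    using the standard SSD-style parameterization."""
    cx = boxes[:, 0] + deltas[:, 0] * boxes[:, 2]
    cy = boxes[:, 1] + deltas[:, 1] * boxes[:, 3]
    w = boxes[:, 2] * np.exp(deltas[:, 2])
    h = boxes[:, 3] * np.exp(deltas[:, 3])
    return np.stack([cx, cy, w, h], axis=1)

# Cascade regression: ARM coarsely refines the anchors,
# then the ODM regresses again from the refined anchors.
anchors = np.array([[50.0, 50.0, 20.0, 20.0]])
arm_deltas = np.array([[0.5, -0.25, 0.1, 0.0]])  # coarse adjustment
refined = decode(anchors, arm_deltas)
odm_deltas = np.array([[0.05, 0.02, 0.0, 0.1]])  # fine adjustment
final = decode(refined, odm_deltas)
```

Because the ODM starts from already-refined boxes, its regression targets are smaller and easier to fit than a single direct regression from the original anchors.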

TCB: The TCB connects the ARM and ODM for feature fusion. It differs slightly from earlier feature-fusion methods: it uses a deconvolution to unify the spatial dimensions and then an Eltw Sum (also called broadcast add) to sum the shallow and deep feature maps on the corresponding channels. Earlier, Google's TDM used an up-sample operation to unify the w and h dimensions and then a concat to stitch the shallow and deep feature maps along the channel dimension, a cruder and less effective way of fusing features. FPN up-samples and then applies Eltw Sum. DSSD first deconvolves and then applies an Eltw Product (also called broadcast mul) to take the channel-wise dot product of the shallow and deep feature maps; Eltw Product performs slightly better than Eltw Sum, but multiplication is also more time-consuming than addition.
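The Eltw Sum fusion shape-wise looks like this. This sketch replaces the TCB's learned deconvolution with 2x nearest-neighbor upsampling (an assumption made purely to keep the example dependency-free); the point is only that, after the spatial dimensions match, fusion is a per-channel element-wise addition.

```python
import numpy as np

def upsample2x(x):
    """Stand-in for the TCB's learned deconvolution: 2x nearest-neighbor
    upsampling so the deep feature map matches the shallow one's H and W."""
    return x.repeat(2, axis=-2).repeat(2, axis=-1)

def tcb_fuse(shallow, deep):
    """Eltw Sum fusion: upsample the deeper (coarser) feature map, then add
    it to the shallow map element-wise on corresponding channels.
    Shapes: shallow (C, H, W), deep (C, H/2, W/2)."""
    return shallow + upsample2x(deep)

shallow = np.ones((256, 8, 8))
deep = np.full((256, 4, 4), 2.0)
fused = tcb_fuse(shallow, deep)  # shape (256, 8, 8); every element is 1 + 2 = 3
```

Swapping the `+` for `*` would give the DSSD-style Eltw Product variant mentioned above.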

Loss function: the losses of the ARM and ODM components are directly summed for end-to-end training. The authors also used some tricks during training; interested readers can find the details in the paper.

Welcome to follow, see you next time ~

