[Lecture notes] Faster R-CNN network structure explained

Posted 2019-9-8 09:32:43




Netscope, an online visualization tool for Caffe networks (may require a VPN to access): http://ethereon.github.io/netscope/#/editor


name: "VGG_ILSVRC_16_layers"

input: "data"
input_shape {
  dim: 1
  dim: 3
  dim: 600
  dim: 989
}

input: "im_info"
input_shape {
  dim: 1
  dim: 3
}

layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  convolution_param {
    num_output: 64
    pad: 1 kernel_size: 3
  }
}
layer {
  name: "relu1_1"
  type: "ReLU"
  bottom: "conv1_1"
  top: "conv1_1"
}
layer {
  name: "conv1_2"
  type: "Convolution"
  bottom: "conv1_1"
  top: "conv1_2"
  convolution_param {
    num_output: 64
    pad: 1 kernel_size: 3
  }
}
layer {
  name: "relu1_2"
  type: "ReLU"
  bottom: "conv1_2"
  top: "conv1_2"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1_2"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2 stride: 2
  }
}
layer {
  name: "conv2_1"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2_1"
  convolution_param {
    num_output: 128
    pad: 1 kernel_size: 3
  }
}
layer {
  name: "relu2_1"
  type: "ReLU"
  bottom: "conv2_1"
  top: "conv2_1"
}
layer {
  name: "conv2_2"
  type: "Convolution"
  bottom: "conv2_1"
  top: "conv2_2"
  convolution_param {
    num_output: 128
    pad: 1 kernel_size: 3
  }
}
layer {
  name: "relu2_2"
  type: "ReLU"
  bottom: "conv2_2"
  top: "conv2_2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2_2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2 stride: 2
  }
}
layer {
  name: "conv3_1"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3_1"
  convolution_param {
    num_output: 256
    pad: 1 kernel_size: 3
  }
}
layer {
  name: "relu3_1"
  type: "ReLU"
  bottom: "conv3_1"
  top: "conv3_1"
}
layer {
  name: "conv3_2"
  type: "Convolution"
  bottom: "conv3_1"
  top: "conv3_2"
  convolution_param {
    num_output: 256
    pad: 1 kernel_size: 3
  }
}
layer {
  name: "relu3_2"
  type: "ReLU"
  bottom: "conv3_2"
  top: "conv3_2"
}
layer {
  name: "conv3_3"
  type: "Convolution"
  bottom: "conv3_2"
  top: "conv3_3"
  convolution_param {
    num_output: 256
    pad: 1 kernel_size: 3
  }
}
layer {
  name: "relu3_3"
  type: "ReLU"
  bottom: "conv3_3"
  top: "conv3_3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3_3"
  top: "pool3"
  pooling_param {
    pool: MAX
    kernel_size: 2 stride: 2
  }
}
layer {
  name: "conv4_1"
  type: "Convolution"
  bottom: "pool3"
  top: "conv4_1"
  convolution_param {
    num_output: 512
    pad: 1 kernel_size: 3
  }
}
layer {
  name: "relu4_1"
  type: "ReLU"
  bottom: "conv4_1"
  top: "conv4_1"
}
layer {
  name: "conv4_2"
  type: "Convolution"
  bottom: "conv4_1"
  top: "conv4_2"
  convolution_param {
    num_output: 512
    pad: 1 kernel_size: 3
  }
}
layer {
  name: "relu4_2"
  type: "ReLU"
  bottom: "conv4_2"
  top: "conv4_2"
}
layer {
  name: "conv4_3"
  type: "Convolution"
  bottom: "conv4_2"
  top: "conv4_3"
  convolution_param {
    num_output: 512
    pad: 1 kernel_size: 3
  }
}
layer {
  name: "relu4_3"
  type: "ReLU"
  bottom: "conv4_3"
  top: "conv4_3"
}
layer {
  name: "pool4"
  type: "Pooling"
  bottom: "conv4_3"
  top: "pool4"
  pooling_param {
    pool: MAX
    kernel_size: 2 stride: 2
  }
}
layer {
  name: "conv5_1"
  type: "Convolution"
  bottom: "pool4"
  top: "conv5_1"
  convolution_param {
    num_output: 512
    pad: 1 kernel_size: 3
  }
}
layer {
  name: "relu5_1"
  type: "ReLU"
  bottom: "conv5_1"
  top: "conv5_1"
}
layer {
  name: "conv5_2"
  type: "Convolution"
  bottom: "conv5_1"
  top: "conv5_2"
  convolution_param {
    num_output: 512
    pad: 1 kernel_size: 3
  }
}
layer {
  name: "relu5_2"
  type: "ReLU"
  bottom: "conv5_2"
  top: "conv5_2"
}
layer {
  name: "conv5_3"
  type: "Convolution"
  bottom: "conv5_2"
  top: "conv5_3"
  convolution_param {
    num_output: 512
    pad: 1 kernel_size: 3
  }
}
layer {
  name: "relu5_3"
  type: "ReLU"
  bottom: "conv5_3"
  top: "conv5_3"
}

#========= RPN ============

layer {
  name: "rpn_conv/3x3"
  type: "Convolution"
  bottom: "conv5_3"
  top: "rpn/output"
  convolution_param {
    num_output: 512
    kernel_size: 3 pad: 1 stride: 1
  }
}
layer {
  name: "rpn_relu/3x3"
  type: "ReLU"
  bottom: "rpn/output"
  top: "rpn/output"
}

layer {
  name: "rpn_cls_score"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_cls_score"
  convolution_param {
    num_output: 18   # 2(bg/fg) * 9(anchors)
    kernel_size: 1 pad: 0 stride: 1
  }
}
layer {
  name: "rpn_bbox_pred"
  type: "Convolution"
  bottom: "rpn/output"
  top: "rpn_bbox_pred"
  convolution_param {
    num_output: 36   # 4 * 9(anchors)
    kernel_size: 1 pad: 0 stride: 1
  }
}
layer {
   bottom: "rpn_cls_score"
   top: "rpn_cls_score_reshape"
   name: "rpn_cls_score_reshape"
   type: "Reshape"
   reshape_param { shape { dim: 0 dim: 2 dim: -1 dim: 0 } }
}

#========= RoI Proposal ============

layer {
  name: "rpn_cls_prob"
  type: "Softmax"
  bottom: "rpn_cls_score_reshape"
  top: "rpn_cls_prob"
}
layer {
  name: 'rpn_cls_prob_reshape'
  type: 'Reshape'
  bottom: 'rpn_cls_prob'
  top: 'rpn_cls_prob_reshape'
  reshape_param { shape { dim: 0 dim: 18 dim: -1 dim: 0 } }
}
layer {
  name: 'proposal'
  type: 'Python'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'im_info'
  top: 'rois'
  python_param {
    module: 'rpn.proposal_layer'
    layer: 'ProposalLayer'
    param_str: "'feat_stride': 16"
  }
}

#========= RCNN ============

layer {
  name: "roi_pool5"
  type: "ROIPooling"
  bottom: "conv5_3"
  bottom: "rois"
  top: "pool5"
  roi_pooling_param {
    pooled_w: 7
    pooled_h: 7
    spatial_scale: 0.0625 # 1/16
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "cls_score"
  type: "InnerProduct"
  bottom: "fc7"
  top: "cls_score"
  inner_product_param {
    num_output: 21
  }
}
layer {
  name: "bbox_pred"
  type: "InnerProduct"
  bottom: "fc7"
  top: "bbox_pred"
  inner_product_param {
    num_output: 84
  }
}
layer {
  name: "cls_prob"
  type: "Softmax"
  bottom: "cls_score"
  top: "cls_prob"
}
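The rpn_cls_score_reshape layer above is worth unpacking: in Caffe's reshape_param, dim: 0 copies the corresponding input dimension and dim: -1 is inferred, so (1, 18, 38, 62) becomes (1, 2, 342, 62) and the Softmax runs over the bg/fg axis for all 9 anchors at once. A minimal NumPy sketch of the round trip (assuming the channel layout used by py-faster-rcnn, with the 9 background scores before the 9 foreground scores):

```python
import numpy as np

# rpn_cls_score: (N, 2*A, H, W) with A = 9 anchors per location
N, A, H, W = 1, 9, 38, 62
rpn_cls_score = np.random.rand(N, 2 * A, H, W).astype(np.float32)

# reshape_param { shape { dim: 0 dim: 2 dim: -1 dim: 0 } }
# dim: 0 keeps the input dim, dim: -1 is inferred -> (N, 2, A*H, W)
score_reshape = rpn_cls_score.reshape(N, 2, -1, W)
assert score_reshape.shape == (1, 2, 342, 62)

# softmax over the bg/fg axis (axis 1), per anchor and location
e = np.exp(score_reshape - score_reshape.max(axis=1, keepdims=True))
prob = e / e.sum(axis=1, keepdims=True)

# the second Reshape restores (N, 2*A, H, W) for the proposal layer
prob_reshape = prob.reshape(N, 2 * A, H, W)
assert prob_reshape.shape == (1, 18, 38, 62)
```

Caffe's Reshape only reinterprets the blob in row-major order, which is exactly what NumPy's reshape on a C-contiguous array does.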

Layer (blob) name and output shape:

data        (1, 3, 600, 989)
im_info        (1, 3)
conv1_1        (1, 64, 600, 989)
conv1_2        (1, 64, 600, 989)
pool1        (1, 64, 300, 495)
conv2_1        (1, 128, 300, 495)
conv2_2        (1, 128, 300, 495)
pool2        (1, 128, 150, 248)
conv3_1        (1, 256, 150, 248)
conv3_2        (1, 256, 150, 248)
conv3_3        (1, 256, 150, 248)
pool3        (1, 256, 75, 124)
conv4_1        (1, 512, 75, 124)
conv4_2        (1, 512, 75, 124)
conv4_3        (1, 512, 75, 124)
pool4        (1, 512, 38, 62)
conv5_1        (1, 512, 38, 62)
conv5_2        (1, 512, 38, 62)
conv5_3        (1, 512, 38, 62)
conv5_3_relu5_3_0_split_0        (1, 512, 38, 62)
conv5_3_relu5_3_0_split_1        (1, 512, 38, 62)
rpn/output        (1, 512, 38, 62)
rpn/output_rpn_relu/3x3_0_split_0        (1, 512, 38, 62)
rpn/output_rpn_relu/3x3_0_split_1        (1, 512, 38, 62)
rpn_cls_score        (1, 18, 38, 62)
rpn_bbox_pred        (1, 36, 38, 62)
rpn_cls_score_reshape        (1, 2, 342, 62)
rpn_cls_prob        (1, 2, 342, 62)
rpn_cls_prob_reshape        (1, 18, 38, 62)
rois        (300, 5)
pool5        (300, 512, 7, 7)
fc6        (300, 4096)
fc7        (300, 4096)
fc7_relu7_0_split_0        (300, 4096)
fc7_relu7_0_split_1        (300, 4096)
cls_score        (300, 21)
bbox_pred        (300, 84)
cls_prob        (300, 21)
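The spatial sizes in this listing follow directly from the architecture: pad-1 3×3 convolutions preserve height and width, and each of the four 2×2/stride-2 max pools halves them, with Caffe rounding pooled output sizes up. A quick sanity check of that arithmetic (a sketch, not Caffe's shape-inference code):

```python
import math

def vgg16_conv5_size(h, w, n_pools=4):
    """Conv layers with pad=1, kernel=3 preserve size; each 2x2/stride-2
    max pool halves it, with Caffe's ceil rounding."""
    for _ in range(n_pools):
        h = math.ceil(h / 2)
        w = math.ceil(w / 2)
    return h, w

# 600x989 input -> 38x62 at conv5_3, matching the listing above
assert vgg16_conv5_size(600, 989) == (38, 62)
```

This is also why the proposal layer's feat_stride is 16: four halvings give an effective stride of 2^4.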




Layer name and parameter (W, b) shapes:

conv1_1        (64, 3, 3, 3) (64,)
conv1_2        (64, 64, 3, 3) (64,)
conv2_1        (128, 64, 3, 3) (128,)
conv2_2        (128, 128, 3, 3) (128,)
conv3_1        (256, 128, 3, 3) (256,)
conv3_2        (256, 256, 3, 3) (256,)
conv3_3        (256, 256, 3, 3) (256,)
conv4_1        (512, 256, 3, 3) (512,)
conv4_2        (512, 512, 3, 3) (512,)
conv4_3        (512, 512, 3, 3) (512,)
conv5_1        (512, 512, 3, 3) (512,)
conv5_2        (512, 512, 3, 3) (512,)
conv5_3        (512, 512, 3, 3) (512,)
rpn_conv/3x3        (512, 512, 3, 3) (512,)
rpn_cls_score        (18, 512, 1, 1) (18,)
rpn_bbox_pred        (36, 512, 1, 1) (36,)
fc6        (4096, 25088) (4096,)
fc7        (4096, 4096) (4096,)
cls_score        (21, 4096) (21,)
bbox_pred        (84, 4096) (84,)

Forward-pass result blobs_out:
['bbox_pred', 'cls_prob'] (300, 84) (300, 21)
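A few of these shapes encode the model's key constants: fc6's input width 25088 is the flattened (512, 7, 7) RoI-pooled feature, cls_score's 21 outputs are 20 PASCAL VOC classes plus background, and bbox_pred's 84 outputs are 4 box deltas per class. Checking the arithmetic (a sketch; the shapes come from the listing above, the class breakdown is the usual VOC assumption):

```python
# Sanity checks on the parameter shapes listed above.
n_classes = 21            # assumed: 20 PASCAL VOC classes + background
pooled_w = pooled_h = 7   # roi_pool5 output size
conv5_channels = 512

# fc6 weight is (4096, 25088): it consumes the flattened (512, 7, 7) RoI feature
assert conv5_channels * pooled_h * pooled_w == 25088

# bbox_pred regresses 4 box deltas per class
assert 4 * n_classes == 84

# fc6 alone dominates the parameter count (weights + biases)
fc6_params = 4096 * 25088 + 4096
print(fc6_params)  # 102764544
```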


Analysis of the key layers
layer {
  name: 'proposal'
  type: 'Python'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'im_info'
  top: 'rois'
  python_param {
    module: 'rpn.proposal_layer'
    layer: 'ProposalLayer'
    param_str: "'feat_stride': 16"
  }
}
Inputs:
rpn_cls_prob_reshape        (1, 18, 38, 62)
rpn_bbox_pred        (1, 36, 38, 62)
im_info        (1, 3)
Outputs:
rois        (300, 5)
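These shapes also pin down the anchor bookkeeping: 18 = 2 (bg/fg) × 9 anchors, 36 = 4 box deltas × 9 anchors, and the proposal layer scores every anchor at every conv5_3 location before NMS trims the list to the 300 RoIs seen above. A sketch of the counts:

```python
A = 9            # anchors per location: 3 scales x 3 aspect ratios
H, W = 38, 62    # conv5_3 / rpn output spatial size

assert 2 * A == 18        # rpn_cls_score channels (bg/fg score per anchor)
assert 4 * A == 36        # rpn_bbox_pred channels (dx,dy,dw,dh per anchor)

total_anchors = H * W * A
print(total_anchors)      # 21204 candidate boxes before NMS

post_nms_top_n = 300      # ProposalLayer keeps the top 300 -> rois (300, 5)
# each roi row is (batch_index, x1, y1, x2, y2)
```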

class ProposalLayer(caffe.Layer) is where the 9 kinds of anchor boxes are generated:

generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                     scales=np.array([8, 16, 32]))

ratios is the aspect ratio H/W:
    # ws: the width is the square root of the area divided by the ratio,
    # since the anchor is a rectangle, not necessarily a square
    ws = np.round(np.sqrt(size_ratios))
    # height = width × ratio (ratios is H/W); np.round rounds to the nearest integer
    hs = np.round(ws * ratios)
scales is the enlargement factor:
ws = w * scales
hs = h * scales
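Putting the two steps together: generate_anchors first varies the aspect ratio at a fixed area, then enlarges each ratio anchor by the three scales. Below is a condensed NumPy re-derivation of that logic (a sketch following the published py-faster-rcnn behavior, not the verbatim source):

```python
import numpy as np

def whctrs(anchor):
    """Return width, height, center x, center y of an (x1, y1, x2, y2) anchor."""
    w = anchor[2] - anchor[0] + 1
    h = anchor[3] - anchor[1] + 1
    return w, h, anchor[0] + 0.5 * (w - 1), anchor[1] + 0.5 * (h - 1)

def mkanchors(ws, hs, x_ctr, y_ctr):
    """Build (x1, y1, x2, y2) anchors around a center from width/height vectors."""
    ws, hs = ws[:, None], hs[:, None]
    return np.hstack((x_ctr - 0.5 * (ws - 1), y_ctr - 0.5 * (hs - 1),
                      x_ctr + 0.5 * (ws - 1), y_ctr + 0.5 * (hs - 1)))

def generate_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    base = np.array([0, 0, base_size - 1, base_size - 1], dtype=float)
    w, h, x_ctr, y_ctr = whctrs(base)
    size = w * h                                     # area of the base window
    ws = np.round(np.sqrt(size / np.array(ratios)))  # keep area, vary ratio
    hs = np.round(ws * np.array(ratios))             # ratio = h / w
    ratio_anchors = mkanchors(ws, hs, x_ctr, y_ctr)
    # enlarge each ratio anchor by every scale
    out = []
    for a in ratio_anchors:
        w, h, x_ctr, y_ctr = whctrs(a)
        out.append(mkanchors(w * np.array(scales, dtype=float),
                             h * np.array(scales, dtype=float), x_ctr, y_ctr))
    return np.vstack(out)

anchors = generate_anchors()
assert anchors.shape == (9, 4)   # 3 ratios x 3 scales
print(anchors[0])                # [-84. -40.  99.  55.]
```

At each of the 38×62 feature-map positions, these 9 templates are shifted by feat_stride = 16 to cover the input image.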

layer {
  name: "roi_pool5"
  type: "ROIPooling"
  bottom: "conv5_3"
  bottom: "rois"
  top: "pool5"
  roi_pooling_param {
    pooled_w: 7
    pooled_h: 7
    spatial_scale: 0.0625 # 1/16
  }
}
Inputs:
conv5_3        (1, 512, 38, 62)
rois        (300, 5)
Output:
pool5        (300, 512, 7, 7)
RoI pooling in summary:
(1) it is used for object detection tasks;
(2) it lets us reuse the CNN feature map across all regions;
(3) it significantly speeds up both training and testing;
(4) it allows the object detection system to be trained end-to-end.
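The forward pass can be sketched in NumPy: each RoI is projected onto conv5_3 with spatial_scale = 1/16, divided into a 7×7 grid of bins, and each bin is max-pooled. This is a simplified sketch (it ignores backprop and some of the Caffe layer's edge handling):

```python
import numpy as np

def roi_pool(feat, rois, pooled_h=7, pooled_w=7, spatial_scale=1.0 / 16):
    """feat: (C, H, W) conv feature map; rois: (R, 5) rows of
    (batch_idx, x1, y1, x2, y2) in input-image coordinates."""
    C, H, W = feat.shape
    out = np.zeros((len(rois), C, pooled_h, pooled_w), dtype=feat.dtype)
    for r, roi in enumerate(rois):
        # project image-space roi corners onto the feature map (stride 16)
        x1, y1, x2, y2 = (int(np.round(v * spatial_scale)) for v in roi[1:])
        bin_h = max(y2 - y1 + 1, 1) / pooled_h
        bin_w = max(x2 - x1 + 1, 1) / pooled_w
        for ph in range(pooled_h):
            for pw in range(pooled_w):
                # floor/ceil bin boundaries, clipped to the feature map
                hs = min(max(y1 + int(np.floor(ph * bin_h)), 0), H)
                he = min(max(y1 + int(np.ceil((ph + 1) * bin_h)), 0), H)
                ws = min(max(x1 + int(np.floor(pw * bin_w)), 0), W)
                we = min(max(x1 + int(np.ceil((pw + 1) * bin_w)), 0), W)
                if he > hs and we > ws:
                    out[r, :, ph, pw] = feat[:, hs:he, ws:we].max(axis=(1, 2))
    return out

feat = np.random.rand(512, 38, 62).astype(np.float32)
rois = np.array([[0, 0, 0, 320, 240]], dtype=np.float32)
assert roi_pool(feat, rois).shape == (1, 512, 7, 7)
```

Because every RoI comes out as a fixed (512, 7, 7) block, all 300 proposals can be batched straight into fc6 regardless of their original sizes.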








[Attached figures: faster rcnn网络结构01.png, faster rcnn网络结构02.png — diagrams of the Faster R-CNN network structure]