东方耀AI技术分享
Title: ADAS (Advanced Driver Assistance System) Hands-On Project Summary
Author: 东方耀
Date: 2019-9-4 20:36
Overview of the ADAS business scenario
An Advanced Driver Assistance System (ADAS) is an active-safety technology that uses the various sensors mounted on a vehicle to collect environmental data from inside and outside the car in real time, and performs processing such as identification, detection, and tracking of static and moving objects, so that the driver becomes aware of potential danger as early as possible, raising attention and improving safety.
Research directions within ADAS:
Navigation and real-time traffic information (TMC)
Adaptive light control
Electronic police systems
Pedestrian protection
Connected vehicles
Automatic parking
Adaptive cruise control (ACC)
Traffic sign recognition
Blind-spot detection
Lane departure warning
Driver fatigue detection
Lane keeping
Hill descent control
Collision avoidance / pre-crash systems
Electric vehicle warning sounds
Night vision
Main task: detect motor vehicles, non-motor vehicles, pedestrians, and traffic signs in vehicle-mounted video data.
This is a standard object detection problem.
Judging how well a detection algorithm performs:
1. Detection rate and false-alarm rate
Each ground-truth label is allowed to match at most one detection;
duplicate detections of the same object are counted as false detections.
2. AP and mAP
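The matching rule above (at most one detection per ground-truth label; duplicates count as false positives) can be sketched as follows. This is a simplified illustration, not the official VOC/KITTI evaluation code, and the function names are my own:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def match_detections(gt_boxes, det_boxes, det_scores, iou_thresh=0.5):
    """Greedy matching: each ground-truth box may be claimed by at most one
    detection; any further detection of the same box is a false positive."""
    order = np.argsort(-np.asarray(det_scores))   # highest score first
    claimed = [False] * len(gt_boxes)
    tp = fp = 0
    for d in order:
        ious = [iou(det_boxes[d], g) for g in gt_boxes]
        best = int(np.argmax(ious)) if ious else -1
        if best >= 0 and ious[best] >= iou_thresh and not claimed[best]:
            claimed[best] = True
            tp += 1
        else:
            fp += 1                               # duplicate or bad match
    return tp, fp
```

With one ground-truth box and two overlapping detections, the higher-scoring one counts as a true positive and the duplicate as a false positive.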
Datasets for the ADAS scenario:
the KITTI dataset, the MOT benchmark, and Berkeley's large-scale autonomous-driving video dataset (BDD).
Difficulties in detecting motor vehicles, non-motor vehicles, and pedestrians (outdoors):
1. Detection on cloudy or rainy days and at night
2. Detection in crowded scenes
3. The non-rigid, articulated motion of pedestrians
4. Small-object detection
5. Occlusion, and so on
The KITTI dataset (shown with sample images in the original post)
Download link:
http://www.cvlibs.net/datasets/k ... hp?obj_benchmark=2d
The KITTI dataset was created jointly by the Karlsruhe Institute of Technology in Germany and the Toyota Technological Institute in the USA, and is currently one of the best-known computer-vision benchmark suites for autonomous-driving scenarios. It is used to evaluate the in-vehicle performance of computer-vision tasks such as stereo matching, optical flow, visual odometry, 3D object detection, and 3D tracking. KITTI contains real images captured in urban, rural, and highway scenes, with up to 15 vehicles and 30 pedestrians per image and varying degrees of occlusion and truncation. The whole dataset consists of 389 stereo and optical-flow image pairs, 39.2 km of visual-odometry sequences, and images containing more than 200k 3D-annotated objects, sampled and synchronized at 10 Hz.
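Each line of a KITTI label_2 file describes one object; the conversion script later in this post relies on field 0 (the class name) and fields 4-7 (the 2D box). A sketch of the field layout, using a representative label line:

```python
# Parse one line of a KITTI label_2 file. The field layout (type,
# truncation, occlusion, alpha, then the 2D box as left/top/right/bottom
# in pixels) is fixed by the KITTI object-detection devkit.
def parse_kitti_line(line):
    f = line.split()
    return {
        'type': f[0],                      # e.g. Car, Pedestrian, Cyclist, DontCare
        'truncated': float(f[1]),          # 0 (fully visible) .. 1 (fully truncated)
        'occluded': int(f[2]),             # 0=visible 1=partly 2=largely occluded 3=unknown
        'alpha': float(f[3]),              # observation angle
        'bbox': tuple(float(x) for x in f[4:8]),  # left, top, right, bottom
    }

line = 'Car 0.00 0 1.85 387.63 181.54 423.81 203.12 1.67 1.87 3.69 -16.53 2.39 58.49 1.57'
obj = parse_kitti_line(line)
```

The remaining fields (3D dimensions, location, rotation) are not needed for 2D detection and are ignored here.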
Listing all object-box categories that appear in the KITTI label files:

# -*- coding: utf-8 -*-
__author__ = u'东方耀 微信:dfy_88888'
__date__ = '2019/8/9 下午5:19'
__product__ = 'PyCharm'
__filename__ = 'demo_category'
import glob

list_anno_files = glob.glob('/home/dfy888/DataSets/Kitti_voc/training/label_2/*')
# 7481 annotation files
print len(list_anno_files)

category_list = []
for file_path in list_anno_files:
    with open(file_path) as f:
        anno_infos = f.readlines()
    for anno_item in anno_infos:
        # the first space-separated field of each line is the class name
        category_list.append(anno_item.split(' ')[0])

print 'All object-box categories found in the KITTI label files:'
# ['Cyclist', 'Van', 'Tram', 'Car', 'Misc', 'Pedestrian', 'Truck', 'Person_sitting', 'DontCare']
# Misc and DontCare will be dropped later when packing the data;
# they are not written into the VOC-format XML files.
print set(category_list)
print len(category_list)
Converting the KITTI dataset to VOC format:
Other annotation fields worth considering are truncation and occlusion: for example, heavily occluded boxes can be filtered out so that the sample quality is a bit better.
But a model trained that way will tend to miss heavily occluded objects at prediction time, so this is a trade-off you have to weigh yourself.
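The filtering idea above can be sketched with the truncation (field 1) and occlusion (field 2) columns of each KITTI label line. This filter is not part of the original conversion script, and the threshold values are hypothetical:

```python
# Decide whether to keep one KITTI annotation line before converting to VOC.
# max_truncation / max_occlusion are example thresholds, not values from the
# original project; tune them against the missed-detection trade-off.
def keep_annotation(line, max_truncation=0.5, max_occlusion=1):
    f = line.split()
    if f[0] in ('Misc', 'DontCare'):
        return False                      # always drop these classes
    # field 1: truncation in [0, 1]; field 2: occlusion level 0..3
    return float(f[1]) <= max_truncation and int(f[2]) <= max_occlusion
```

A fully visible car passes the filter, while a largely occluded or heavily truncated one is dropped.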
# -*- coding: utf-8 -*-
__author__ = u'东方耀 微信:dfy_88888'
__date__ = '2019/7/15 下午3:23'
__product__ = 'PyCharm'
__filename__ = 'kitti2voc'
import cv2
import glob
from xml.dom.minidom import Document

list_anno_files = glob.glob('/home/dfy888/DataSets/Kitti_voc/training/label_2/*')


def writexml(filename, saveimg, bboxes, xmlpath, typename):
    """
    Write a generic VOC-format XML annotation file.
    :param filename: image file name
    :param saveimg: image object (loaded with cv2)
    :param bboxes: list of object boxes
    :param xmlpath: path of the XML file to write
    :param typename: class name of each box
    :return:
    """
    doc = Document()
    # root node
    annotation = doc.createElement('annotation')
    doc.appendChild(annotation)
    folder = doc.createElement('folder')
    # note: folder name of the VOC-format dataset
    folder_name = doc.createTextNode('Kitti_voc')
    folder.appendChild(folder_name)
    annotation.appendChild(folder)
    filenamenode = doc.createElement('filename')
    filename_name = doc.createTextNode(filename)
    filenamenode.appendChild(filename_name)
    annotation.appendChild(filenamenode)
    source = doc.createElement('source')
    annotation.appendChild(source)
    database = doc.createElement('database')
    database.appendChild(doc.createTextNode('Kitti Database'))
    source.appendChild(database)
    annotation_s = doc.createElement('annotation')
    annotation_s.appendChild(doc.createTextNode('PASCAL VOC2007'))
    source.appendChild(annotation_s)
    image = doc.createElement('image')
    image.appendChild(doc.createTextNode('flickr'))
    source.appendChild(image)
    flickrid = doc.createElement('flickrid')
    flickrid.appendChild(doc.createTextNode('-1'))
    source.appendChild(flickrid)
    owner = doc.createElement('owner')
    annotation.appendChild(owner)
    flickrid_o = doc.createElement('flickrid')
    flickrid_o.appendChild(doc.createTextNode('dfy_88888'))
    owner.appendChild(flickrid_o)
    name_o = doc.createElement('name')
    name_o.appendChild(doc.createTextNode('dfy_88888'))
    owner.appendChild(name_o)
    size = doc.createElement('size')
    annotation.appendChild(size)
    width = doc.createElement('width')
    width.appendChild(doc.createTextNode(str(saveimg.shape[1])))
    height = doc.createElement('height')
    height.appendChild(doc.createTextNode(str(saveimg.shape[0])))
    depth = doc.createElement('depth')
    depth.appendChild(doc.createTextNode(str(saveimg.shape[2])))
    size.appendChild(width)
    size.appendChild(height)
    size.appendChild(depth)
    segmented = doc.createElement('segmented')
    segmented.appendChild(doc.createTextNode('0'))
    annotation.appendChild(segmented)
    for i in range(len(bboxes)):
        # each bbox is a 4-tuple: [xmin, ymin, xmax, ymax]
        bbox = bboxes[i]
        objects = doc.createElement('object')
        annotation.appendChild(objects)
        object_name = doc.createElement('name')
        # not only faces here: in the ADAS scenario the classes are
        # pedestrians, vehicles, traffic signs, etc.
        object_name.appendChild(doc.createTextNode(typename[i]))
        objects.appendChild(object_name)
        pose = doc.createElement('pose')
        pose.appendChild(doc.createTextNode('Unspecified'))
        objects.appendChild(pose)
        truncated = doc.createElement('truncated')
        truncated.appendChild(doc.createTextNode('1'))
        objects.appendChild(truncated)
        difficult = doc.createElement('difficult')
        difficult.appendChild(doc.createTextNode('0'))
        objects.appendChild(difficult)
        bndbox = doc.createElement('bndbox')
        objects.appendChild(bndbox)
        # xmin, ymin: top-left corner of the box
        xmin = doc.createElement('xmin')
        xmin.appendChild(doc.createTextNode(str(bbox[0])))
        bndbox.appendChild(xmin)
        ymin = doc.createElement('ymin')
        ymin.appendChild(doc.createTextNode(str(bbox[1])))
        bndbox.appendChild(ymin)
        # xmax, ymax: bottom-right corner of the box
        xmax = doc.createElement('xmax')
        xmax.appendChild(doc.createTextNode(str(bbox[2])))
        bndbox.appendChild(xmax)
        ymax = doc.createElement('ymax')
        ymax.appendChild(doc.createTextNode(str(bbox[3])))
        bndbox.appendChild(ymax)
    with open(xmlpath, 'w') as f:
        f.write(doc.toprettyxml(indent=''))


# convert the dataset (KITTI ---> VOC)
trainval = open('/home/dfy888/DataSets/Kitti_voc/ImageSets/Main/trainval.txt', 'w')
train = open('/home/dfy888/DataSets/Kitti_voc/ImageSets/Main/train.txt', 'w')
val = open('/home/dfy888/DataSets/Kitti_voc/ImageSets/Main/val.txt', 'w')
test = open('/home/dfy888/DataSets/Kitti_voc/ImageSets/Main/test.txt', 'w')
index = 0
# 7481 annotation files
for file_path in list_anno_files:
    with open(file_path) as f:
        # each annotation file is a plain txt file
        anno_infos = f.readlines()
    bboxes = []
    typename = []
    for anno_item in anno_infos:
        # parse one annotation line
        anno_item_infos = anno_item.split()
        if anno_item_infos[0] == 'Misc' or anno_item_infos[0] == 'DontCare':
            # drop Misc and DontCare to make training a bit easier
            continue
        else:
            typename.append(anno_item_infos[0])
            # fields 4-7 are the 2D box: left, top, right, bottom
            bbox = (int(float(anno_item_infos[4])), int(float(anno_item_infos[5])),
                    int(float(anno_item_infos[6])), int(float(anno_item_infos[7])))
            bboxes.append(bbox)
    filename = file_path.split('/')[-1].replace('txt', 'png')
    xmlpath = '/home/dfy888/DataSets/Kitti_voc/Annotations/' + filename.replace('png', 'xml')
    img_path = '/home/dfy888/DataSets/Kitti_voc/JPEGImages/' + filename
    saveimg = cv2.imread(img_path)
    writexml(filename, saveimg, bboxes, xmlpath, typename)
    # split: trainval 90% / test 10%; within trainval: train 70% / val 20%
    if index > len(list_anno_files) * 0.9:
        test.write(filename.replace('.png', '\n'))
    else:
        trainval.write(filename.replace('.png', '\n'))
        if index > len(list_anno_files) * 0.7:
            val.write(filename.replace('.png', '\n'))
        else:
            train.write(filename.replace('.png', '\n'))
    print 'processing: ' + str(index)
    index += 1
train.close()
trainval.close()
test.close()
val.close()
Setting up the environment for the Faster R-CNN detection model:
https://github.com/rbgirshick/py-faster-rcnn
git clone --recursive https://github.com/rbgirshick/py-faster-rcnn.git
The --recursive flag makes sure the bundled caffe-fast-rcnn submodule is downloaded as well. There is a pitfall here: the bundled Caffe version is quite old. For the fix, see:
http://www.ai111.vip/thread-788-1-1.html
Testing the Faster R-CNN detection model:
python tools/demo_detector_dfy.py
# -*- coding: utf-8 -*-
__author__ = u'东方耀 微信:dfy_88888'
__date__ = '2019/9/8 上午10:48'
__product__ = 'PyCharm'
__filename__ = 'demo_detector_dfy.py'
"""
Demo script showing detections in sample images.
Modified version: runs an object-detection demo with the pre-trained
faster_rcnn VGG16 model.
"""
import _init_paths
from fast_rcnn.config import cfg
from nms.gpu_nms import gpu_nms
from nms.cpu_nms import cpu_nms
import time
import matplotlib.pyplot as plt
import numpy as np
import caffe
import os
import cv2

# the 21 PASCAL VOC classes (including background)
CLASSES = ('__background__',
           'aeroplane', 'bicycle', 'bird', 'boat',
           'bottle', 'bus', 'car', 'cat', 'chair',
           'cow', 'diningtable', 'dog', 'horse',
           'motorbike', 'person', 'pottedplant',
           'sheep', 'sofa', 'train', 'tvmonitor')
class Timer(object):
    """A simple timer."""
    def __init__(self):
        self.total_time = 0.
        self.calls = 0
        self.start_time = 0.
        self.diff = 0.
        self.average_time = 0.

    def tic(self):
        # using time.time instead of time.clock because time.clock
        # does not normalize for multithreading
        self.start_time = time.time()

    def toc(self, average=True):
        self.diff = time.time() - self.start_time
        self.total_time += self.diff
        self.calls += 1
        self.average_time = self.total_time / self.calls
        if average:
            return self.average_time
        else:
            return self.diff


def nms(dets, thresh, force_cpu=False):
    """Dispatch to either CPU or GPU NMS implementations."""
    if dets.shape[0] == 0:
        return []
    if cfg.USE_GPU_NMS and not force_cpu:
        return gpu_nms(dets, thresh, device_id=cfg.GPU_ID)
    else:
        return cpu_nms(dets, thresh)
def bbox_transform_inv(boxes, deltas):
    if boxes.shape[0] == 0:
        return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype)
    boxes = boxes.astype(deltas.dtype, copy=False)
    widths = boxes[:, 2] - boxes[:, 0] + 1.0
    heights = boxes[:, 3] - boxes[:, 1] + 1.0
    ctr_x = boxes[:, 0] + 0.5 * widths
    ctr_y = boxes[:, 1] + 0.5 * heights
    dx = deltas[:, 0::4]
    dy = deltas[:, 1::4]
    dw = deltas[:, 2::4]
    dh = deltas[:, 3::4]
    pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
    pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
    pred_w = np.exp(dw) * widths[:, np.newaxis]
    pred_h = np.exp(dh) * heights[:, np.newaxis]
    pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
    # x1
    pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
    # y1
    pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
    # x2
    pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w
    # y2
    pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h
    return pred_boxes


def clip_boxes(boxes, im_shape):
    """
    Clip boxes to image boundaries.
    """
    # x1 >= 0
    boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0)
    # y1 >= 0
    boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0)
    # x2 < im_shape[1]
    boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0)
    # y2 < im_shape[0]
    boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0)
    return boxes
def im_list_to_blob(ims):
    """Convert a list of images into a network input.
    ims: a list of images (already resized with cv2.resize).
    Assumes images are already prepared (means subtracted, BGR order, ...).
    """
    # max_shape: the largest height/width among the input images
    max_shape = np.array([im.shape for im in ims]).max(axis=0)
    num_images = len(ims)
    # allocate an empty 4-D blob of shape (N, H, W, 3)
    blob = np.zeros((num_images, max_shape[0], max_shape[1], 3),
                    dtype=np.float32)
    for i in xrange(num_images):
        im = ims[i]
        # copy the actual image data into the empty 4-D blob
        blob[i, 0:im.shape[0], 0:im.shape[1], :] = im
    # Move channels (axis 3) to axis 1
    # Axis order will become: (batch elem, channel, height, width)
    channel_swap = (0, 3, 1, 2)
    blob = blob.transpose(channel_swap)
    return blob
def vis_detections(im, class_name, dets, thresh=0.5):
    """Draw detected bounding boxes."""
    inds = np.where(dets[:, -1] >= thresh)[0]
    if len(inds) == 0:
        return
    im = im[:, :, (2, 1, 0)]
    fig, ax = plt.subplots(figsize=(12, 12))
    ax.imshow(im, aspect='equal')
    for i in inds:
        bbox = dets[i, :4]
        score = dets[i, -1]
        ax.add_patch(
            plt.Rectangle((bbox[0], bbox[1]),
                          bbox[2] - bbox[0],
                          bbox[3] - bbox[1], fill=False,
                          edgecolor='red', linewidth=3.5)
            )
        ax.text(bbox[0], bbox[1] - 2,
                '{:s} {:.3f}'.format(class_name, score),
                bbox=dict(facecolor='blue', alpha=0.5),
                fontsize=14, color='white')
    ax.set_title(('{} detections with '
                  'p({} | box) >= {:.1f}').format(class_name, class_name,
                                                  thresh),
                 fontsize=14)
    plt.axis('off')
    plt.tight_layout()
    plt.draw()
def _get_image_blob(im):
    """Converts an image into a network input.
    Arguments:
        im (ndarray): a color image in BGR order
    Returns:
        blob (ndarray): a data blob holding an image pyramid
        im_scale_factors (list): list of image scales (relative to im) used
            in the image pyramid
    """
    im_orig = im.astype(np.float32, copy=True)
    # subtract the pixel mean
    im_orig -= cfg.PIXEL_MEANS
    # shape of the original image
    im_shape = im_orig.shape
    im_size_min = np.min(im_shape[0:2])
    im_size_max = np.max(im_shape[0:2])
    processed_ims = []
    im_scale_factors = []
    # cfg.TEST.SCALES = (600,)
    for target_size in cfg.TEST.SCALES:
        im_scale = float(target_size) / float(im_size_min)
        # Prevent the biggest axis from being more than MAX_SIZE
        # cfg.TEST.MAX_SIZE = 1000
        if np.round(im_scale * im_size_max) > cfg.TEST.MAX_SIZE:
            # make sure the larger side of the resized image does not
            # exceed the configured maximum size
            im_scale = float(cfg.TEST.MAX_SIZE) / float(im_size_max)
        # at this point im_scale satisfies both constraints
        im = cv2.resize(im_orig, None, None, fx=im_scale, fy=im_scale,
                        interpolation=cv2.INTER_LINEAR)
        im_scale_factors.append(im_scale)
        processed_ims.append(im)
    # Create a blob to hold the input images
    blob = im_list_to_blob(processed_ims)
    return blob, np.array(im_scale_factors)


def _project_im_rois(im_rois, scales):
    """Project image RoIs into the image pyramid built by _get_image_blob.
    Arguments:
        im_rois (ndarray): R x 4 matrix of RoIs in original image coordinates
        scales (list): scale factors as returned by _get_image_blob
    Returns:
        rois (ndarray): R x 4 matrix of projected RoI coordinates
        levels (list): image pyramid levels used by each projected RoI
    """
    im_rois = im_rois.astype(np.float, copy=False)
    if len(scales) > 1:
        widths = im_rois[:, 2] - im_rois[:, 0] + 1
        heights = im_rois[:, 3] - im_rois[:, 1] + 1
        areas = widths * heights
        scaled_areas = areas[:, np.newaxis] * (scales[np.newaxis, :] ** 2)
        diff_areas = np.abs(scaled_areas - 224 * 224)
        levels = diff_areas.argmin(axis=1)[:, np.newaxis]
    else:
        levels = np.zeros((im_rois.shape[0], 1), dtype=np.int)
    rois = im_rois * scales[levels]
    return rois, levels
def _get_rois_blob(im_rois, im_scale_factors):
    """Converts RoIs into network inputs.
    Arguments:
        im_rois (ndarray): R x 4 matrix of RoIs in original image coordinates
        im_scale_factors (list): scale factors as returned by _get_image_blob
    Returns:
        blob (ndarray): R x 5 matrix of RoIs in the image pyramid
    """
    rois, levels = _project_im_rois(im_rois, im_scale_factors)
    rois_blob = np.hstack((levels, rois))
    return rois_blob.astype(np.float32, copy=False)


def _get_blobs(im, rois):
    """Convert an image and RoIs within that image into network inputs."""
    blobs = {'data': None, 'rois': None}
    blobs['data'], im_scale_factors = _get_image_blob(im)
    if not cfg.TEST.HAS_RPN:
        print >> dfy_log_file_writer, 'no RPN network: rois is not None'
        blobs['rois'] = _get_rois_blob(rois, im_scale_factors)
    print >> dfy_log_file_writer, 'blobs:', blobs.keys(), blobs['data'].shape, im_scale_factors
    return blobs, im_scale_factors
def im_detect(net, im, boxes=None):
    """Detect object classes in an image given object proposals.
    Arguments:
        net (caffe.Net): Fast R-CNN network to use
        im (ndarray): color image to test (in BGR order)
        boxes (ndarray): R x 4 array of object proposals or None (for RPN)
    Returns:
        scores (ndarray): R x K array of object class scores (K includes
            background as object category 0)
        boxes (ndarray): R x (4*K) array of predicted bounding boxes
    """
    # resize the original image, width and height scaled by the same factor
    blobs, im_scales = _get_blobs(im, boxes)
    # When mapping from image ROIs to feature map ROIs, there's some aliasing
    # (some distinct image ROIs get mapped to the same feature ROI).
    # Here, we identify duplicate feature ROIs, so we only compute features
    # on the unique subset.
    if cfg.TEST.HAS_RPN:
        im_blob = blobs['data']
        blobs['im_info'] = np.array(
            [[im_blob.shape[2], im_blob.shape[3], im_scales[0]]],
            dtype=np.float32)
        print >> dfy_log_file_writer, 'with RPN network:', blobs.keys(), blobs['im_info']
    # reshape network inputs
    print >> dfy_log_file_writer, 'reshape:', blobs['data'].shape
    net.blobs['data'].reshape(*(blobs['data'].shape))
    if cfg.TEST.HAS_RPN:
        print >> dfy_log_file_writer, 'reshape:', blobs['im_info'].shape
        net.blobs['im_info'].reshape(*(blobs['im_info'].shape))
    else:
        net.blobs['rois'].reshape(*(blobs['rois'].shape))
    # do the forward pass
    forward_kwargs = {'data': blobs['data'].astype(np.float32, copy=False)}
    if cfg.TEST.HAS_RPN:
        forward_kwargs['im_info'] = blobs['im_info'].astype(np.float32, copy=False)
    else:
        forward_kwargs['rois'] = blobs['rois'].astype(np.float32, copy=False)
    blobs_out = net.forward(**forward_kwargs)
    # https://blog.csdn.net/tina_ttl/article/details/51033660
    # (how to visualize the output of each CNN layer in Caffe)
    for layer_name, blob in net.blobs.iteritems():
        print >> dfy_log_file_writer, 'layer name + shape: ' + layer_name + '\t' + str(blob.data.shape)
    for layer_name, param in net.params.iteritems():
        print >> dfy_log_file_writer, 'layer name + W and b shapes: ' + layer_name + '\t' + str(param[0].data.shape), str(param[1].data.shape)
    print >> dfy_log_file_writer, 'forward-pass outputs blobs_out:\n', blobs_out.keys(), \
        blobs_out['bbox_pred'].shape, \
        blobs_out['cls_prob'].shape
    if cfg.TEST.HAS_RPN:
        assert len(im_scales) == 1, "Only single-image batch implemented"
        rois = net.blobs['rois'].data.copy()
        print >> dfy_log_file_writer, '\nrois:', rois, rois.shape
        # unscale back to raw image space
        boxes = rois[:, 1:5] / im_scales[0]
        print >> dfy_log_file_writer, '\nboxes:', boxes, boxes.shape
    # use softmax estimated probabilities
    scores = blobs_out['cls_prob']
    if cfg.TEST.BBOX_REG:
        # True by default: apply bounding-box regression deltas
        box_deltas = blobs_out['bbox_pred']
        pred_boxes = bbox_transform_inv(boxes, box_deltas)
        pred_boxes = clip_boxes(pred_boxes, im.shape)
    return scores, pred_boxes
def demo(net, image_name):
    """Detect object classes in an image using pre-computed object proposals."""
    # Load the demo image
    im_file = os.path.join(cfg.DATA_DIR, 'demo', image_name)
    print >> dfy_log_file_writer, 'detecting image:', im_file
    im = cv2.imread(im_file)
    print >> dfy_log_file_writer, 'original image size (H, W, C):', im.shape
    # Detect all object classes and regress object bounds
    timer = Timer()
    timer.tic()
    scores, boxes = im_detect(net, im)
    timer.toc()
    # ('scores.shape:', (300, 21))
    print('scores.shape:', scores.shape)
    # ('boxes.shape:', (300, 84))
    print('boxes.shape:', boxes.shape)
    print ('Detection took {:.3f}s for '
           '{:d} object proposals').format(timer.total_time, boxes.shape[0])
    # Visualize detections for each class
    # confidence threshold
    CONF_THRESH = 0.95
    # NMS threshold
    NMS_THRESH = 0.6
    for cls_ind, cls in enumerate(CLASSES[1:]):
        cls_ind += 1  # because we skipped background
        cls_boxes = boxes[:, 4 * cls_ind:4 * (cls_ind + 1)]
        cls_scores = scores[:, cls_ind]
        dets = np.hstack((cls_boxes,
                          cls_scores[:, np.newaxis])).astype(np.float32)
        # apply the NMS threshold first
        keep = nms(dets, NMS_THRESH)
        dets = dets[keep, :]
        # the confidence threshold is only applied when visualizing
        vis_detections(im, cls, dets, thresh=CONF_THRESH)
if __name__ == '__main__':
    cfg.TEST.HAS_RPN = True  # Use RPN for proposals
    # matching deploy prototxt and caffemodel files
    prototxt = 'models/pascal_voc/VGG16/faster_rcnn_alt_opt/faster_rcnn_test.pt'
    caffemodel = 'data/faster_rcnn_models/VGG16_faster_rcnn_final.caffemodel'
    caffe.set_device(0)
    caffe.set_mode_gpu()
    # log file for the printed results
    # https://blog.csdn.net/jiongnima/article/details/80016683
    # (line-by-line walk-through of the Faster R-CNN ROI-Pooling source)
    dfy_log_file = "tools/demo_detector_dfy.log"
    dfy_log_file_writer = open(dfy_log_file, 'w')
    net = caffe.Net(prototxt, caffemodel, caffe.TEST)
    # im_names = ['001763.jpg', '004545.jpg']
    im_names = ['test01.png']
    for im_name in im_names:
        print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
        print 'Demo for data/demo/{}'.format(im_name)
        demo(net, im_name)
    plt.show()
    dfy_log_file_writer.close()
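The nms() wrapper in the script dispatches to compiled CPU/GPU kernels. For reference, the same greedy suppression in pure NumPy (equivalent to the py_cpu_nms fallback shipped with py-faster-rcnn) looks like this:

```python
import numpy as np

def py_nms(dets, thresh):
    """Pure-NumPy non-maximum suppression.
    dets: (N, 5) array of [x1, y1, x2, y2, score]; returns kept row indices."""
    x1, y1, x2, y2, scores = dets[:, 0], dets[:, 1], dets[:, 2], dets[:, 3], dets[:, 4]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the kept box with every remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # drop boxes that overlap the kept box above the threshold
        order = order[1:][iou <= thresh]
    return keep
```

Note the "+ 1" terms: py-faster-rcnn treats box coordinates as inclusive pixel indices, which is why the same convention appears in bbox_transform_inv above.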
Faster R-CNN network structure explained in detail:
http://www.ai111.vip/thread-800-1-1.html
Training on the KITTI dataset with the Faster R-CNN algorithm:
1. Modify the train and test network definition files
They were previously used on the PASCAL VOC dataset (21 classes in total); the KITTI dataset has 8 classes.
models/pascal_voc/VGG_CNN_M_1024/faster_rcnn_end2end/train.prototxt
Change 1: in the layer with name: 'input-data', type: 'Python', set num_classes to 8
Change 2: in the layer with name: 'roi-data', type: 'Python', set num_classes to 8
Change 3: in the layer with name: "cls_score", type: "InnerProduct", set num_output: 8
Change 4: in the layer with name: "bbox_pred", type: "InnerProduct", set num_output: 32 (= 8 * 4)
models/pascal_voc/VGG_CNN_M_1024/faster_rcnn_end2end/test.prototxt
Change 1: in the layer with name: "cls_score", type: "InnerProduct", set num_output: 8
Change 2: in the layer with name: "bbox_pred", type: "InnerProduct", set num_output: 32 (= 8 * 4)
2. Modify the class list in lib/datasets/pascal_voc.py
# changed from 21 classes to 8; note that all names must be lowercase
self._classes = ('__background__',  # always index 0
                 'person_sitting', 'truck', 'van', 'pedestrian',
                 'cyclist', 'tram', 'car')
3. Modify the data paths in lib/datasets/pascal_voc.py
self._data_path = '/home/dfy888/DataSets/Kitti_voc'
self._image_ext = '.png'
Also comment out the following code:
# assert os.path.exists(self._devkit_path), \
#     'VOCdevkit path does not exist: {}'.format(self._devkit_path)
Search for _devkit_path and replace it with the corresponding _data_path.
Also modify the _load_pascal_annotation function (x1 and y1 must not have 1 subtracted):
# Make pixel indexes 0-based
x1 = float(bbox.find('xmin').text)
y1 = float(bbox.find('ymin').text)
x2 = float(bbox.find('xmax').text) - 1
y2 = float(bbox.find('ymax').text) - 1
4. Remove the pre-trained model
In tools/train_net.py change:
train_net(args.solver, roidb, output_dir,
          pretrained_model=None,
          max_iters=args.max_iters)
5. Bugs encountered and their fixes:
http://www.ai111.vip/thread-790-1-1.html
http://www.ai111.vip/thread-789-1-1.html
http://www.ai111.vip/thread-788-1-1.html
http://www.ai111.vip/thread-791-1-1.html
http://www.ai111.vip/thread-792-1-1.html
6. Start training:
python tools/train_net.py --gpu 0
7. Delete data/cache before retraining
8. Start testing:
python tools/test_net.py --gpu 0
Ideas for model optimization (in a real engineering project you would not modify the overall framework):
1. Increase the number of training iterations, e.g. to 500k, to make sure the network converges
2. Change the CNN backbone in train.prototxt (important)
3. Modify the Python layers defined in train.prototxt
   (usually found under lib/, e.g. roi_data_layer, nms, etc.)
4. Improve the input data (important)
5. Fine-tune the network hyperparameters in lib/fast_rcnn/config.py (many thresholds, etc.)
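For point 5, py-faster-rcnn can override the defaults in lib/fast_rcnn/config.py with a YAML experiment file passed via the --cfg option of train_net.py, instead of editing the source. A hypothetical experiment file (the keys follow config.py; the values here are illustrative, not tuned):

```yaml
# Example --cfg file overriding a few thresholds from lib/fast_rcnn/config.py
EXP_DIR: kitti_end2end
TRAIN:
  SCALES: [600]
  RPN_POSITIVE_OVERLAP: 0.7   # anchor/GT IoU needed to count as positive
  RPN_NMS_THRESH: 0.7
  BATCH_SIZE: 128
TEST:
  SCALES: [600]
  MAX_SIZE: 1000
  NMS: 0.3
```

Keeping experiment settings in a file like this also makes different training runs reproducible and easy to compare.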
Author: xsoft
Date: 2020-2-3 15:49
Thank you for the materials, teacher.
东方耀AI技术分享 (http://www.ai111.vip/)