先进驾驶辅助系统ADAS业务实战项目总结

东方耀 · 发表于 2019-9-4 20:36:38

ADAS业务场景综述
先进驾驶辅助系统( Advanced Driver Assistance System)
简称ADAS,是利用安装于车上的各式各样的传感器,在第一时
间收集车内外的环境数据,进行静、动态物体的辨识、侦测与追
踪等技术上的处理,从而能够让驾驶者在最快的时间察觉可能发
生的危险,以引起注意和提高安全性的主动安全技术。

ADAS业务的研究方向：
导航与实时交通系统TMC
自适应灯光控制
电子警察系统
行人保护系统
车联网
自动泊车系统
自适应巡航ACC
交通标志识别
盲点探测
车道偏移报警系统
驾驶员疲劳探测
车道保持系统
下坡控制系统
碰撞避免或预碰撞系统
电动汽车报警系统
夜视系统

主要任务：检测车载视频数据中的机动车、非机动车、行人、交通标识符
标准的目标检测问题

判断算法性能好坏：
1、检出率误报率
每一个标记只允许有一个检测与之相对应
重复检测会被视为错误检测
2、AP和mAP

ADAS业务场景的数据集：
Kitti数据集 MOT数据集 Berkeley大规模自动驾驶视频数据集

机动车、非机动车、行人检测问题难点（户外的）：
1、阴天、雨天、夜间目标检测问题
2、拥挤场景下的目标检测问题
3、行人刚性运动带来的检测难题
4、小目标检测问题
5、遮挡问题等等

KiTTi数据集(有图展示)
下载链接：http://www.cvlibs.net/datasets/k ... hp?obj_benchmark=2d
KiTTi数据集由德国卡尔斯鲁厄理工学院和丰田美国技术研究院联合创办,
是目前国际上知名的自动驾驶场景下的计算机视觉算法评测数据集。

该数据集用于评测立体图像,光流,视觉测距,3D物体检测和3D跟踪等计算机视觉技术在车载环境下的性能。
Kitti包含市区、乡村和高速公路等场景采集的真实图像数据,每张图像中最多达15辆车和30个行人,还有各种程度的遮挡与截断。
整个数据集由389对立体图像和光流图,39.2km视觉测距序列以及超过200k3D标注物体的图像组成,以10Hz的频率采样及同步。

查看Kitti数据集中标记文件目标框的所有类别：

# -*- coding: utf-8 -*-
__author__ = u'东方耀微信：dfy_88888'
__date__ = '2019/8/9 下午5:19'
__product__ = 'PyCharm'
__filename__ = 'demo_category'
import glob
list_anno_files = glob.glob('/home/dfy888/DataSets/Kitti_voc/training/label_2/*')
# print list_anno_files
# 7481
print len(list_anno_files)
category_list = []
for file_path in list_anno_files:
with open(file_path) as f:
anno_infos = f.readlines()
# print anno_infos
for anno_item in anno_infos:
category_list.append(anno_item.split(' ')[0])
print '查看Kitti数据集中标记文件目标框的所有类别：'
# ['Cyclist', 'Van', 'Tram', 'Car', 'Misc', 'Pedestrian', 'Truck', 'Person_sitting', 'DontCare']
# 后续打包数据的时候会将Misc杂项 DontCare不关心去掉不会写入voc格式的xml文件中
print set(category_list)
print len(category_list)

复制代码

Kitti数据集转换为VOC格式数据集：其他的标注信息：可以考虑截断和遮挡比如遮挡严重的目标框可以过滤掉保证样本质量更好一些
但是在模型预测的时候会对遮挡严重的目标产生漏检这个需要自己权衡

# -*- coding: utf-8 -*-
__author__ = u'东方耀微信：dfy_88888'
__date__ = '2019/7/15 下午3:23'
__product__ = 'PyCharm'
__filename__ = 'kitti2voc'
import cv2
import glob
from xml.dom.minidom import Document
list_anno_files = glob.glob('/home/dfy888/DataSets/Kitti_voc/training/label_2/*')
def writexml(filename, saveimg, bboxes, xmlpath, typename):
"""
写成voc格式通用的xml文件
:param filename: 图片的路径
:param saveimg: 图片对象 cv2
:param bboxes: 多个人脸框集合
:param xmlpath: xml文件路径
:return:
"""
doc = Document()
# 根节点
annotation = doc.createElement('annotation')
doc.appendChild(annotation)
folder = doc.createElement('folder')
# 注意：widerface_voc voc格式数据的文件夹名字
folder_name = doc.createTextNode('Kitti_voc')
folder.appendChild(folder_name)
annotation.appendChild(folder)
filenamenode = doc.createElement('filename')
filename_name = doc.createTextNode(filename)
filenamenode.appendChild(filename_name)
annotation.appendChild(filenamenode)
source = doc.createElement('source')
annotation.appendChild(source)
database = doc.createElement('database')
database.appendChild(doc.createTextNode('Kitti Database'))
source.appendChild(database)
annotation_s = doc.createElement('annotation')
annotation_s.appendChild(doc.createTextNode('PASCAL VOC2007'))
source.appendChild(annotation_s)
image = doc.createElement('image')
image.appendChild(doc.createTextNode('flickr'))
source.appendChild(image)
flickrid = doc.createElement('flickrid')
flickrid.appendChild(doc.createTextNode('-1'))
source.appendChild(flickrid)
owner = doc.createElement('owner')
annotation.appendChild(owner)
flickrid_o = doc.createElement('flickrid')
flickrid_o.appendChild(doc.createTextNode('dfy_88888'))
owner.appendChild(flickrid_o)
name_o = doc.createElement('name')
name_o.appendChild(doc.createTextNode('dfy_88888'))
owner.appendChild(name_o)
size = doc.createElement('size')
annotation.appendChild(size)
width = doc.createElement('width')
width.appendChild(doc.createTextNode(str(saveimg.shape[1])))
height = doc.createElement('height')
height.appendChild(doc.createTextNode(str(saveimg.shape[0])))
depth = doc.createElement('depth')
depth.appendChild(doc.createTextNode(str(saveimg.shape[2])))
size.appendChild(width)
size.appendChild(height)
size.appendChild(depth)
segmented = doc.createElement('segmented')
segmented.appendChild(doc.createTextNode('0'))
annotation.appendChild(segmented)
for i in range(len(bboxes)):
# bbox 四维向量： [左上角坐标x y 宽高 w h]
bbox = bboxes[i]
objects = doc.createElement('object')
annotation.appendChild(objects)
object_name = doc.createElement('name')
# 不是只有人脸 adas业务场景下行人车辆交通标示
object_name.appendChild(doc.createTextNode(typename[i]))
objects.appendChild(object_name)
pose = doc.createElement('pose')
pose.appendChild(doc.createTextNode('Unspecified'))
objects.appendChild(pose)
truncated = doc.createElement('truncated')
truncated.appendChild(doc.createTextNode('1'))
objects.appendChild(truncated)
difficult = doc.createElement('difficult')
difficult.appendChild(doc.createTextNode('0'))
objects.appendChild(difficult)
bndbox = doc.createElement('bndbox')
objects.appendChild(bndbox)
# xmin ymin 就是标记框左上角的坐标
xmin = doc.createElement('xmin')
xmin.appendChild(doc.createTextNode(str(bbox[0])))
bndbox.appendChild(xmin)
ymin = doc.createElement('ymin')
ymin.appendChild(doc.createTextNode(str(bbox[1])))
bndbox.appendChild(ymin)
# xmax ymax 就是标记框右下角的坐标
xmax = doc.createElement('xmax')
xmax.appendChild(doc.createTextNode(str(bbox[2])))
bndbox.appendChild(xmax)
ymax = doc.createElement('ymax')
ymax.appendChild(doc.createTextNode(str(bbox[3])))
bndbox.appendChild(ymax)
with open(xmlpath, 'w') as f:
f.write(doc.toprettyxml(indent=''))
# 转换数据集（Kitti---> VOC）
trainval = open('/home/dfy888/DataSets/Kitti_voc/ImageSets/Main/trainval.txt', 'w')
train = open('/home/dfy888/DataSets/Kitti_voc/ImageSets/Main/train.txt', 'w')
val = open('/home/dfy888/DataSets/Kitti_voc/ImageSets/Main/val.txt', 'w')
test = open('/home/dfy888/DataSets/Kitti_voc/ImageSets/Main/test.txt', 'w')
index = 0
# 7481
for file_path in list_anno_files:
with open(file_path) as f:
# 每一个标注文件txt格式的
anno_infos = f.readlines()
# print anno_infos
bboxes = []
typename = []
for anno_item in anno_infos:
# 对每一行的信息进行解析
anno_item_infos = anno_item.split()
if anno_item_infos[0] == 'Misc' or anno_item_infos[0] == 'DontCare':
# 将杂项与不关心的过滤掉模型训练更容易一些
continue
else:
typename.append(anno_item_infos[0])
bbox = (int(float(anno_item_infos[4])), int(float(anno_item_infos[5])),
int(float(anno_item_infos[6])), int(float(anno_item_infos[7])))
bboxes.append(bbox)
pass
filename = file_path.split('/')[-1].replace('txt', 'png')
xmlpath = '/home/dfy888/DataSets/Kitti_voc/Annotations/' + filename.replace('png', 'xml')
img_path = '/home/dfy888/DataSets/Kitti_voc/JPEGImages/' + filename
saveimg = cv2.imread(img_path)
writexml(filename, saveimg, bboxes, xmlpath, typename)
# :param img_set_type: trainval or val or test or train
# trainval 90% test 10%
# train 70% val 20%
if index > len(list_anno_files) * 0.9:
test.write(filename.replace('.png', '\n'))
else:
trainval.write(filename.replace('.png', '\n'))
if index > len(list_anno_files) * 0.7:
val.write(filename.replace('.png', '\n'))
else:
train.write(filename.replace('.png', '\n'))
print '正在处理：' + str(index)
index += 1
train.close()
trainval.close()
test.close()
val.close()

复制代码

Faster RCNN检测模型的环境搭建：
https://github.com/rbgirshick/py-faster-rcnn
git clone --recursive https://github.com/rbgirshick/py-faster-rcnn.git
加入--recursive保证项目里面的caffe-fast-rcnn的下载当然这里也有一个坑就是之前的caffe版本过低
具体解决请看：http://www.ai111.vip/thread-788-1-1.html

Faster RCNN检测模型的测试：python tools/demo_detector_dfy.py

# -*- coding: utf-8 -*-
__author__ = u'东方耀微信：dfy_88888'
__date__ = '2019/9/8 上午10:48'
__product__ = 'PyCharm'
__filename__ = 'demo_detector_dfy.py'
"""
Demo script showing detections in sample images.
东方修改的：利用faster_rcnn vgg16的预训练模型进行目标检测demo
"""
import _init_paths
from fast_rcnn.config import cfg
from nms.gpu_nms import gpu_nms
from nms.cpu_nms import cpu_nms
import time
import matplotlib.pyplot as plt
import numpy as np
import caffe
import os
import cv2
# pascal voc 共21类（含背景）
CLASSES = ('__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor')
class Timer(object):
"""A simple timer."""
def __init__(self):
self.total_time = 0.
self.calls = 0
self.start_time = 0.
self.diff = 0.
self.average_time = 0.
def tic(self):
# using time.time instead of time.clock because time time.clock
# does not normalize for multithreading
self.start_time = time.time()
def toc(self, average=True):
self.diff = time.time() - self.start_time
self.total_time += self.diff
self.calls += 1
self.average_time = self.total_time / self.calls
if average:
return self.average_time
else:
return self.diff
def nms(dets, thresh, force_cpu=False):
"""Dispatch to either CPU or GPU NMS implementations."""
if dets.shape[0] == 0:
return []
if cfg.USE_GPU_NMS and not force_cpu:
return gpu_nms(dets, thresh, device_id=cfg.GPU_ID)
else:
return cpu_nms(dets, thresh)
def bbox_transform_inv(boxes, deltas):
if boxes.shape[0] == 0:
return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype)
boxes = boxes.astype(deltas.dtype, copy=False)
widths = boxes[:, 2] - boxes[:, 0] + 1.0
heights = boxes[:, 3] - boxes[:, 1] + 1.0
ctr_x = boxes[:, 0] + 0.5 * widths
ctr_y = boxes[:, 1] + 0.5 * heights
dx = deltas[:, 0::4]
dy = deltas[:, 1::4]
dw = deltas[:, 2::4]
dh = deltas[:, 3::4]
pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
pred_w = np.exp(dw) * widths[:, np.newaxis]
pred_h = np.exp(dh) * heights[:, np.newaxis]
pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
# x1
pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
# y1
pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
# x2
pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w
# y2
pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h
return pred_boxes
def clip_boxes(boxes, im_shape):
"""
Clip boxes to image boundaries.
"""
# x1 >= 0
boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0)
# y1 >= 0
boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0)
# x2 < im_shape[1]
boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0)
# y2 < im_shape[0]
boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0)
return boxes
def im_list_to_blob(ims):
"""Convert a list of images into a network input.
ims:一个列表里面都是图片（cv2.resize之后的放进去的）
Assumes images are already prepared (means subtracted, BGR order, ...).
"""
# max_shape ：能满足要求或条件的最大的尺寸
max_shape = np.array([im.shape for im in ims]).max(axis=0)
num_images = len(ims)
# 初始化一个空的4维矩阵 shape:(N H W 3)
blob = np.zeros((num_images, max_shape[0], max_shape[1], 3),
dtype=np.float32)
for i in xrange(num_images):
im = ims[i]
# 把实际的图片数据赋值给空的4维矩阵
blob[i, 0:im.shape[0], 0:im.shape[1], :] = im
# Move channels (axis 3) to axis 1
# Axis order will become: (batch elem, channel, height, width)
channel_swap = (0, 3, 1, 2)
# 将通道数放前面来
blob = blob.transpose(channel_swap)
return blob
def vis_detections(im, class_name, dets, thresh=0.5):
"""Draw detected bounding boxes."""
inds = np.where(dets[:, -1] >= thresh)[0]
if len(inds) == 0:
return
im = im[:, :, (2, 1, 0)]
fig, ax = plt.subplots(figsize=(12, 12))
ax.imshow(im, aspect='equal')
for i in inds:
bbox = dets[i, :4]
score = dets[i, -1]
ax.add_patch(
plt.Rectangle((bbox[0], bbox[1]),
bbox[2] - bbox[0],
bbox[3] - bbox[1], fill=False,
edgecolor='red', linewidth=3.5)
)
ax.text(bbox[0], bbox[1] - 2,
'{:s} {:.3f}'.format(class_name, score),
bbox=dict(facecolor='blue', alpha=0.5),
fontsize=14, color='white')
ax.set_title(('{} detections with '
'p({} | box) >= {:.1f}').format(class_name, class_name,
thresh),
fontsize=14)
plt.axis('off')
plt.tight_layout()
plt.draw()
def _get_image_blob(im):
"""Converts an image into a network input.
Arguments:
im (ndarray): a color image in BGR order
Returns:
blob (ndarray): a data blob holding an image pyramid
im_scale_factors (list): list of image scales (relative to im) used
in the image pyramid
"""
im_orig = im.astype(np.float32, copy=True)
# 图片减掉均值操作
im_orig -= cfg.PIXEL_MEANS
# 原始图片的shape
im_shape = im_orig.shape
im_size_min = np.min(im_shape[0:2])
im_size_max = np.max(im_shape[0:2])
processed_ims = []
im_scale_factors = []
# cfg.TEST.SCALES = (600,)
for target_size in cfg.TEST.SCALES:
im_scale = float(target_size) / float(im_size_min)
# Prevent the biggest axis from being more than MAX_SIZE
# cfg.TEST.MAX_SIZE = 1000
if np.round(im_scale * im_size_max) > cfg.TEST.MAX_SIZE:
# 为了保证图片缩放后最大的尺寸(不论宽高)不能比预先配置的最大尺寸还要大
im_scale = float(cfg.TEST.MAX_SIZE) / float(im_size_max)
# 到此就得到满足条件的图片缩放比例im_scale
im = cv2.resize(im_orig, None, None, fx=im_scale, fy=im_scale,
interpolation=cv2.INTER_LINEAR)
im_scale_factors.append(im_scale)
processed_ims.append(im)
# Create a blob to hold the input images
blob = im_list_to_blob(processed_ims)
return blob, np.array(im_scale_factors)
def _project_im_rois(im_rois, scales):
"""Project image RoIs into the image pyramid built by _get_image_blob.
Arguments:
im_rois (ndarray): R x 4 matrix of RoIs in original image coordinates
scales (list): scale factors as returned by _get_image_blob
Returns:
rois (ndarray): R x 4 matrix of projected RoI coordinates
levels (list): image pyramid levels used by each projected RoI
"""
im_rois = im_rois.astype(np.float, copy=False)
if len(scales) > 1:
widths = im_rois[:, 2] - im_rois[:, 0] + 1
heights = im_rois[:, 3] - im_rois[:, 1] + 1
areas = widths * heights
scaled_areas = areas[:, np.newaxis] * (scales[np.newaxis, :] ** 2)
diff_areas = np.abs(scaled_areas - 224 * 224)
levels = diff_areas.argmin(axis=1)[:, np.newaxis]
else:
levels = np.zeros((im_rois.shape[0], 1), dtype=np.int)
rois = im_rois * scales[levels]
return rois, levels
def _get_rois_blob(im_rois, im_scale_factors):
"""Converts RoIs into network inputs.
Arguments:
im_rois (ndarray): R x 4 matrix of RoIs in original image coordinates
im_scale_factors (list): scale factors as returned by _get_image_blob
Returns:
blob (ndarray): R x 5 matrix of RoIs in the image pyramid
"""
rois, levels = _project_im_rois(im_rois, im_scale_factors)
rois_blob = np.hstack((levels, rois))
return rois_blob.astype(np.float32, copy=False)
def _get_blobs(im, rois):
"""Convert an image and RoIs within that image into network inputs."""
blobs = {'data': None, 'rois': None}
blobs['data'], im_scale_factors = _get_image_blob(im)
if not cfg.TEST.HAS_RPN:
print >> dfy_log_file_writer, '没有RPN网络的情况 rois不为None'
blobs['rois'] = _get_rois_blob(rois, im_scale_factors)
print >> dfy_log_file_writer, 'blobs:', blobs.keys(), blobs['data'].shape, im_scale_factors
return blobs, im_scale_factors
def im_detect(net, im, boxes=None):
"""Detect object classes in an image given object proposals.
Arguments:
net (caffe.Net): Fast R-CNN network to use
im (ndarray): color image to test (in BGR order)
boxes (ndarray): R x 4 array of object proposals or None (for RPN)
Returns:
scores (ndarray): R x K array of object class scores (K includes
background as object category 0)
boxes (ndarray): R x (4*K) array of predicted bounding boxes
"""
# 将原始图片进行缩放宽高同比例缩放
blobs, im_scales = _get_blobs(im, boxes)
# When mapping from image ROIs to feature map ROIs, there's some aliasing
# (some distinct image ROIs get mapped to the same feature ROI).
# Here, we identify duplicate feature ROIs, so we only compute features
# on the unique subset.
if cfg.TEST.HAS_RPN:
im_blob = blobs['data']
blobs['im_info'] = np.array(
[[im_blob.shape[2], im_blob.shape[3], im_scales[0]]],
dtype=np.float32)
print >> dfy_log_file_writer, '有RPN网络：', blobs.keys(), blobs['im_info']
# reshape network inputs
print >> dfy_log_file_writer, 'reshape操作:', blobs['data'].shape
net.blobs['data'].reshape(*(blobs['data'].shape))
if cfg.TEST.HAS_RPN:
print >> dfy_log_file_writer, 'reshape操作:', blobs['im_info'].shape
net.blobs['im_info'].reshape(*(blobs['im_info'].shape))
else:
net.blobs['rois'].reshape(*(blobs['rois'].shape))
# do forward 开始前向计算
forward_kwargs = {'data': blobs['data'].astype(np.float32, copy=False)}
if cfg.TEST.HAS_RPN:
forward_kwargs['im_info'] = blobs['im_info'].astype(np.float32, copy=False)
else:
forward_kwargs['rois'] = blobs['rois'].astype(np.float32, copy=False)
# 前向计算
blobs_out = net.forward(**forward_kwargs)
# https://blog.csdn.net/tina_ttl/article/details/51033660 caffe中如何可视化cnn各层的输出
for layer_name, blob in net.blobs.iteritems():
print >> dfy_log_file_writer, '层名+shape：' + layer_name + '\t' + str(blob.data.shape)
for layer_name, param in net.params.iteritems():
print >> dfy_log_file_writer, '层名+网络W与b：' + layer_name + '\t' + str(param[0].data.shape), str(param[1].data.shape)
print >> dfy_log_file_writer, '前向计算结果blobs_out:\n', blobs_out.keys(), \
blobs_out['bbox_pred'].shape, \
blobs_out['cls_prob'].shape
if cfg.TEST.HAS_RPN:
assert len(im_scales) == 1, "Only single-image batch implemented"
rois = net.blobs['rois'].data.copy()
print >> dfy_log_file_writer, '\nrois:', rois, rois.shape
# unscale back to raw image space
boxes = rois[:, 1:5] / im_scales[0]
print >> dfy_log_file_writer, '\nboxes:', boxes, boxes.shape
# use softmax estimated probabilities
scores = blobs_out['cls_prob']
if cfg.TEST.BBOX_REG:
# True
# Apply bounding-box regression deltas
box_deltas = blobs_out['bbox_pred']
pred_boxes = bbox_transform_inv(boxes, box_deltas)
pred_boxes = clip_boxes(pred_boxes, im.shape)
return scores, pred_boxes
def demo(net, image_name):
"""Detect object classes in an image using pre-computed object proposals."""
# Load the demo image
im_file = os.path.join(cfg.DATA_DIR, 'demo', image_name)
print >> dfy_log_file_writer, 'start检测图片路径：', im_file
im = cv2.imread(im_file)
print >> dfy_log_file_writer, '图片原始大小HWC：', im.shape
# Detect all object classes and regress object bounds
timer = Timer()
timer.tic()
scores, boxes = im_detect(net, im)
timer.toc()
# ('scores.shape:', (300, 21))
print('scores.shape:', scores.shape)
# ('boxes.shape:', (300, 84))
print('boxes.shape:', boxes.shape)
print ('Detection took {:.3f}s for '
'{:d} object proposals').format(timer.total_time, boxes.shape[0])
# Visualize detections for each class
# 置信度阈值
CONF_THRESH = 0.95
# nms阈值
NMS_THRESH = 0.6
for cls_ind, cls in enumerate(CLASSES[1:]):
cls_ind += 1 # because we skipped background
cls_boxes = boxes[:, 4 * cls_ind:4 * (cls_ind + 1)]
cls_scores = scores[:, cls_ind]
dets = np.hstack((cls_boxes,
cls_scores[:, np.newaxis])).astype(np.float32)
# 先使用nms的阈值
keep = nms(dets, NMS_THRESH)
dets = dets[keep, :]
# 可视化的时候再使用置信度阈值
vis_detections(im, cls, dets, thresh=CONF_THRESH)
if __name__ == '__main__':
cfg.TEST.HAS_RPN = True # Use RPN for proposals
# 相对应deploy.prototxt文件 caffemodel deploy.prototxt
prototxt = 'models/pascal_voc/VGG16/faster_rcnn_alt_opt/faster_rcnn_test.pt'
caffemodel = 'data/faster_rcnn_models/VGG16_faster_rcnn_final.caffemodel'
caffe.set_device(0)
caffe.set_mode_gpu()
# dfy log打印结果文件
# https://blog.csdn.net/jiongnima/article/details/80016683
# 详细的Faster R-CNN源码解析之ROI-Pooling逐行代码解析
dfy_log_file = "tools/demo_detector_dfy.log"
dfy_log_file_writer = open(dfy_log_file, 'w')
net = caffe.Net(prototxt, caffemodel, caffe.TEST)
# im_names = ['001763.jpg', '004545.jpg']
im_names = ['test01.png']
for im_name in im_names:
print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
print 'Demo for data/demo/{}'.format(im_name)
demo(net, im_name)
plt.show()
dfy_log_file_writer.close()

复制代码

faster rcnn网络结构详解：http://www.ai111.vip/thread-800-1-1.html

使用Faster RCNN算法训练Kitti数据集：
1、修改train网络与test网络的结构文件
之前是用在Pascal voc数据集上(共21类) 现在需要用到kitti数据集（共8类）
models/pascal_voc/VGG_CNN_M_1024/faster_rcnn_end2end/train.prototxt
修改1：name: 'input-data'层 type: 'Python'  num_classes=8
修改2：name: 'roi-data'层  type: 'Python'  num_classes=8
修改3：name: "cls_score"层  type: "InnerProduct"  num_output: 8
修改4：name: "bbox_pred"层  type: "InnerProduct"  num_output: 32=8*4
models/pascal_voc/VGG_CNN_M_1024/faster_rcnn_end2end/test.prototxt
修改1：name: "cls_score"层  type: "InnerProduct"  num_output: 8
修改2：name: "bbox_pred"层  type: "InnerProduct"  num_output: 32=8*4

2、修改lib/datasets/pascal_voc.py中的类别信息
      # 由 21类改为 8 类全部都是小写注意
      self._classes = ('__background__',  # always index 0
                     'person_sitting', 'truck', 'van', 'pedestrian',
                     'cyclist', 'tram', 'car')

3、修改lib/datasets/pascal_voc.py中的数据路径
self._data_path = '/home/dfy888/DataSets/Kitti_voc'
self._image_ext = '.png'
并注释掉下面的代码
# assert os.path.exists(self._devkit_path), \
#    'VOCdevkit path does not exist: {}'.format(self._devkit_path)
搜索_devkit_path替换为_data_path的相应路径即可
修改里面的一个函数_load_pascal_annotation（x1 y1 不能减掉1）
         # Make pixel indexes 0-based
         x1 = float(bbox.find('xmin').text)
         y1 = float(bbox.find('ymin').text)
         x2 = float(bbox.find('xmax').text) - 1
         y2 = float(bbox.find('ymax').text) - 1
4、去掉预训练模型
修改tools/train_net.py中
train_net(args.solver, roidb, output_dir,
            pretrained_model=None,
            max_iters=args.max_iters)
5、遇到的bug与解决方案：
http://www.ai111.vip/thread-790-1-1.html
http://www.ai111.vip/thread-789-1-1.html
http://www.ai111.vip/thread-788-1-1.html
http://www.ai111.vip/thread-791-1-1.html
http://www.ai111.vip/thread-792-1-1.html
6、开始训练python tools/train_net.py --gpu 0
7、重新训练时需要删除data/cache
8、开始测试python tools/test_net.py --gpu 0

模型优化思路（实际工程项目中不会去修改整体框架）：
1、增加训练次数比如50万次确保网络收敛
2、修改主干网络CNN 在train.prototxt(重点)
3、修改在train.prototxt中定义的Python层
一般都在lib目录下比如：roi_data_layer nms等
4、优化输入的数据(重点)
5、精调网络的超参 lib/fast_rcnn/config.py 很多阈值等

xsoft · 发表于 2020-2-3 15:49:35

谢谢老师提供的资料。

		自动登录	找回密码
密码			立即注册

[课堂笔记] 先进驾驶辅助系统ADAS业务实战项目总结