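Face Detection Project in Practice (Based on an Improved SSD Algorithm)

东方耀 · Posted 2019-9-4 15:55:55

Overview of the face detection use case
Determine whether a face is present in the image and, if so, locate it.
This is a standard object detection problem with faces as the target class. The main difficulties are:
1. Variation in pose and expression
2. Differences in appearance between people (glasses or no glasses, face masks during an epidemic)
3. Lighting and occlusion
4. Different viewing angles
5. Different face sizes and positions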

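Face annotation method: rectangular bounding boxes

Many public face datasets exist, including FDDB, LFW, WiderFace and MegaFace.

Dataset chosen here: WIDER FACE (a fairly difficult benchmark)
1. Collected by Yang Shuo, Luo Ping, Loy Chen Change and Tang Xiaoou at the Chinese University of Hong Kong
2. Contains 32,203 images and 393,703 annotated faces
3. The faces vary widely in scale, pose, makeup, lighting and so on
4. Organized into 61 event classes (scenes); within each event class, 40% of the images are used for training, 10% for validation and 50% for testing
5. Download link: http://shuoyang1213.me/WIDERFACE/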
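
The annotation files shipped with WIDER FACE (for example wider_face_val_bbx_gt.txt, which the conversion script further down reads) list, for each image, the relative image path, the number of boxes, and then one line per box that starts with x y w h. A minimal parsing sketch follows. `parse_wider_gt` is a hypothetical helper name, the sample text is made up for illustration, and the handling of zero-face images (one placeholder line after the count) mirrors what the conversion script relies on.

```python
def parse_wider_gt(lines):
    """Parse WIDER FACE ground-truth lines into {image_path: [(x, y, w, h), ...]}."""
    records = {}
    it = iter(lines)
    for raw in it:
        filename = raw.strip()
        if not filename:
            continue
        num_bbox = int(next(it))
        boxes = []
        # Images with 0 faces are still followed by one placeholder line.
        for _ in range(max(num_bbox, 1)):
            fields = next(it).split()
            if num_bbox > 0:
                boxes.append(tuple(int(v) for v in fields[:4]))
        records[filename] = boxes
    return records


# Made-up sample in the same layout as the ground-truth file.
sample = """0--Parade/img_a.jpg
2
10 20 30 40 0 0 0 0 0 0
50 60 15 25 1 0 0 0 0 0
0--Parade/img_b.jpg
0
0 0 0 0 0 0 0 0 0 0
""".splitlines()

annotations = parse_wider_gt(sample)
print(annotations)
```

Only the first four fields of each box line are kept, which is also all the conversion script uses.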

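Issues that may come up when collecting face data:
1. Gender distribution: male and female.
2. Age distribution: children, teenagers, middle-aged and elderly people.
3. Ethnicity distribution: Black, white and East Asian faces.
4. Face type distribution: human faces as well as non-human ones such as pig and monkey faces.
5. Faces that do not look straight at the camera: tilted left/right or up/down.
6. Re-photographed face pictures, both sharp and blurry.
7. Frames containing a single face or several faces.
8. Test environment: normal, overly bright or overly dark lighting, warm or cold light, white balance, etc.
9. Different scenes: indoors, outdoors, stations, supermarkets, etc.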

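Fraud attempts that may occur during face verification:
1. A photo of a different person who looks very similar
2. A photo of a twin
3. A photo of someone who has had cosmetic surgery
4. A software-synthesized virtual face
5. A photoshopped picture based on an ID photo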

Development environment:
OS: Ubuntu 18.04
conda 4.7.10 virtual environment
Python: 2.7.15
caffe-ssd source code
IDE: PyCharm

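Script for converting the WiderFace data to VOC format:

# -*- coding: utf-8 -*-
__author__ = u'东方耀 (WeChat: dfy_88888)'
__date__ = '2019/7/15 3:23 PM'
__product__ = 'PyCharm'
__filename__ = 'widerface2voc'
# import os, cv2, sys, shutil
import cv2
import shutil
from xml.dom.minidom import Document

# Root of the original WIDER FACE download and root of the VOC-style output.
# The output root must already contain Annotations/, JPEGImages/ and ImageSets/Main/.
root_dir = '/home/dfy888/DataSets/WIDER Face DataSet'
root_dir_voc = '/home/dfy888/DataSets/widerface_voc'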


def writexml(filename, saveimg, bboxes, xmlpath):
    """
    Write a generic VOC-format XML annotation file.
    :param filename: image file name
    :param saveimg: image object (cv2 / numpy array)
    :param bboxes: list of face boxes, each (x, y, w, h)
    :param xmlpath: path of the XML file to write
    :return:
    """
    doc = Document()
    # Root node
    annotation = doc.createElement('annotation')
    doc.appendChild(annotation)
    folder = doc.createElement('folder')
    # Note: widerface_voc is the name of the folder holding the VOC-format data
    folder_name = doc.createTextNode('widerface_voc')
    folder.appendChild(folder_name)
    annotation.appendChild(folder)
    filenamenode = doc.createElement('filename')
    filename_name = doc.createTextNode(filename)
    filenamenode.appendChild(filename_name)
    annotation.appendChild(filenamenode)
    source = doc.createElement('source')
    annotation.appendChild(source)
    database = doc.createElement('database')
    database.appendChild(doc.createTextNode('wider face Database'))
    source.appendChild(database)
    annotation_s = doc.createElement('annotation')
    annotation_s.appendChild(doc.createTextNode('PASCAL VOC2007'))
    source.appendChild(annotation_s)
    image = doc.createElement('image')
    image.appendChild(doc.createTextNode('flickr'))
    source.appendChild(image)
    flickrid = doc.createElement('flickrid')
    flickrid.appendChild(doc.createTextNode('-1'))
    source.appendChild(flickrid)
    owner = doc.createElement('owner')
    annotation.appendChild(owner)
    flickrid_o = doc.createElement('flickrid')
    flickrid_o.appendChild(doc.createTextNode('dfy_88888'))
    owner.appendChild(flickrid_o)
    name_o = doc.createElement('name')
    name_o.appendChild(doc.createTextNode('dfy_88888'))
    owner.appendChild(name_o)
    # Image size: shape is (height, width, channels)
    size = doc.createElement('size')
    annotation.appendChild(size)
    width = doc.createElement('width')
    width.appendChild(doc.createTextNode(str(saveimg.shape[1])))
    height = doc.createElement('height')
    height.appendChild(doc.createTextNode(str(saveimg.shape[0])))
    depth = doc.createElement('depth')
    depth.appendChild(doc.createTextNode(str(saveimg.shape[2])))
    size.appendChild(width)
    size.appendChild(height)
    size.appendChild(depth)
    segmented = doc.createElement('segmented')
    segmented.appendChild(doc.createTextNode('0'))
    annotation.appendChild(segmented)
    for i in range(len(bboxes)):
        # bbox is a 4-vector: [top-left x, top-left y, width w, height h]
        bbox = bboxes[i]
        objects = doc.createElement('object')
        annotation.appendChild(objects)
        object_name = doc.createElement('name')
        # The only class is face
        object_name.appendChild(doc.createTextNode('face'))
        objects.appendChild(object_name)
        pose = doc.createElement('pose')
        pose.appendChild(doc.createTextNode('Unspecified'))
        objects.appendChild(pose)
        truncated = doc.createElement('truncated')
        truncated.appendChild(doc.createTextNode('1'))
        objects.appendChild(truncated)
        difficult = doc.createElement('difficult')
        difficult.appendChild(doc.createTextNode('0'))
        objects.appendChild(difficult)
        bndbox = doc.createElement('bndbox')
        objects.appendChild(bndbox)
        # xmin, ymin: top-left corner of the box
        xmin = doc.createElement('xmin')
        xmin.appendChild(doc.createTextNode(str(bbox[0])))
        bndbox.appendChild(xmin)
        ymin = doc.createElement('ymin')
        ymin.appendChild(doc.createTextNode(str(bbox[1])))
        bndbox.appendChild(ymin)
        # xmax, ymax: bottom-right corner of the box
        xmax = doc.createElement('xmax')
        xmax.appendChild(doc.createTextNode(str(bbox[0] + bbox[2])))
        bndbox.appendChild(xmax)
        ymax = doc.createElement('ymax')
        ymax.appendChild(doc.createTextNode(str(bbox[1] + bbox[3])))
        bndbox.appendChild(ymax)
    with open(xmlpath, 'w') as f:
        f.write(doc.toprettyxml(indent=''))


def convert_imgset(img_set_type):
    """
    Convert one split of the dataset (WiderFace ---> VOC).
    :param img_set_type: 'train' or 'val'
    :return:
    """
    # Directory holding the original images of this split
    img_dir = root_dir + '/WIDER_' + img_set_type + '/images'
    # Path of the ground-truth annotation file
    gt_filepath = root_dir + '/wider_face_split/wider_face_' + img_set_type + '_bbx_gt.txt'
    fwrite = open(root_dir_voc + '/ImageSets/Main/' + img_set_type + '.txt', 'w')
    print(img_dir)
    print(gt_filepath)
    # Index of the image currently being parsed
    index = 0
    no_face_index = []
    with open(gt_filepath, 'r') as gt_files:
        # Quick check on the first 5 images only; use `while True` to convert the whole split
        while index < 5:
            # strip() removes the trailing newline and any surrounding whitespace
            filename = gt_files.readline().strip()
            # print('filename read: %s, length: %d' % (filename, len(filename)))
            if filename == '':
                break
            # Absolute path of the image
            img_path = img_dir + '/' + filename
            print('Image path: %s' % img_path)
            img = cv2.imread(img_path)
            # Preview the raw image
            # cv2.imshow('1', img)
            # cv2.waitKey(0)
            # cv2.imread returns None when the file cannot be read
            if img is None:
                break
            # Draw on a copy so the saved training image stays free of debug rectangles
            vis = img.copy()
            num_bbox = int(gt_files.readline())
            if num_bbox == 0:
                # A placeholder line still follows the 0 and has to be consumed
                line = gt_files.readline()
                no_face_index.append(index)
                print('Special case, image without face boxes: %s' % line)
            bboxes = []
            for i in range(num_bbox):
                # Each line is one ground-truth face box
                line = gt_files.readline()
                lines = line.split()
                # Only the first 4 values are needed
                lines = lines[0: 4]
                # bbox is a 4-vector: [top-left x, top-left y, width w, height h]
                bbox = (int(lines[0]), int(lines[1]), int(lines[2]), int(lines[3]))
                # Visualize the face rectangle
                cv2.rectangle(vis, (bbox[0], bbox[1]),
                              (bbox[0] + bbox[2], bbox[1] + bbox[3]),
                              color=(0, 0, 255), thickness=1)
                bboxes.append(bbox)
            # Debug preview; blocks until a key is pressed, so comment it out for full conversions
            cv2.imshow(str(index), vis)
            cv2.waitKey(0)
            filename = filename.replace('/', '_')
            print('File name after flattening: %s' % filename)
            if len(bboxes) == 0:
                print('no face box')
                index += 1
                continue
            cv2.imwrite('{}/JPEGImages/{}'.format(root_dir_voc, filename), img)
            fwrite.write(filename.split('.')[0] + '\n')
            xmlpath = '{}/Annotations/{}.xml'.format(root_dir_voc, filename.split('.')[0])
            writexml(filename, img, bboxes, xmlpath)
            print('success number is %d' % index)
            index += 1
    # After the loop has finished
    print('Indices of images without faces: %s' % no_face_index)
    fwrite.close()


if __name__ == '__main__':
    # Number of train images: 12879; indices of images without faces: [279, 3808, 7512, 9227]
    # convert_imgset('train')
    # Number of val images: 3225; indices of images without faces: []
    convert_imgset('val')
    # Rename the list files, which are originally called train.txt and val.txt
    # shutil.move(root_dir_voc + '/ImageSets/Main/' + 'train.txt', root_dir_voc + '/ImageSets/Main/' + 'trainval.txt')
    # shutil.move(root_dir_voc + '/ImageSets/Main/' + 'val.txt', root_dir_voc + '/ImageSets/Main/' + 'test.txt')
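
To sanity-check the conversion it can help to read an annotation back and confirm that each (x, y, w, h) box became (xmin, ymin, xmax, ymax) as intended. The sketch below uses only the standard library. `read_voc_boxes` is a hypothetical helper, and the XML string is a hand-written stand-in for a file produced by writexml, not real output.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Return [(name, xmin, ymin, xmax, ymax), ...] from a VOC annotation string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall('object'):
        bnd = obj.find('bndbox')
        coords = [int(bnd.find(tag).text) for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        boxes.append((obj.find('name').text,) + tuple(coords))
    return boxes


# Stand-in annotation for a face given as x=10, y=20, w=30, h=40.
sample_xml = """<annotation>
  <size><width>300</width><height>200</height><depth>3</depth></size>
  <object>
    <name>face</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>40</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>"""

voc_boxes = read_voc_boxes(sample_xml)
print(voc_boxes)
```

For a real file, the text would come from reading one of the XML files under Annotations/ instead of the inline string.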


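Next, use the scripts under the data directory of the caffe-ssd source tree (create_list.sh and create_data.sh) to convert the VOC-format data into LMDB datasets.
Changes to the training script for the face dataset:
Copy examples/ssd/ssd_pascal.py in the caffe-ssd source tree to a new file, ssd_face_dfy.py, and modify the copy.
I changed the configuration parameters listed below; there is room for further tuning that you can try yourselves:
1. Dataset paths:
train_data = "/home/dfy888/DataSets/DataSets_LMDB/WiderFace/lmdb/WiderFace_trainval_lmdb"
test_data = "/home/dfy888/DataSets/DataSets_LMDB/WiderFace/lmdb/WiderFace_test_lmdb"
2. Number of classes (the face label_map_file uses 0 for background and 1 for face):
num_classes = 2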
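
The label map has to agree with num_classes = 2 and with the object name written into the XML files ('face'). In the VOC setup of caffe-ssd the label map is a prototxt file made of item entries; the sketch below generates a two-class equivalent as text. The helper name `make_labelmap` and the file name mentioned in the comment are assumptions, and the background entry follows the naming used in the VOC label map.

```python
def make_labelmap(class_names):
    """Build label map text; label 0 is reserved for background."""
    entries = [('none_of_the_above', 0, 'background')]
    entries += [(name, i + 1, name) for i, name in enumerate(class_names)]
    blocks = []
    for name, label, display in entries:
        blocks.append(
            'item {\n'
            '  name: "%s"\n'
            '  label: %d\n'
            '  display_name: "%s"\n'
            '}\n' % (name, label, display)
        )
    return ''.join(blocks)


labelmap_text = make_labelmap(['face'])
print(labelmap_text)
# To use it, write labelmap_text to a file (for example a hypothetical
# labelmap_face.prototxt) and point label_map_file at that path.
```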
3. GPU settings

4. solver_param (solver hyperparameters):

solver_param = {
    # Train parameters
    'base_lr': base_lr,
    'weight_decay': 0.0005,
    'lr_policy': "multistep",
    'stepvalue': [4000, 10000, 40000],
    'gamma': 0.1,
    'momentum': 0.9,
    'iter_size': iter_size,
    'max_iter': 80000,
    'snapshot': 500,
    'display': 10,
    'average_loss': 10,
    # types: AdaDelta, AdaGrad, Adam, Nesterov, RMSProp, SGD
    'type': "Adam",
    'solver_mode': solver_mode,
    'device_id': device_id,
    'debug_info': False,
    'snapshot_after_train': True,
    # Test parameters
    'test_iter': [test_iter],
    'test_interval': 500,
    'eval_type': "detection",
    'ap_version': "11point",
    'test_initialization': False,
}
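
With lr_policy "multistep", Caffe multiplies the learning rate by gamma each time the iteration count reaches one of the stepvalue entries. The sketch below reproduces that schedule for the values above. The base_lr of 0.001 is an assumed placeholder, since the post does not state the value used, and `multistep_lr` is a hypothetical helper name.

```python
def multistep_lr(iteration, base_lr, gamma, stepvalues):
    """Learning rate under the "multistep" policy at a given iteration."""
    steps_passed = sum(1 for step in stepvalues if iteration >= step)
    return base_lr * gamma ** steps_passed


base_lr = 0.001  # assumed value, not taken from the post
stepvalue = [4000, 10000, 40000]
for it in (0, 3999, 4000, 10000, 40000, 79999):
    print(it, multistep_lr(it, base_lr, 0.1, stepvalue))
```

With these settings the rate has dropped by a factor of 1000 by iteration 40000, half way through the 80000 iterations.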


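5. det_out_param (detection output layer parameters)

6. Backbone network
The backbone is implemented as a function in python/caffe/model_libs.py of the caffe-ssd source tree and is called as:
VGGNetBody_Dfy(net, from_layer='data', nopool=False, dilate_pool4=False)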

def VGGNetBody_Dfy(net, from_layer, nopool=False, freeze_layers=[], dilate_pool4=False):
    # Slimmed-down VGG body: half the channels of VGG16, with conv3_2, conv4_2 and conv5_2 removed.
    # L and P are caffe.layers and caffe.params, as imported at the top of model_libs.py.
    print('from_layer: ' + from_layer)
    kwargs = {
        'param': [dict(lr_mult=1, decay_mult=1), dict(lr_mult=2, decay_mult=0)],
        'weight_filler': dict(type='xavier'),
        'bias_filler': dict(type='constant', value=0)}
    assert from_layer in net.keys()
    net.conv1_1 = L.Convolution(net[from_layer], num_output=32, pad=1, kernel_size=3, **kwargs)
    print(type(net.conv1_1))
    net.relu1_1 = L.ReLU(net.conv1_1, in_place=True)
    net.conv1_2 = L.Convolution(net.relu1_1, num_output=32, pad=1, kernel_size=3, **kwargs)
    net.relu1_2 = L.ReLU(net.conv1_2, in_place=True)
    if nopool:
        name = 'conv1_3'
        net[name] = L.Convolution(net.relu1_2, num_output=32, pad=1, kernel_size=3, stride=2, **kwargs)
    else:
        name = 'pool1'
        net[name] = L.Pooling(net.relu1_2, pool=P.Pooling.MAX, kernel_size=2, stride=2)
    net.conv2_1 = L.Convolution(net[name], num_output=64, pad=1, kernel_size=3, **kwargs)
    net.relu2_1 = L.ReLU(net.conv2_1, in_place=True)
    net.conv2_2 = L.Convolution(net.relu2_1, num_output=64, pad=1, kernel_size=3, **kwargs)
    net.relu2_2 = L.ReLU(net.conv2_2, in_place=True)
    if nopool:
        name = 'conv2_3'
        net[name] = L.Convolution(net.relu2_2, num_output=64, pad=1, kernel_size=3, stride=2, **kwargs)
    else:
        name = 'pool2'
        net[name] = L.Pooling(net.relu2_2, pool=P.Pooling.MAX, kernel_size=2, stride=2)
    net.conv3_1 = L.Convolution(net[name], num_output=128, pad=1, kernel_size=3, **kwargs)
    net.relu3_1 = L.ReLU(net.conv3_1, in_place=True)
    # net.conv3_2 = L.Convolution(net.relu3_1, num_output=128, pad=1, kernel_size=3, **kwargs)
    # net.relu3_2 = L.ReLU(net.conv3_2, in_place=True)
    net.conv3_3 = L.Convolution(net.relu3_1, num_output=128, pad=1, kernel_size=3, **kwargs)
    net.relu3_3 = L.ReLU(net.conv3_3, in_place=True)
    if nopool:
        name = 'conv3_4'
        net[name] = L.Convolution(net.relu3_3, num_output=128, pad=1, kernel_size=3, stride=2, **kwargs)
    else:
        name = 'pool3'
        net[name] = L.Pooling(net.relu3_3, pool=P.Pooling.MAX, kernel_size=2, stride=2)
    net.conv4_1 = L.Convolution(net[name], num_output=256, pad=1, kernel_size=3, **kwargs)
    net.relu4_1 = L.ReLU(net.conv4_1, in_place=True)
    # net.conv4_2 = L.Convolution(net.relu4_1, num_output=256, pad=1, kernel_size=3, **kwargs)
    # net.relu4_2 = L.ReLU(net.conv4_2, in_place=True)
    net.conv4_3 = L.Convolution(net.relu4_1, num_output=256, pad=1, kernel_size=3, **kwargs)
    net.relu4_3 = L.ReLU(net.conv4_3, in_place=True)
    if nopool:
        name = 'conv4_4'
        net[name] = L.Convolution(net.relu4_3, num_output=256, pad=1, kernel_size=3, stride=2, **kwargs)
        # No dilation in this branch; without this line dilation would be undefined below
        dilation = 1
    else:
        name = 'pool4'
        if dilate_pool4:
            # Overlapping max pooling (stride 1) keeps the resolution; the conv5 layers are then dilated
            net[name] = L.Pooling(net.relu4_3, pool=P.Pooling.MAX, kernel_size=3, stride=1, pad=1)
            dilation = 2
        else:
            # Non-overlapping max pooling halves the resolution directly
            net[name] = L.Pooling(net.relu4_3, pool=P.Pooling.MAX, kernel_size=2, stride=2)
            dilation = 1
    kernel_size = 3
    # Padding that keeps the output size equal to the input size for the given dilation
    pad = int((kernel_size + (dilation - 1) * (kernel_size - 1)) - 1) // 2
    # dilation spaces out the kernel taps, enlarging the receptive field without adding parameters
    net.conv5_1 = L.Convolution(net[name], num_output=256, pad=pad, kernel_size=kernel_size, dilation=dilation,
                                **kwargs)
    net.relu5_1 = L.ReLU(net.conv5_1, in_place=True)
    # net.conv5_2 = L.Convolution(net.relu5_1, num_output=256, pad=pad, kernel_size=kernel_size, dilation=dilation,
    #                             **kwargs)
    # net.relu5_2 = L.ReLU(net.conv5_2, in_place=True)
    net.conv5_3 = L.Convolution(net.relu5_1, num_output=256, pad=pad, kernel_size=kernel_size, dilation=dilation,
                                **kwargs)
    net.relu5_3 = L.ReLU(net.conv5_3, in_place=True)
    # Update freeze layers.
    # Setting lr_mult and decay_mult to 0 for both weights and biases freezes a layer
    kwargs['param'] = [dict(lr_mult=0, decay_mult=0), dict(lr_mult=0, decay_mult=0)]
    layers = net.keys()
    for freeze_layer in freeze_layers:
        if freeze_layer in layers:
            net.update(freeze_layer, kwargs)
    return net
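
The spatial sizes produced by this backbone can be checked by hand. Caffe rounds pooling output sizes up and convolution output sizes down, which is why a 75*75 map pools to 38*38 rather than 37*37. The sketch below traces a 300*300 input through the four pooling layers for the nopool=False, dilate_pool4=False call shown earlier, and evaluates the padding formula used for the conv5 layers. The helper names are hypothetical, pooling padding is left out for simplicity, and the 3*3, pad=1 convolutions are omitted because they keep the size unchanged.

```python
import math


def pool_out(size, kernel, stride):
    """Output size of an unpadded Caffe pooling layer (rounds up)."""
    return int(math.ceil((size - kernel) / float(stride))) + 1


def conv_out(size, kernel, stride=1, pad=0, dilation=1):
    """Output size of a Caffe convolution layer (rounds down)."""
    extent = dilation * (kernel - 1) + 1
    return (size + 2 * pad - extent) // stride + 1


def same_pad(kernel, dilation):
    """Padding from the backbone code that keeps the conv5 output size unchanged."""
    return ((kernel + (dilation - 1) * (kernel - 1)) - 1) // 2


size = 300
sizes = {}
for name in ('pool1', 'pool2', 'pool3', 'pool4'):
    size = pool_out(size, kernel=2, stride=2)
    sizes[name] = size
print(sizes)
print(same_pad(3, 1), same_pad(3, 2))
```

conv4_3 sits between pool3 and pool4 and conv5_3 after pool4, so they have the pool3 and pool4 sizes respectively.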


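7. Inputs to the prior box layers:
Feature maps of 6 sizes are used (38*38, 19*19, 10*10, 5*5, 3*3, 1*1)
mbox_source_layers = ['conv4_3', 'conv5_3', 'conv6_2', 'conv7_2', 'conv8_2', 'conv9_2']
steps = [8, 16, 32, 64, 100, 300]: the downsampling factor of each feature map relative to the 300*300 input
aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]: the aspect ratios of the prior boxes
min_sizes, max_sizes: the prior box scales assigned to each feature map
normalizations = [20, -1, -1, -1, -1, -1]: only conv4_3, the largest feature map, is L2-normalized (initial scale 20); -1 disables normalization for the other layers
num_priors_per_location (number of prior boxes generated at each feature-map location): [4, 6, 6, 6, 4, 4]
In total this generates 38*38*4 + 19*19*6 + 10*10*6 + 5*5*6 + 3*3*4 + 1*1*4 = 8732 prior boxes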
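
The per-location counts follow from the aspect ratios: each layer gets one square box at min_size, one extra square box because a max_size is set, and, with flipping enabled as in the stock ssd_pascal.py, two boxes (r and 1/r) for every listed ratio. A short check of the arithmetic under those assumed stock settings (`priors_per_location` is a hypothetical helper):

```python
feature_map_sizes = [38, 19, 10, 5, 3, 1]
aspect_ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]


def priors_per_location(ratios, has_max_size=True, flip=True):
    """Number of prior boxes generated at one feature-map cell."""
    count = 1  # square box at min_size
    if has_max_size:
        count += 1  # square box at sqrt(min_size * max_size)
    count += len(ratios) * (2 if flip else 1)
    return count


num_priors = [priors_per_location(r) for r in aspect_ratios]
total_priors = sum(s * s * n for s, n in zip(feature_map_sizes, num_priors))
print(num_priors, total_priors)
```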

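Run python examples/ssd/ssd_face_dfy.py to start training the model.

Predicting with the trained model:
Create the file ssd_face_dfy_predict.py (not shared for now; if you have questions, contact 东方耀 on WeChat: dfy_88888)
Run python examples/ssd/ssd_face_dfy_results/ssd_face_dfy_predict.py to check whether every face is detected and with what confidence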

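Conclusion: as training proceeds, the model finds more faces and the confidence of its strongest detection rises steadily, although the weakest detection stays close to the threshold.
The same picture, which I took in the subway, was used for every prediction (confidence threshold 0.3):
1. With the caffemodel from iteration 2677, 2 faces are detected, with confidences [0.45893413, 0.3917135]
2. With the caffemodel from iteration 5405, 2 faces are detected, with confidences [0.51490724, 0.40424186]
3. With the caffemodel from iteration 6375, 3 faces are detected, with confidences [0.5186607, 0.4068523, 0.3596757]
4. With the caffemodel from iteration 13705, 3 faces are detected, with confidences [0.8050997, 0.4799765, 0.4243071]
5. With the caffemodel from iteration 24357, 3 faces are detected, with confidences [0.84193945, 0.51308644, 0.3932887]
6. With the caffemodel from iteration 35557, 3 faces are detected, with confidences [0.90560204, 0.63326204, 0.38992488]
Training and prediction are still in progress...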
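
The reported numbers can be tabulated to see the trend more directly. The sketch below copies the confidences listed above into a dictionary keyed by iteration and prints, for each snapshot, how many detections pass the 0.3 threshold together with the highest and lowest confidence. The helper name `summarize` is hypothetical.

```python
reported = {
    2677: [0.45893413, 0.3917135],
    5405: [0.51490724, 0.40424186],
    6375: [0.5186607, 0.4068523, 0.3596757],
    13705: [0.8050997, 0.4799765, 0.4243071],
    24357: [0.84193945, 0.51308644, 0.3932887],
    35557: [0.90560204, 0.63326204, 0.38992488],
}


def summarize(confidences, threshold=0.3):
    """Return (count, highest, lowest) of the confidences at or above threshold."""
    kept = [c for c in confidences if c >= threshold]
    if not kept:
        return 0, None, None
    return len(kept), max(kept), min(kept)


summary = {it: summarize(conf) for it, conf in reported.items()}
for it in sorted(summary):
    print(it, summary[it])
top_scores = [summary[it][1] for it in sorted(summary)]
```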

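Ideas for further optimizing the SSD face detection model:
1. When packing the data, consider filtering out small face samples (for example, require the face size to exceed 20 pixels)
2. Make the training set as large as possible, for instance by merging several different datasets
3. Use a stronger feature backbone such as ResNet + FPN; for on-device deployment, consider a lightweight network such as MobileNet
4. Train with a better loss (the original SSD uses softmax for classification and smooth L1 for box regression)
5. And so on; I will add more as I think of them!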
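
Idea 1 can be sketched as a small filter applied to each image's box list before the annotation is written. The function name `filter_small_faces` is hypothetical, and reading "size" as the shorter side of the box is an assumption; the post only says faces should be larger than 20 pixels.

```python
def filter_small_faces(bboxes, min_size=20):
    """Keep (x, y, w, h) boxes whose shorter side exceeds min_size pixels."""
    return [box for box in bboxes if min(box[2], box[3]) > min_size]


boxes = [(10, 20, 30, 40), (50, 60, 8, 12), (5, 5, 25, 18), (0, 0, 0, 0)]
kept = filter_small_faces(boxes)
print(kept)
```

In the conversion script this would be applied to bboxes just before the len(bboxes) == 0 check, so that images left without any face are skipped as well.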

xsoft · Posted 2020-2-3 15:48:11

Thank you for sharing this material, teacher.
