[Communication Theory] Feature extraction in Python signal processing

Posted on 2021-01-27 09:12:39


Feature extraction in Python signal processing
Playing with signal processing and machine learning in Python:
recognizing human activities from vibration signals collected by a smartphone.
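The core feature idea used in the script below — keep the x/y coordinates of the first few peaks of each transform, zero-padded — can be sketched in isolation on a synthetic signal. This sketch uses `scipy.signal.find_peaks` as a stand-in for the `detecta.detect_peaks` function the script actually uses; the two-tone test signal and the 10%-of-maximum peak threshold are illustrative assumptions:

```python
import numpy as np
from scipy.fftpack import fft
from scipy.signal import find_peaks  # stand-in for detecta.detect_peaks

def get_fft_values(y_values, N, f_s):
    # One-sided amplitude spectrum, same as in the script below
    f_values = np.linspace(0.0, f_s / 2.0, N // 2)
    fft_values = 2.0 / N * np.abs(fft(y_values)[:N // 2])
    return f_values, fft_values

def peak_features(x, y, no_peaks=5, mph_frac=0.1):
    # Coordinates of the first `no_peaks` peaks, zero-padded if fewer are found
    idx, _ = find_peaks(y, height=mph_frac * np.nanmax(y))
    px, py = list(x[idx])[:no_peaks], list(y[idx])[:no_peaks]
    pad = no_peaks - len(px)
    return px + [0] * pad + py + [0] * pad

N, f_s = 128, 50                 # same window length and sampling rate as the dataset
t = np.arange(N) / f_s
sig = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)  # tones at 5 Hz and 12 Hz

f_values, fft_values = get_fft_values(sig, N, f_s)
features = peak_features(f_values, fft_values)
print(len(features))  # → 10 (5 peak frequencies + 5 peak amplitudes)
```

The detected peak frequencies land near 5 Hz and 12 Hz; one such 10-value vector per transform per component gives the 270 features described below.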


import numpy as np
import os
import matplotlib.pyplot as plt
from collections import defaultdict, Counter
from detecta import detect_peaks

'''
Feature extraction in Python signal processing:
recognizing human activities from vibration signals collected by a smartphone.

The dataset was collected from 30 volunteers of varying ages. A smartphone fixed
at each volunteer's waist sampled its built-in accelerometer and gyroscope at 50 Hz
while the volunteer performed six activities: walking, walking upstairs,
walking downstairs, sitting, standing, and laying.
After noise filtering, each signal was cut into 2.56 s sliding windows of 128 samples.
The dataset contains three groups of signals: three-axial linear body acceleration
(accelerometer data with gravity removed), three-axial total acceleration
(accelerometer data including gravity), and three-axial angular velocity
(gyroscope data), nine components in total. The training set has 7352 windows
and the test set 2947.

pip3 install detecta     (siml is not needed)

Which features should we extract? One approach is to take the x- and y-coordinates
of the peaks in each transform: from each spectrum we keep the coordinates of the
first five peaks (zero-padded if fewer than five are found).
Some of the resulting 270 features carry more information than others; actively
selecting the feature combinations that matter for classification, or tuning the
hyperparameters, should push the accuracy higher still.
'''

# The train dataset contains 7352 signals, each one of length 128 and 9 components
# The test dataset contains 2947 signals, each one of length 128 and 9 components
# The train dataset contains 7352 labels, with the following distribution:
#  Counter({6: 1407, 5: 1374, 4: 1286, 1: 1226, 2: 1073, 3: 986})
# The test dataset contains 2947 labels, with the following distribution:
#  Counter({6: 537, 5: 532, 1: 496, 4: 491, 2: 471, 3: 420})


activities_description = {
    1: 'walking',
    2: 'walking upstairs',
    3: 'walking downstairs',
    4: 'sitting',
    5: 'standing',
    6: 'laying'
}


def read_signals(filename):
    with open(filename, 'r') as fp:
        data = fp.read().splitlines()
        data = map(lambda x: x.strip().split(), data)
        data = [list(map(float, line)) for line in data]
    return data


def read_labels(filename):
    with open(filename, 'r') as fp:
        activities = fp.read().splitlines()
        activities = list(map(int, activities))
    return activities


def randomize(dataset, labels):
    permutation = np.random.permutation(labels.shape[0])
    shuffled_dataset = dataset[permutation, :, :]
    shuffled_labels = labels[permutation]
    return shuffled_dataset, shuffled_labels


INPUT_FOLDER_TRAIN = 'D:\\UCI HAR Dataset\\UCI HAR Dataset\\train\\Inertial Signals\\'
INPUT_FOLDER_TEST = 'D:\\UCI HAR Dataset\\UCI HAR Dataset\\test\\Inertial Signals\\'

INPUT_FILES_TRAIN = ['body_acc_x_train.txt', 'body_acc_y_train.txt', 'body_acc_z_train.txt',
                     'body_gyro_x_train.txt', 'body_gyro_y_train.txt', 'body_gyro_z_train.txt',
                     'total_acc_x_train.txt', 'total_acc_y_train.txt', 'total_acc_z_train.txt']

INPUT_FILES_TEST = ['body_acc_x_test.txt', 'body_acc_y_test.txt', 'body_acc_z_test.txt',
                    'body_gyro_x_test.txt', 'body_gyro_y_test.txt', 'body_gyro_z_test.txt',
                    'total_acc_x_test.txt', 'total_acc_y_test.txt', 'total_acc_z_test.txt']

LABELFILE_TRAIN = 'D:\\UCI HAR Dataset\\UCI HAR Dataset\\train\\y_train.txt'
LABELFILE_TEST = 'D:\\UCI HAR Dataset\\UCI HAR Dataset\\test\\y_test.txt'

train_signals, test_signals = [], []

for input_file in INPUT_FILES_TRAIN:
    signal = read_signals(os.path.join(INPUT_FOLDER_TRAIN, input_file))
    train_signals.append(signal)

# Train signals before transposing: (9, 7352, 128) -- one entry per component file
print("Train signals:", np.array(train_signals).shape)
train_signals = np.transpose(np.array(train_signals), (1, 2, 0))
print("Train signals with the component axis moved last:", train_signals.shape)

for input_file in INPUT_FILES_TEST:
    signal = read_signals(os.path.join(INPUT_FOLDER_TEST, input_file))
    test_signals.append(signal)
test_signals = np.transpose(np.array(test_signals), (1, 2, 0))
print("Test signals with the component axis moved last:", test_signals.shape)

train_labels = read_labels(LABELFILE_TRAIN)
test_labels = read_labels(LABELFILE_TEST)
print("Train label file length:", len(train_labels), train_labels[:10], set(train_labels))
print("Test label file length:", len(test_labels), test_labels[:10], set(test_labels))

[no_signals_train, no_steps_train, no_components_train] = np.shape(train_signals)
[no_signals_test, no_steps_test, no_components_test] = np.shape(test_signals)
no_labels = len(np.unique(train_labels))
print("Number of distinct labels:", no_labels)

print("The train dataset contains {} signals, each one of length {} and {} components".format(
    no_signals_train, no_steps_train, no_components_train))
print("The test dataset contains {} signals, each one of length {} and {} components".format(
    no_signals_test, no_steps_test, no_components_test))
# Counter tallies how often each label occurs
print("The train dataset contains {} labels, with the following distribution:\n {}".format(
    np.shape(train_labels)[0], Counter(train_labels)))
print("The test dataset contains {} labels, with the following distribution:\n {}".format(
    np.shape(test_labels)[0], Counter(test_labels)))
print("Shuffling the train and test sets")
train_signals, train_labels = randomize(train_signals, np.array(train_labels))
test_signals, test_labels = randomize(test_signals, np.array(test_labels))

# Data visualization
N = 128            # samples per window
f_s = 50           # sampling frequency [Hz]
denominator = 10   # fraction of the signal range used for the minimum peak height

# Pick one signal sample (each component is a sequence of 128 values)
signal_no = 15
signals = train_signals[signal_no, :, :]
# Component 3 is body_gyro_x (the sets were already shuffled above)
signal = signals[:, 3]
label = train_labels[signal_no]
activity_name = activities_description[label]
print("Selected sample:", signals.shape, signal.shape, activity_name)
print("Sample data:", signal)


def get_values(y_values, N, f_s):
    # Time axis for the raw signal
    t_values = np.linspace(0.0, N / f_s, N)
    return t_values, y_values


t_values, signal_values = get_values(signal, N, f_s)
plt.plot(t_values, signal_values, linestyle='-', color='blue')
plt.xlabel('Time [sec]', fontsize=16)
plt.ylabel('Amplitude', fontsize=16)
plt.title("Time domain of the signal", fontsize=16)
# plt.savefig('01_time_domain.png', dpi=200)
plt.show()

from scipy.fftpack import fft


def get_fft_values(y_values, N, f_s):
    f_values = np.linspace(0.0, f_s / 2.0, N // 2)
    fft_values_ = fft(y_values)
    fft_values = 2.0 / N * np.abs(fft_values_[0:N // 2])
    return f_values, fft_values


f_values, fft_values = get_fft_values(signal, N, f_s)

plt.plot(f_values, fft_values, linestyle='-', color='blue')
plt.xlabel('Frequency [Hz]', fontsize=16)
plt.ylabel('Amplitude', fontsize=16)
plt.title("Frequency domain of the signal", fontsize=16)
# plt.savefig('02_fft.png', dpi=200)
plt.show()

from scipy.signal import welch


def get_psd_values(y_values, N, f_s):
    # welch's default nperseg (256) is clipped to the window length (128) here
    f_values, psd_values = welch(y_values, fs=f_s)
    return f_values, psd_values


f_values, psd_values = get_psd_values(signal, N, f_s)

plt.plot(f_values, psd_values, linestyle='-', color='blue')
plt.xlabel('Frequency [Hz]', fontsize=16)
plt.ylabel('PSD [V**2 / Hz]', fontsize=16)
# plt.savefig('03_psd.png', dpi=200)
plt.show()


def autocorr(x):
    # Full cross-correlation of x with itself; keep the non-negative lags
    result = np.correlate(x, x, mode='full')
    return result[len(result) // 2:]


def get_autocorr_values(y_values, N, f_s):
    autocorr_values = autocorr(y_values)
    x_values = np.array([1.0 * jj / f_s for jj in range(0, N)])
    return x_values, autocorr_values


t_values, autocorr_values = get_autocorr_values(signal, N, f_s)

plt.plot(t_values, autocorr_values, linestyle='-', color='blue')
plt.xlabel('time delay [s]', fontsize=16)
plt.ylabel('Autocorrelation amplitude', fontsize=16)
# plt.savefig('04_autocorr.png', dpi=200)
plt.show()


labels = ['x-component', 'y-component', 'z-component']
colors = ['r', 'g', 'b']
suptitle = "Different signals for the activity: {}"

xlabels = ['Time [sec]', 'Freq [Hz]', 'Freq [Hz]', 'Time lag [s]']
ylabel = 'Amplitude'
axtitles = [['Acceleration', 'Gyro', 'Total acceleration'],
            ['FFT acc', 'FFT gyro', 'FFT total acc'],
            ['PSD acc', 'PSD gyro', 'PSD total acc'],
            ['Autocorr acc', 'Autocorr gyro', 'Autocorr total acc']]

list_functions = [get_values, get_fft_values, get_psd_values, get_autocorr_values]

f, axarr = plt.subplots(nrows=4, ncols=3, figsize=(12, 12))
f.suptitle(suptitle.format(activity_name), fontsize=16)

for row_no in range(0, 4):
    for comp_no in range(0, 9):
        col_no = comp_no // 3
        plot_no = comp_no % 3
        color = colors[plot_no]
        label = labels[plot_no]

        axtitle = axtitles[row_no][col_no]
        xlabel = xlabels[row_no]
        value_retriever = list_functions[row_no]

        ax = axarr[row_no, col_no]
        ax.set_title(axtitle, fontsize=16)
        ax.set_xlabel(xlabel, fontsize=16)
        if col_no == 0:
            ax.set_ylabel(ylabel, fontsize=16)

        signal_component = signals[:, comp_no]
        x_values, y_values = value_retriever(signal_component, N, f_s)
        ax.plot(x_values, y_values, linestyle='-', color=color, label=label)
        if row_no > 0:
            # Mark peaks above 10% of the maximum on the transformed signals
            max_peak_height = 0.1 * np.nanmax(y_values)
            indices_peaks = detect_peaks(y_values, mph=max_peak_height)
            ax.scatter(x_values[indices_peaks], y_values[indices_peaks], c=color, marker='*', s=60)
        if col_no == 2:
            ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.tight_layout()
plt.subplots_adjust(top=0.90, hspace=0.6)
# plt.savefig('05_all.png', dpi=200)
plt.show()


# Feature extraction
def get_first_n_peaks(x, y, no_peaks=5):
    x_, y_ = list(x), list(y)
    if len(x_) >= no_peaks:
        return x_[:no_peaks], y_[:no_peaks]
    else:  # fewer than no_peaks peaks: pad with zeros
        missing_no_peaks = no_peaks - len(x_)
        return x_ + [0] * missing_no_peaks, y_ + [0] * missing_no_peaks


def get_features(x_values, y_values, mph):
    indices_peaks = detect_peaks(y_values, mph=mph)
    peaks_x, peaks_y = get_first_n_peaks(
        x_values[indices_peaks], y_values[indices_peaks])
    return peaks_x + peaks_y


def extract_features_labels(dataset, labels, N, f_s, denominator):
    percentile = 5
    list_of_features = []
    list_of_labels = []
    for signal_no in range(0, len(dataset)):
        features = []
        list_of_labels.append(labels[signal_no])
        for signal_comp in range(0, dataset.shape[2]):
            signal = dataset[signal_no, :, signal_comp]
            assert len(signal) == 128, "each window has 128 samples"
            # Robust min/max via the 5th/95th percentiles; the minimum peak
            # height is set a fraction of the way up this range
            signal_min = np.nanpercentile(signal, percentile)
            signal_max = np.nanpercentile(signal, 100 - percentile)
            mph = signal_min + (signal_max - signal_min) / denominator

            features += get_features(*get_psd_values(signal, N, f_s), mph)
            features += get_features(*get_fft_values(signal, N, f_s), mph)
            features += get_features(*get_autocorr_values(signal, N, f_s), mph)
        list_of_features.append(features)
    return np.array(list_of_features), np.array(list_of_labels)


print("Extracting train features:", train_signals.shape, train_labels.shape)
X_train, Y_train = extract_features_labels(train_signals, train_labels, N, f_s, denominator)
print("Extracting test features:", test_signals.shape, test_labels.shape)
X_test, Y_test = extract_features_labels(test_signals, test_labels, N, f_s, denominator)

print("Extracted train features:", X_train.shape, Y_train.shape)
print("Extracted test features:", X_test.shape, Y_test.shape)


# Save the extracted features, with the label appended as the last column
np.save("train.npy", np.hstack([X_train, np.reshape(Y_train, newshape=[-1, 1])]))
np.save("test.npy", np.hstack([X_test, np.reshape(Y_test, newshape=[-1, 1])]))




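The script stops after saving the feature matrices. A minimal follow-up sketch of training a classifier on them, assuming scikit-learn (the post does not specify a model) and using a small fabricated array with the saved layout (270 features plus a label column) in place of `np.load("train.npy")` so the snippet is self-contained:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in for np.load("train.npy"): rows of [270 features | 1 label column]
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 271))
train[:, -1] = rng.integers(1, 7, size=200)  # labels 1..6, as in the dataset

X_train = train[:, :-1]
Y_train = train[:, -1].astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, Y_train)
print("train accuracy:", clf.score(X_train, Y_train))
```

With the real `train.npy`/`test.npy`, replace the fabricated array with the loaded files and evaluate on the held-out test split rather than the training data.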


Attached figures (saved by the script above): 01_time_domain.png, 02_fft.png, 03_psd.png, 04_autocorr.png, 05_all.png.