sound_classification

Usage

Quick demo

This is a sound classification demo using a ThinkPad's built-in microphone.

If no classification result is shown in rqt, lower hit_volume_threshold in config/sound_classification.yaml.
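To check the current value before lowering it, a minimal sketch (assuming hit_volume_threshold is a top-level key in the yaml file):

import yaml

# Print the current hit detection threshold (key placement is an assumption).
with open('config/sound_classification.yaml') as f:
    config = yaml.safe_load(f)
print(config['hit_volume_threshold'])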

mkdir -p ~/audio_ws/src
cd ~/audio_ws/src
git clone https://github.com/708yamaguchi/sound_classification.git
cd ../
catkin build
source ~/audio_ws/devel/setup.bash
rosrun sound_classification create_dataset.py            # create dataset from spectrogram (.png files)
rosrun sound_classification train.py --gpu 0 --epoch 20  # train
roslaunch sound_classification save_noise_sound.launch   # collect environmental noise sound
roslaunch sound_classification microphone.launch         # classification on ROS

Experiment

  • Upper left: Estimated class (applause, flick, voice)
  • Left: Spectrogram
  • Right: Video

Commands

  1. Download this package and run catkin build.
mkdir -p ~/audio_ws/src
cd ~/audio_ws/src
git clone https://github.com/708yamaguchi/sound_classification.git
cd ../
catkin build
source ~/audio_ws/devel/setup.bash
  2. Set the sound classification configs in config/sound_classification.yaml (e.g., microphone name, sampling rate). Do not change these parameters in the following steps.

NOTE: You can get the list of microphone names (and other device info) with the following snippet.

import pyaudio

# Enumerate audio devices and print each device's name.
p = pyaudio.PyAudio()
for index in range(p.get_device_count()):
    print(p.get_device_info_by_index(index)['name'])
  3. Record noise to scripts/mean_noise_sound.npy to calibrate the microphone (spectral subtraction method; a sketch of the idea follows the command). Stay quiet while this command runs.
roslaunch sound_classification save_noise_sound.launch
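A minimal sketch of the spectral subtraction idea, assuming mean_noise_sound.npy stores the mean noise spectrum (array shapes and function names are assumptions, not the package's exact code):

import numpy as np

# Subtract the recorded mean noise spectrum from an incoming spectrum
# and clip negative values to zero (spectral subtraction).
mean_noise = np.load('scripts/mean_noise_sound.npy')

def subtract_noise(spectrum):
    # spectrum: array with the same shape as mean_noise (assumed)
    return np.maximum(spectrum - mean_noise, 0.0)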
  4. Save your original spectrograms in train_data/original_spectrogram. Specify the target object class as a command line argument.
roslaunch sound_classification save_spectrogram.launch target_class:=(target object class)

NOTE: You can change the hit detection threshold by passing the hit_volume_threshold argument to this roslaunch; a sketch of the detection idea follows.
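A minimal sketch of volume-based hit detection (the RMS formulation and names are assumptions, not the package's exact code):

import numpy as np

# A frame counts as a "hit" when its RMS volume exceeds the threshold.
def is_hit(frame, hit_volume_threshold):
    rms = np.sqrt(np.mean(np.square(frame.astype(np.float64))))
    return rms > hit_volume_threshold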

  5. Create the dataset for training with chainer (the train dataset is augmented; the test dataset is not). The mean of the dataset is also computed and saved to train_data/dataset/mean_of_dataset.png; a sketch of this computation follows the command.
rosrun sound_classification create_dataset.py
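A minimal sketch of how a dataset mean image can be computed (file layout and paths are assumptions):

import glob
import numpy as np
from PIL import Image

# Average all spectrogram PNGs pixel-wise to get the dataset mean image.
paths = glob.glob('train_data/original_spectrogram/*/*.png')
images = np.stack([np.asarray(Image.open(p), dtype=np.float64) for p in paths])
mean = images.mean(axis=0)
Image.fromarray(mean.astype(np.uint8)).save('mean_of_dataset.png')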
  6. Visualize the created dataset (train or test must be passed as an argument); a sketch for viewing a single image by hand follows the command.
rosrun sound_classification visualize_dataset.py train
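To inspect one saved spectrogram by hand instead, a quick sketch (the file path is an assumption):

import matplotlib.pyplot as plt
from PIL import Image

# View a single saved spectrogram image directly.
img = Image.open('train_data/original_spectrogram/applause/00000.png')
plt.imshow(img)
plt.show()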
  7. Train with chainer. Results are written to scripts/result.
rosrun sound_classification train.py --gpu 0 --epoch 20

NOTE: Only the NIN (Network in Network) architecture is available for now.
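For reference, a rough chainer sketch of one NIN "mlpconv" block (a spatial convolution followed by 1x1 convolutions); this illustrates the architecture family, not the package's exact model:

import chainer
import chainer.functions as F
import chainer.links as L

class MLPConv(chainer.Chain):
    # One NIN block: a spatial convolution followed by two 1x1
    # convolutions acting as a small per-pixel MLP.
    def __init__(self, in_ch, out_ch, ksize):
        super(MLPConv, self).__init__()
        with self.init_scope():
            self.conv = L.Convolution2D(in_ch, out_ch, ksize)
            self.mlp1 = L.Convolution2D(out_ch, out_ch, 1)
            self.mlp2 = L.Convolution2D(out_ch, out_ch, 1)

    def __call__(self, x):
        h = F.relu(self.conv(x))
        h = F.relu(self.mlp1(h))
        return F.relu(self.mlp2(h))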

  8. Classify spectrograms on ROS. Results are visualized in rqt.
roslaunch sound_classification microphone.launch

NOTE: If you do not have a powerful enough GPU, the classification process will be very slow. (In my environment, a GeForce 930M is enough.)
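The ROS side boils down to a subscriber that feeds incoming spectrogram images to the trained model; a hypothetical sketch (topic name, message type, and node name are assumptions):

import rospy
from sensor_msgs.msg import Image

# Receive spectrogram images and hand them to the classifier.
def callback(msg):
    rospy.loginfo('received %dx%d spectrogram', msg.width, msg.height)
    # ... run the trained chainer model on the image here ...

rospy.init_node('sound_classifier_sketch')
rospy.Subscriber('/microphone/spectrogram', Image, callback)
rospy.spin()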

  9. Record/play a rosbag.
# record
roslaunch sound_classification microphone.launch
roslaunch sound_classification record_sound_classification.launch filename:=$HOME/.ros/hoge.bag
# play (rossetlocal points ROS_MASTER_URI at localhost)
rossetlocal
roslaunch sound_classification play_sound_classification.launch filename:=$HOME/.ros/hoge.bag

Microphone

Worked on:
