This is a sound classification demo using a ThinkPad's built-in microphone.
If the classification result is not shown in rqt, lower hit_volume_threshold
in config/sound_classification.yaml.
mkdir -p ~/audio_ws/src
cd ~/audio_ws/src
git clone https://github.com/708yamaguchi/sound_classification.git
cd ../
catkin build
source ~/audio_ws/devel/setup.bash
rosrun sound_classification create_dataset.py # create dataset from spectrograms (.png files)
rosrun sound_classification train.py --gpu 0 --epoch 20 # train
roslaunch sound_classification save_noise_sound.launch # collect environmental noise sound
roslaunch sound_classification microphone.launch # classification on ROS
- Upper left: Estimated class (applause, flick, voice)
- Left: Spectrogram
- Right: Video
- Download this package and build it with catkin.
mkdir -p ~/audio_ws/src
cd ~/audio_ws/src
git clone https://github.com/708yamaguchi/sound_classification.git
cd ../
catkin build
source ~/audio_ws/devel/setup.bash
- Set the sound classification parameters in
config/sound_classification.yaml
(e.g. microphone name, sampling rate). These parameters must not be changed in the following steps.
NOTE: You can get a list of microphone names (and other device info) with the following command:
import pyaudio
p = pyaudio.PyAudio()
for index in range(p.get_device_count()):
    print(p.get_device_info_by_index(index)['name'])
- Record noise to
scripts/mean_noise_sound.npy
to calibrate the microphone (spectral subtraction method). Stay quiet while this command runs.
roslaunch sound_classification save_noise_sound.launch
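The calibration above is based on spectral subtraction: the recorded mean noise spectrum is subtracted from each incoming spectrum. A minimal sketch of the idea (the `spectral_subtraction` helper and the toy spectra are illustrative assumptions, not the package's actual implementation):

```python
import numpy as np

def spectral_subtraction(spectrum, mean_noise, floor=0.0):
    # Subtract the pre-recorded mean noise spectrum from the input
    # spectrum, clipping negative bins to `floor`.
    return np.maximum(spectrum - mean_noise, floor)

# Toy example with 4 frequency bins.
mean_noise = np.array([2.0, 1.0, 3.0, 2.0])  # stands in for mean_noise_sound.npy
frame = np.array([5.0, 0.5, 6.0, 2.0])       # one incoming spectrum frame
print(spectral_subtraction(frame, mean_noise))  # [3. 0. 3. 0.]
```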
- Save your original spectrograms in
train_data/original_spectrogram
. Specify the target object class as a command line argument.
roslaunch sound_classification save_spectrogram.launch target_class:=(target object class)
NOTE: You can change the hitting-detection threshold by passing the hit_volume_threshold
argument to this roslaunch.
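As a rough sketch of what such a volume threshold does (the `is_hit` helper and its RMS formulation are assumptions for illustration, not the node's actual code):

```python
import numpy as np

def is_hit(frame, hit_volume_threshold=0.5):
    # A frame counts as a "hit" when its RMS volume exceeds the threshold.
    rms = np.sqrt(np.mean(np.square(frame)))
    return rms > hit_volume_threshold

quiet = np.zeros(1024)       # silence
loud = 0.9 * np.ones(1024)   # sustained loud input
print(is_hit(quiet), is_hit(loud))  # False True
```

Lowering the threshold makes quieter sounds trigger spectrogram saving, which is why the intro suggests lowering hit_volume_threshold when nothing shows up in rqt.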
- Create the dataset for training with chainer (the train dataset is augmented; the test dataset is not). At the same time, the mean of the dataset is calculated (saved in
train_data/dataset/mean_of_dataset.png
)
rosrun sound_classification create_dataset.py
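The mean image is the pixel-wise average over all spectrograms in the dataset; a minimal sketch (the `mean_of_dataset` helper and toy data are assumptions, not the script's actual code):

```python
import numpy as np

def mean_of_dataset(images):
    # Pixel-wise mean over a stack of equally sized spectrogram images,
    # i.e. the statistic saved as mean_of_dataset.png.
    return np.mean(np.stack(images), axis=0)

# Three toy 2x2 "spectrograms" with constant values 0.0, 0.5 and 1.0.
imgs = [np.full((2, 2), v, dtype=np.float32) for v in (0.0, 0.5, 1.0)]
print(mean_of_dataset(imgs))  # 2x2 array, every pixel 0.5
```

Subtracting this mean from each input spectrogram is a common normalization step before training.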
- Visualize the created dataset (either
train
or
test
must be selected as an argument)
rosrun sound_classification visualize_dataset.py train
- Train with chainer. Results are written to
scripts/result
rosrun sound_classification train.py --gpu 0 --epoch 20
NOTE: Only the NIN
architecture is available for now.
- Classify spectrogram on ROS. Results are visualized in rqt.
roslaunch sound_classification microphone.launch
NOTE: If your machine does not have a sufficiently powerful GPU, classification will be very slow. (In my environment, a GeForce 930M is enough.)
- Record/Play rosbag
# record
roslaunch sound_classification microphone.launch
roslaunch sound_classification record_sound_classification.launch filename:=$HOME/.ros/hoge.bag
# play
rossetlocal
roslaunch sound_classification play_sound_classification.launch filename:=$HOME/.ros/hoge.bag
Worked on:
- ThinkPad T460s built-in microphone
- MINI Microphone (http://akizukidenshi.com/catalog/g/gM-12864/)