Step wise guidelines #128

Open
Reethuch opened this issue Mar 1, 2023 · 6 comments

Comments

@Reethuch

Reethuch commented Mar 1, 2023

Hi team,
This is awesome. I want to recognize a small set of keywords from live audio and print them. I am planning to use my own training dataset, but I don't understand the flow. What inputs (i.e., arguments) should I give, and what is the expected output?

@mlxu995
Collaborator

mlxu995 commented Mar 11, 2023

The step-wise guideline will come soon, but for now you can refer to the other examples to prepare your own data.
The key steps are as follows:

  1. Prepare a "wav.scp" file where each line contains the wav_id and its corresponding path, as shown below:
first.wav /path/to/first.wav
second.wav /path/to/second.wav
  2. Prepare a "text" file where each line contains the wav_id and its corresponding label. Assuming that first.wav contains your keyword and second.wav does not:
first.wav 0
second.wav -1
  3. Use the script tools/wav_to_duration.sh to get the "wav.dur" file, and then run tools/make_list.py:
bash tools/wav_to_duration.sh /path/to/wav.scp /path/to/wav.dur
python tools/make_list.py /path/to/wav.scp /path/to/text /path/to/wav.dur /path/to/data.list
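
For a larger dataset, the "wav.scp" and "text" files from steps 1 and 2 can be generated rather than written by hand. The following is a minimal sketch, not part of wekws: the function name `write_kws_lists` and its parameters are hypothetical, and it assumes (as in the example above) that the file name serves as the wav_id, that keyword clips get label 0, and that all other clips get label -1.

```python
from pathlib import Path


def write_kws_lists(keyword_wavs, other_wavs, out_dir,
                    keyword_label=0, filler_label=-1):
    """Write "wav.scp" and "text" in the two-column format from steps 1 and 2.

    Hypothetical helper: file names are used as wav_ids, so they must be
    unique across both input lists.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Pair every wav path with the label it should receive.
    entries = [(Path(p), keyword_label) for p in keyword_wavs]
    entries += [(Path(p), filler_label) for p in other_wavs]

    scp_path = out_dir / "wav.scp"
    text_path = out_dir / "text"
    with open(scp_path, "w", encoding="utf-8") as scp, \
            open(text_path, "w", encoding="utf-8") as text:
        for path, label in entries:
            wav_id = path.name
            scp.write(f"{wav_id} {path}\n")     # wav_id and its path
            text.write(f"{wav_id} {label}\n")   # wav_id and its label
    return scp_path, text_path
```

Calling `write_kws_lists(["/path/to/first.wav"], ["/path/to/second.wav"], "data")` should reproduce the two example files shown above; the commands in step 3 can then be run on them unchanged.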

@Reethuch
Author

Thank you @mlxu995. I generated a model using the Google Speech Commands dataset and your code. But how do I use the model to recognize live words?

Also, during stage 4 I got some warnings like the ones below. Is that all right?
```
/Users/reethu/Desktop/wekws/wekws/model/mdtc.py:110: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert outputs.size(2) > self.padding
[W NNPACK.cpp:53] Could not initialize NNPACK! Reason: Unsupported hardware.
/Users/reethu/Desktop/wekws/wekws/model/mdtc.py:257: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if in_cache.size(0) > 0:
/Users/reethu/Desktop/wekws/wekws/model/mdtc.py:187: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if in_cache.size(0) > 0:
Export to onnx succeed, but pytorch/onnx have different
outputs when given the same input, please check!!!
```

@mlxu995
Collaborator

mlxu995 commented Mar 19, 2023

For the last message, you can try setting atol=1e-5 (in export_onnx.py). Note that mdtc has a larger error range than ds-tcn because its final output is a sum of the outputs of multiple layers.
The other warnings can simply be ignored.
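
To illustrate what the absolute tolerance controls, here is a small sketch of a NumPy-style closeness test, `|actual - desired| <= atol + rtol * |desired|`. It is an assumption that the comparison in export_onnx.py follows this rule; the helper `within_tolerance`, the `rtol` default, and the sample values are hypothetical and chosen only for illustration.

```python
def within_tolerance(actual, desired, rtol=1e-3, atol=1e-5):
    """Return True if every pair satisfies |a - d| <= atol + rtol * |d|."""
    return all(abs(a - d) <= atol + rtol * abs(d)
               for a, d in zip(actual, desired))


# Hypothetical outputs: the ONNX values drift by about 3e-6 from PyTorch.
pytorch_out = [0.5, 1e-4, 0.0]
onnx_out = [0.5 + 3e-6, 1e-4 + 3e-6, 3e-6]

# With atol=1e-5 the small drift is accepted.
loose_ok = within_tolerance(onnx_out, pytorch_out, atol=1e-5)

# With a tighter atol=1e-6 the near-zero entries fail the check.
strict_ok = within_tolerance(onnx_out, pytorch_out, atol=1e-6)

print(loose_ok, strict_ok)
```

For outputs close to zero the `rtol` term contributes almost nothing, so `atol` alone decides whether small numerical differences between the two runtimes are reported as a mismatch.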

@mlxu995
Collaborator

mlxu995 commented Mar 19, 2023

To use the model to recognize live words, you can follow these guidelines: https://github.com/wenet-e2e/wekws/blob/main/runtime/android/README.md
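
The linked README covers the Android runtime. As a platform-independent illustration of the last step of live detection, the sketch below turns a stream of per-frame keyword scores into detection events by thresholding. This is a generic technique, not the wekws runtime: the function `detect_keyword`, the `threshold` and `cooldown` values, and the assumption that the model yields one score in [0, 1] per frame are all hypothetical.

```python
def detect_keyword(scores, threshold=0.5, cooldown=3):
    """Yield the frame indices at which a keyword is considered detected.

    scores:    iterable of per-frame keyword scores (assumed in [0, 1])
    threshold: minimum score that counts as a detection
    cooldown:  number of frames to skip after a detection, so that one
               spoken keyword does not trigger several times
    """
    wait = 0
    for index, score in enumerate(scores):
        if wait > 0:
            # Still inside the cooldown window of the previous detection.
            wait -= 1
            continue
        if score >= threshold:
            yield index
            wait = cooldown


# Hypothetical scores, as they might arrive frame by frame.
example_scores = [0.1, 0.7, 0.9, 0.8, 0.2, 0.1, 0.6]
hits = list(detect_keyword(example_scores))
print(hits)
```

Because the function is a generator, it can consume scores as they are produced from incoming audio chunks instead of waiting for the whole recording.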

@Reethuch
Author

Thank you @mlxu995
I need to do audio detection from a web browser.
Is that possible?

@duj12
Contributor

duj12 commented Sep 25, 2023

> Thank you @mlxu995 I need to do audio detection from web browser. Is that possible ?

@Reethuch You can try this web demo: https://www.modelscope.cn/studios/thuduj12/KWS_Nihao_Xiaojing/summary


3 participants