
Redimnet #355

Closed
freshpearYoon opened this issue Aug 28, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@freshpearYoon

I can see ReDimNet in your code (redimnet.py), but there is no mention of it in the pretrained-model README.md or the docs.
Is it possible to use ReDimNet?

@wsstriving
Collaborator

The configuration and code have been added, but the pretrained model is not currently supported. I only have initial results, and we may update the pretrained models in the future. For now, you can try training with different configurations.

@freshpearYoon
Author

Thank you for your reply!

@didi1233

didi1233 commented Aug 28, 2024

Hello @wsstriving ,

Thank you for providing the training code for Redimnet. While training the Redimnet model, I encountered the following error:

```
.................................
[ INFO : 2024-08-28 18:33:36,699 ] - (pool): ASTP(
[ INFO : 2024-08-28 18:33:36,699 ] -   (linear1): Conv1d(2592, 128, kernel_size=(1,), stride=(1,))
[ INFO : 2024-08-28 18:33:36,699 ] -   (linear2): Conv1d(128, 864, kernel_size=(1,), stride=(1,))
[ INFO : 2024-08-28 18:33:36,699 ] - )
[ INFO : 2024-08-28 18:33:36,700 ] - (seg_1): Linear(in_features=1728, out_features=192, bias=True)
[ INFO : 2024-08-28 18:33:36,700 ] - (seg_bn_1): Identity()
[ INFO : 2024-08-28 18:33:36,700 ] - (seg_2): Identity()
[ INFO : 2024-08-28 18:33:36,700 ] - (projection): ArcMarginProduct(
[ INFO : 2024-08-28 18:33:36,700 ] -   in_features=192, out_features=2, scale=32.0,
[ INFO : 2024-08-28 18:33:36,700 ] -   margin=0.0, easy_margin=False
[ INFO : 2024-08-28 18:33:36,700 ] - )
[ INFO : 2024-08-28 18:33:36,700 ] - )
Traceback (most recent call last):
  File "wespeaker/bin/train.py", line 257, in <module>
    fire.Fire(train)
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "wespeaker/bin/train.py", line 156, in train
    script_model = torch.jit.script(model)
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/torch/jit/_script.py", line 1286, in script
    return torch.jit._recursive.create_script_module(
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/torch/jit/_recursive.py", line 476, in create_script_module
    return create_script_module_impl(nn_module, concrete_type, stubs_fn)
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/torch/jit/_recursive.py", line 538, in create_script_module_impl
    script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/torch/jit/_script.py", line 615, in _construct
    init_fn(script_module)
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/torch/jit/_recursive.py", line 516, in init_fn
    scripted = create_script_module_impl(orig_value, sub_concrete_type, stubs_fn)
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/torch/jit/_recursive.py", line 538, in create_script_module_impl
    script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/torch/jit/_script.py", line 615, in _construct
    init_fn(script_module)
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/torch/jit/_recursive.py", line 516, in init_fn
    scripted = create_script_module_impl(orig_value, sub_concrete_type, stubs_fn)
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/torch/jit/_recursive.py", line 538, in create_script_module_impl
    script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/torch/jit/_script.py", line 615, in _construct
    init_fn(script_module)
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/torch/jit/_recursive.py", line 516, in init_fn
    scripted = create_script_module_impl(orig_value, sub_concrete_type, stubs_fn)
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/torch/jit/_recursive.py", line 542, in create_script_module_impl
    create_methods_and_properties_from_stubs(concrete_type, method_stubs, property_stubs)
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/torch/jit/_recursive.py", line 393, in create_methods_and_properties_from_stubs
    concrete_type._create_methods_and_properties(property_defs, property_rcbs, method_defs, method_rcbs, method_defaults)
RuntimeError:
cannot statically infer the expected size of a list in this context:
  File "/home/wespeaker-master/wespeaker/models/redimnet.py", line 50
    def forward(self, x):
        size = x.size()
        bs, c, f, t = tuple(size)
        ~~~~~~~~~~ <--- HERE
        return x.permute((0, 2, 1, 3)).reshape((bs, c * f, t))

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 9961) of binary: /home/anaconda3/envs/3D/bin/python
Traceback (most recent call last):
  File "/home/anaconda3/envs/3D/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/anaconda3/envs/3D/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ..............
```

However, after commenting out lines 155 to 157 in https://github.com/wenet-e2e/wespeaker/blob/master/wespeaker/bin/train.py (which skips exporting the init.zip file), the model was able to train successfully.

Could you please explain what it means when exporting init.zip fails? Does this indicate that the Redimnet model cannot be exported to ONNX as well?

Thank you!
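For context on the error itself: TorchScript cannot unpack `tuple(x.size())` because the length of that list is not statically known. A minimal sketch of a scriptable rewrite (this is a hypothetical stand-in module, not the actual code in redimnet.py) indexes each dimension instead:

```python
import torch
import torch.nn as nn


class ToChannelFreq(nn.Module):
    """Hypothetical stand-in for the reshape step in redimnet.py that
    fails under torch.jit.script. Unpacking tuple(x.size()) cannot be
    statically inferred, but indexing each dimension can."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # TorchScript-friendly: take each size individually instead of
        # unpacking a dynamically sized tuple.
        bs = x.size(0)
        c = x.size(1)
        f = x.size(2)
        t = x.size(3)
        return x.permute(0, 2, 1, 3).reshape(bs, c * f, t)


# Scripting succeeds where the original unpacking raised
# "cannot statically infer the expected size of a list".
m = torch.jit.script(ToChannelFreq())
out = m(torch.randn(2, 3, 4, 5))
print(out.shape)  # torch.Size([2, 12, 5])
```

The same permute/reshape semantics are kept; only the way the sizes are read changes.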

@wsstriving wsstriving added the bug (Something isn't working) label Aug 28, 2024
@wsstriving
Collaborator


I forgot to mention that the current implementation does not meet the requirement for jit export. You can simply comment out the JIT check code to run the model for now. We will address this issue later.
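A minimal sketch of that workaround, assuming train.py calls `torch.jit.script(model)` and saves the result around the lines mentioned above (this helper is illustrative, not WeSpeaker's actual code): guard the export instead of deleting it, so scriptable models still produce init.zip while unscriptable ones only log a warning.

```python
import logging

import torch
import torch.nn as nn


def export_init_script(model: nn.Module, path: str = "init.zip") -> bool:
    """Illustrative workaround: attempt the TorchScript export that
    train.py performs, but skip it with a warning for models that are
    not yet scriptable (such as the current ReDimNet implementation),
    instead of letting training crash."""
    try:
        script_model = torch.jit.script(model)
        script_model.save(path)
        return True
    except RuntimeError as e:
        logging.warning("TorchScript export to %s skipped: %s", path, e)
        return False
```

With this guard, a scriptable model exports normally, and a model hitting the "cannot statically infer" error simply continues into training.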

@vikcost
Contributor

vikcost commented Nov 6, 2024

Hi, is there any progress or plans for supporting fine-tuning from the official checkpoints? I'm interested in fine-tuning from the redimnet-b6 checkpoint.

At initialization, I see the following warnings, which clearly indicate some missing and unexpected parameters.

[ WARNING : 2024-11-06 04:39:59,808 ] - missing tensor: seg_1.weight
[ WARNING : 2024-11-06 04:39:59,808 ] - missing tensor: seg_1.bias
[ WARNING : 2024-11-06 04:39:59,808 ] - missing tensor: projection.weight
[ WARNING : 2024-11-06 04:39:59,808 ] - unexpected tensor: spec.torchfbank.1.flipped_filter
[ WARNING : 2024-11-06 04:39:59,808 ] - unexpected tensor: spec.torchfbank.2.spectrogram.window
[ WARNING : 2024-11-06 04:39:59,808 ] - unexpected tensor: spec.torchfbank.2.mel_scale.fb
[ WARNING : 2024-11-06 04:39:59,808 ] - unexpected tensor: bn.weight
[ WARNING : 2024-11-06 04:39:59,808 ] - unexpected tensor: bn.bias
[ WARNING : 2024-11-06 04:39:59,809 ] - unexpected tensor: bn.running_mean
[ WARNING : 2024-11-06 04:39:59,809 ] - unexpected tensor: bn.running_var
[ WARNING : 2024-11-06 04:39:59,809 ] - unexpected tensor: bn.num_batches_tracked
[ WARNING : 2024-11-06 04:39:59,809 ] - unexpected tensor: linear.weight
[ WARNING : 2024-11-06 04:39:59,809 ] - unexpected tensor: linear.bias
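These warnings are consistent with loading the checkpoint via `load_state_dict(..., strict=False)`. A hedged sketch of such a partial load (the helper name is hypothetical, and the interpretation is an assumption: the unexpected `spec.*`, `bn.*`, and `linear.*` tensors would belong to the official repo's built-in feature frontend and head, which WeSpeaker replaces with its own fbank frontend, `seg_1`, and `projection` layers that then train from scratch):

```python
import torch
import torch.nn as nn


def load_partial_state(model: nn.Module, ckpt_path: str):
    """Hypothetical helper (not WeSpeaker API): load a checkpoint with
    strict=False and report the mismatched keys, mirroring the WARNING
    lines above."""
    state = torch.load(ckpt_path, map_location="cpu")
    if isinstance(state, dict) and "state_dict" in state:
        state = state["state_dict"]  # some checkpoints nest the weights
    result = model.load_state_dict(state, strict=False)
    for k in result.missing_keys:
        print(f"missing tensor: {k}")      # e.g. seg_1.weight, projection.weight
    for k in result.unexpected_keys:
        print(f"unexpected tensor: {k}")   # e.g. spec.torchfbank.*, bn.*
    return result
```

The missing `seg_1.*` and `projection.*` tensors are simply layers the official checkpoint does not contain, so they keep their fresh initialization; whether that is acceptable for fine-tuning depends on the recipe.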

@MonolithFoundation

Hi, is there a working pretrained ReDimNet model that can be used?
