
Discrepancy Between Local and Online Transcription Results in Transkun #21

Open
Mofan0418 opened this issue Nov 30, 2024 · 3 comments

@Mofan0418
Hello, I am a music industry professional with a keen interest in programming. The model you trained has been incredibly helpful to me in music transcription. Thank you so much for your effort.

I have a question. Previously, I used this online platform (https://colab.research.google.com/drive/1Eo_cfCfPVPFpQANuxQBQS6T68kc-tNXD?usp=sharing#scrollTo=O5oJ2O0gSfGa) to transcribe audio files into MIDI. However, I recently hit a usage quota (haha). So I visited GitHub, downloaded the package you provided, and deployed Transkun locally (version 2.0.1, I believe).

I noticed that the transcription results obtained locally are not as good as those from the online platform (though still better than other models on the market). I was wondering, is the package you released on GitHub identical to the one used on the online platform? What might be causing the difference in results? Thank you!

@Yujia-Yan
Owner

The latest version of the PIP package and this GitHub repo are synchronized. Can you be more specific about what kind of "discrepancy" it is?

Do you mean note duration extension? Compared with the previous release (v0.1.4), there is a significant behavior change: the default checkpoint no longer extends note durations according to pedal activations. If transcribing notes with pedal extension is the desired behavior, use "Transkun V2 Aug" as listed in the Model Cards.
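Since the reply above says the PIP package and the repo are kept in sync, one quick sanity check is to confirm which release is actually installed in the local environment. This is a generic sketch using the standard library (the helper name `installed_version` is mine, not part of Transkun):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

# Compare this against the latest release tag on the GitHub repo:
print(installed_version("transkun"))
```

If the printed version lags behind the latest release on GitHub, `pip install --upgrade transkun` would bring the two back in sync.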

@Mofan0418
Author

Mofan0418 commented Nov 30, 2024

Thank you for your response!
[Image 1]
[Image 2]
To make it clearer, I’ve attached images to illustrate. Image 1 shows the results obtained using the online model on Google, while Image 2 shows the results from running the model locally. Although I can’t directly articulate the differences between the two, based on auditory perception, the online model’s results are superior in several aspects: the completeness of the notes (no omissions), the accuracy of rhythm, the precision in handling long and short notes, and the decision-making on pedal usage. Overall, the online model provides better results in terms of listening experience.
I ran the model locally several times and obtained identical results each time, which demonstrates that the output is consistent as long as the model remains unchanged.
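The determinism observation above (identical output on every local run) can be checked mechanically by hashing the generated MIDI files; if two files hash the same, the runs were byte-for-byte identical. A minimal sketch (the helper `sha256_of` and the file names are illustrative, not from Transkun):

```python
import hashlib

def sha256_of(path):
    """Hash a file in chunks so large MIDI/audio files never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

# Two runs of the same model on the same audio should produce identical hashes:
# sha256_of("run1.mid") == sha256_of("run2.mid")
```

The same comparison against the Colab output would immediately show whether the online and local results differ at all, before any listening test.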
So, I was wondering if the difference in results might be due to the two using different models. Or perhaps some parameters of the model have been fine-tuned? The online model is shown as follows:
[Image 3]

When running locally, I encountered the following warnings. Could you please confirm if they would impact the results? (I personally believe they would not.)

[Image 4]

/Users/mofan/Documents/pythonProject/venv/bin/python /Users/mofan/Documents/pythonProject/PianoTrans/mp3_to_midi_gui.py
Please select the audio file to transcribe (MP3 format)...
2024-11-30 16:11:35.735 Python[49081:2853634] +[IMKClient subclass]: chose IMKClient_Modern
2024-11-30 16:11:35.938 Python[49081:2853634] The class 'NSOpenPanel' overrides the method identifier. This method is implemented by class 'NSWindow'
Transcribing: /Users/mofan/Desktop/Summer - 久石譲 Piano Ver. [Full].mp3
/Users/mofan/Documents/pythonProject/venv/lib/python3.9/site-packages/transkun/transcribe.py:49: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
checkpoint = torch.load(path, map_location = device)
/Users/mofan/Documents/pythonProject/venv/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py:632: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.5 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
return fn(*args, **kwargs)
/Users/mofan/Documents/pythonProject/venv/lib/python3.9/site-packages/torch/utils/checkpoint.py:87: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn(
Transcription complete! MIDI file saved to: /Users/mofan/Desktop/Summer - 久石譲 Piano Ver. [Full].mid
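The messages in the log above are a FutureWarning (about the `weights_only` default of `torch.load`) and two UserWarnings (about `use_reentrant` and `requires_grad` in `torch.utils.checkpoint`); none of them should change the numerical output. If they clutter the console, they can be suppressed around the transcription call. A sketch, assuming a wrapper like this (the name `run_quietly` is hypothetical, not part of Transkun):

```python
import warnings

def run_quietly(fn, *args, **kwargs):
    """Call fn with FutureWarning/UserWarning suppressed (purely cosmetic)."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", FutureWarning)  # torch.load weights_only notice
        warnings.simplefilter("ignore", UserWarning)    # use_reentrant / requires_grad notices
        return fn(*args, **kwargs)
```

Using `catch_warnings` keeps the filters scoped to the one call, so warnings elsewhere in the program still surface normally.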

@Yujia-Yan
Owner

Yujia-Yan commented Nov 30, 2024

Could you provide the audio file, the two MIDI files you mentioned, and the content of "PianoTrans/mp3_to_midi_gui.py" so that I can investigate what's happening when you call it locally?
