Custom dataset for DONUT #319

Open
Dealibros opened this issue Oct 16, 2024 · 1 comment

Comments

@Dealibros

Hello!

I'm just starting on my journey of model training.
I am in the process of creating a custom dataset using Ubiai for fine-tuning a DONUT model.
My goal is to extract data from forms, and I've found the necessary format structure in the documentation. However, I'm uncertain about how to handle checkboxes, which are present in the forms I plan to use. Could you advise on how to include checkboxes in the dataset? Will the DONUT model be able to accurately interpret them?

Thank you in advance for your help.

Greetings
Andrea

@paloha

paloha commented Oct 24, 2024

It should handle checkboxes without a problem; it all depends on your targets. For example, say the form in your image has two questions. If the first is an open question, your target will be the full text of the answer. If the second is a multiple-choice question with checkboxes, you need to encode the checkbox state in your targets somehow. I did not try this explicitly, but I believe you can just train it to predict "[▢,▢,▣,▢,▣]", or "[0, 0, 1, 0, 1]", or "[2, 4]".
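To make the three target encodings above concrete, here is a minimal sketch of how checkbox states could be turned into target strings. Everything in it is an assumption for illustration: the helper name `encode_checkboxes`, the field names `full_name` and `hobbies`, and the file name are hypothetical, and the `gt_parse` wrapper assumes the ground-truth layout described in the Donut README. It has not been tested against an actual training run.

```python
import json


def encode_checkboxes(checked, style="binary"):
    """Encode checkbox states (list of bools, in reading order) as a target string.

    Hypothetical helper; the three styles mirror the encodings suggested above.
    """
    if style == "symbols":
        return "[" + ",".join("▣" if c else "▢" for c in checked) + "]"
    if style == "binary":
        return "[" + ", ".join("1" if c else "0" for c in checked) + "]"
    if style == "indices":
        # 0-based positions of the ticked boxes
        return "[" + ", ".join(str(i) for i, c in enumerate(checked) if c) + "]"
    raise ValueError(f"unknown style: {style}")


# Five checkboxes, the third and fifth are ticked.
states = [False, False, True, False, True]

# One line of a metadata.jsonl file (assumed layout; field names are made up).
gt_parse = {
    "full_name": "Jane Doe",               # open question: full answer text
    "hobbies": encode_checkboxes(states),  # checkbox question: encoded state
}
record = {
    "file_name": "form_0001.png",
    "ground_truth": json.dumps({"gt_parse": gt_parse}, ensure_ascii=False),
}
metadata_line = json.dumps(record, ensure_ascii=False)
```

Whichever encoding you pick, use it consistently across the whole dataset and keep the checkbox order fixed per form layout, so the model always sees the same mapping from position to meaning.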
