Training doesn't start #317

Open
Psoriaz opened this issue Oct 7, 2024 · 4 comments

Comments

@Psoriaz

Psoriaz commented Oct 7, 2024

I'm trying to train a model on a custom dataset. Training starts, but it stops at the dataset-generation step. These are the last lines printed to the console:

Resolving data files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2117/2117 [00:00<00:00, 42802.73it/s]
Resolving data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 605/605 [00:00<00:00, 46075.35it/s]
Resolving data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 304/304 [00:00<00:00, 55909.34it/s]
Downloading data: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2119/2119 [00:00<00:00, 38616.95files/s]
Downloading data: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 607/607 [00:00<00:00, 45081.68files/s]
Downloading data: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 306/306 [00:00<00:00, 52478.11files/s]
Generating train split: 2116 examples [00:00, 9723.71 examples/s]
Generating validation split: 604 examples [00:00, 17335.75 examples/s]
Generating test split: 303 examples [00:00, 16679.23 examples/s]

What should I do? How can I find out where exactly the problem is?
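One way to see where a silent hang happens is the standard-library `faulthandler` module: arm a watchdog that dumps every thread's stack if the process is still running after N seconds. This is a minimal sketch, assuming you can add a few lines near the top of the training script; the helper names `arm_hang_watchdog` / `disarm_hang_watchdog` are hypothetical and not part of the donut repository.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s, out=None):
    """Dump the stack of every thread to `out` each time `timeout_s` seconds pass.

    Hypothetical helper. `out` must be a real file with a file descriptor,
    because faulthandler writes to the descriptor directly; defaults to stderr.
    """
    target = out if out is not None else sys.stderr
    # repeat=True: keep dumping every timeout_s seconds until cancelled.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=target)


def disarm_hang_watchdog():
    """Stop the periodic dumps, e.g. once the first training step has finished."""
    faulthandler.cancel_dump_traceback_later()
```

For example, calling `arm_hang_watchdog(300)` before training starts would print the blocking call every 5 minutes while the run is stuck (the 300-second value is an arbitrary assumption). It only sees the threads of the process it runs in, so a hang inside a separate DataLoader worker process would not show up directly.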

@Rakshith12-pixel

Any updates? I'm stuck with the same issue.

@Psoriaz
Author

Psoriaz commented Dec 16, 2024

The problem has been resolved.

Steps taken:

  1. Created a virtual environment
  2. Cloned the repository
  3. Changed into the donut directory and ran `pip install .`
  4. Downgraded transformers to 4.18.0, timm to 0.5.4, tokenizers to 0.12.1, and torch to 2.0.0+cu117

Now everything works fine!
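Before re-running training, it can help to confirm that the environment actually has the versions listed above. A minimal sketch using the standard library's `importlib.metadata`; the helper name `find_mismatches` is hypothetical, and the pins are simply the ones reported in this comment.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the steps above.
REPORTED_PINS = {
    "transformers": "4.18.0",
    "timm": "0.5.4",
    "tokenizers": "0.12.1",
    "torch": "2.0.0+cu117",
}


def find_mismatches(pins, get_version=version):
    """Return {package: installed version or None} for every package that differs from its pin."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    for name, installed in find_mismatches(REPORTED_PINS).items():
        print(f"{name}: installed {installed}, reported working {REPORTED_PINS[name]}")
```

The comparison is an exact string match, so a torch build such as `2.0.0` without the `+cu117` label is also reported as a mismatch.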

@Rakshith12-pixel

ERROR: Could not find a version that satisfies the requirement torch==2.0.0+cu117 (from versions: 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.4.0, 2.4.1, 2.5.0, 2.5.1)
Which one should be used?
Thanks
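This error is expected from PyPI: `+cu117` is a PEP 440 local version label, and PyPI does not host releases with local labels, so pip only sees the plain versions listed. CUDA-specific PyTorch builds are served from PyTorch's own wheel index instead. A minimal sketch that turns such a pin into a pip command, assuming the index follows the `https://download.pytorch.org/whl/<label>` pattern; the helper name `pip_command_for` is hypothetical.

```python
import shlex
from typing import List

# Assumed URL pattern of PyTorch's CUDA-specific wheel indexes, e.g. .../whl/cu117.
PYTORCH_INDEX = "https://download.pytorch.org/whl/{label}"


def pip_command_for(package: str, pinned: str) -> List[str]:
    """Build a pip install command for a pin that may carry a local label like '+cu117'."""
    _, _, local = pinned.partition("+")
    cmd = ["pip", "install", f"{package}=={pinned}"]
    if local:
        # Local-label builds are not on PyPI, so point pip at the matching PyTorch index.
        cmd += ["--index-url", PYTORCH_INDEX.format(label=local)]
    return cmd


print(shlex.join(pip_command_for("torch", "2.0.0+cu117")))
# prints: pip install torch==2.0.0+cu117 --index-url https://download.pytorch.org/whl/cu117
```

pip can still fail if that index has no 2.0.0+cu117 wheel for your Python version or platform.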

@Rakshith12-pixel

Anyway, I used the versions you mentioned, kept torch at version 2.3.0, and then had to downgrade protobuf to 3.20.x; after that, training started. Thanks @Psoriaz
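The thread doesn't show which error the protobuf downgrade fixed. A frequent one when older generated `_pb2.py` code meets protobuf 4.x is `TypeError: Descriptors cannot be created directly`, whose message itself suggests downgrading to 3.20.x or setting `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python`. A minimal guard, assuming you want to fail fast before training if protobuf is outside the 3.20.x series that worked here; the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def protobuf_is_3_20(ver: str) -> bool:
    """True if `ver` is in the 3.20.x series reported to work in this thread."""
    parts = ver.split(".")
    return len(parts) >= 2 and parts[0] == "3" and parts[1] == "20"


def check_protobuf() -> None:
    """Raise early with a hint instead of failing somewhere inside training."""
    try:
        installed = version("protobuf")
    except PackageNotFoundError:
        return  # protobuf not installed; nothing to check
    if not protobuf_is_3_20(installed):
        raise RuntimeError(
            f"protobuf {installed} is installed; this thread reports training starting "
            'after pip install "protobuf==3.20.*"'
        )
```

Calling `check_protobuf()` at the top of the training script surfaces the version problem immediately rather than partway through setup.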
