error on training #14
Comments
I've got the same error. Can you help if you've resolved it?
Got the same error. Help needed!
Matrix size mismatch; callback and model saving issue; CUDA warning (no GPU detected).
This is the output from training. The model is not getting saved due to a callback issue.
4:6:34: Using Inceptionv3 model
{}: Generating image features using inceptionv3 model...
2022-01-14 04:06:34.692740: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/inception_v3/inception_v3_weights_tf_dim_ordering_tf_kernels.h5
96116736/96112376 [==============================] - 1s 0us/step
96124928/96112376 [==============================] - 1s 0us/step
100% 8091/8091 [28:33<00:00, 4.72it/s]
4:35:12: Completed & Saved features for 8091 images successfully
4:35:12: Parsing captions file...
4:35:12: Parsed captions: 40460
4:35:12: Parsed & Saved successfully
4:35:12: Available images for training: 6000
4:35:12: Available captions for training: 30000
4:35:13: Available images for validation: 1000
4:35:13: Available captions for validation: 5000
RNN Model (Decoder) Summary :
Model: "model_1"
Layer (type)                        Output Shape       Param #   Connected to
input_3 (InputLayer)                [(None, 40)]       0         []
input_2 (InputLayer)                [(None, 2048)]     0         []
embedding (Embedding)               (None, 40, 300)    2213400   ['input_3[0][0]']
dense (Dense)                       (None, 300)        614700    ['input_2[0][0]']
lstm (LSTM)                         (None, 40, 256)    570368    ['embedding[0][0]']
repeat_vector (RepeatVector)        (None, 40, 300)    0         ['dense[0][0]']
time_distributed (TimeDistributed)  (None, 40, 300)    77100     ['lstm[0][0]']
concatenate_2 (Concatenate)         (None, 40, 600)    0         ['repeat_vector[0][0]', 'time_distributed[0][0]']
bidirectional (Bidirectional)       (None, 512)        1755136   ['concatenate_2[0][0]']
dense_2 (Dense)                     (None, 7378)       3784914   ['bidirectional[0][0]']
==================================================================================================
Total params: 9,015,618
Trainable params: 9,015,618
Non-trainable params: 0
None
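For anyone reading the summary above: the parameter counts can be reproduced from the usual formulas for Embedding, Dense, and LSTM layers, which also shows what input width each layer expects. A minimal sketch, with layer sizes read off the summary and helper names made up for illustration:

```python
def dense_params(n_in, n_out):
    # Weight matrix (n_in x n_out) plus one bias per output unit.
    return n_in * n_out + n_out


def lstm_params(n_in, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (n_in + units + 1) * units


counts = {
    "embedding": 7378 * 300,                     # vocabulary size x embedding dim
    "dense": dense_params(2048, 300),            # image features -> 300
    "lstm": lstm_params(300, 256),
    "time_distributed": dense_params(256, 300),
    "bidirectional": 2 * lstm_params(600, 256),  # forward + backward LSTM
    "dense_2": dense_params(512, 7378),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:18s} {n:>9,d}")
print(f"{'total':18s} {total:>9,d}")
```

The `dense` count of 614,700 only works out with a 2048-wide input, so that layer is built for 2048 features per image.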
steps_train: 94, steps_val: 16
Batch Size: 64
Total Number of Epochs = 20
train_val.py:86: UserWarning: `Model.fit_generator` is deprecated and will be removed in a future version. Please use `Model.fit`, which supports generators.
  verbose=1)
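The deprecation warning above is separate from the crash, but it can be silenced by calling `Model.fit` with the same generators. A rough sketch, assuming the call at `train_val.py:86` passes a training generator, a validation generator, and step counts; the wrapper name `train` and its argument names are hypothetical, since the original call is not shown in the log:

```python
def train(model, train_gen, val_gen, steps_train, steps_val, epochs, callbacks=None):
    """Call Model.fit the way fit_generator was typically called.

    `model` is expected to be a compiled Keras model; the generators are
    assumed to yield (inputs, targets) batches indefinitely.
    """
    return model.fit(
        train_gen,                    # Model.fit accepts generators directly
        steps_per_epoch=steps_train,  # 94 in the log above
        epochs=epochs,                # 20 in the log above
        validation_data=val_gen,
        validation_steps=steps_val,   # 16 in the log above
        callbacks=callbacks or [],
        verbose=1,
    )
```

This only removes the warning. The `InvalidArgumentError` further down the log would still be raised, because it comes from the shape of the data and not from which method is used.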
Epoch 1/20
Traceback (most recent call last):
File "train_val.py", line 86, in <module>
verbose=1)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 2030, in fit_generator
initial_epoch=initial_epoch)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Matrix size-incompatible: In[0]: [3732,1000], In[1]: [2048,300]
[[node model_1/dense/Relu
(defined at /usr/local/lib/python3.7/dist-packages/keras/backend.py:4867)
]] [Op:__inference_train_function_569695]
Errors may have originated from an input operation.
Input Source operations connected to node model_1/dense/Relu:
In[0] model_1/dense/BiasAdd (defined at /usr/local/lib/python3.7/dist-packages/keras/layers/core/dense.py:210)
Operation defined at: (most recent call last)
2022-01-14 04:35:25.329300: W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]
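Reading the traceback: `In[0]: [3732,1000]` is the batch of image features reaching the `dense` layer, and `In[1]: [2048,300]` is that layer's weight matrix. So the features being fed in are 1000 wide while the model was built for 2048 (see `input_2` in the summary). One possible cause, which is an assumption and not confirmed by the log, is that the features were taken from InceptionV3's 1000-class output instead of its 2048-dimensional pooled layer. Either way, it may help to check the saved features before training. A minimal sketch, assuming the features are stored as a dict mapping image id to a vector; the function names and example ids are made up:

```python
def feature_length(vector):
    """Number of values in one feature vector (list or NumPy-like array)."""
    size = getattr(vector, "size", None)  # NumPy arrays expose the total element count
    if isinstance(size, int):
        return size
    return len(vector)


def find_bad_features(features, expected_dim):
    """Return {image_id: actual_length} for every vector of the wrong width."""
    bad = {}
    for image_id, vector in features.items():
        actual = feature_length(vector)
        if actual != expected_dim:
            bad[image_id] = actual
    return bad


# Toy data standing in for the saved features file (ids are hypothetical).
features = {
    "img_a.jpg": [0.0] * 1000,  # width seen in the error message
    "img_b.jpg": [0.0] * 2048,  # width the decoder's input_2 expects
}
bad = find_bad_features(features, expected_dim=2048)
print(bad)
```

If every entry in the real features file comes back as 1000, the features would need to be regenerated with an extractor that outputs 2048 values per image, or the decoder's image input would need to be changed to match.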