error on training #14

Open
urmikakasi opened this issue Jan 14, 2022 · 3 comments

Comments

@urmikakasi

This is the output from training. The model is not getting saved due to a callback issue.

4:6:34: Using Inceptionv3 model
{}: Generating image features using inceptionv3 model...
2022-01-14 04:06:34.692740: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/inception_v3/inception_v3_weights_tf_dim_ordering_tf_kernels.h5
96116736/96112376 [==============================] - 1s 0us/step
96124928/96112376 [==============================] - 1s 0us/step
100% 8091/8091 [28:33<00:00, 4.72it/s]
4:35:12: Completed & Saved features for 8091 images successfully
4:35:12: Parsing captions file...
4:35:12: Parsed captions: 40460
4:35:12: Parsed & Saved successfully
4:35:12: Available images for training: 6000
4:35:12: Available captions for training: 30000
4:35:13: Available images for validation: 1000
4:35:13: Available captions for validation: 5000
RNN Model (Decoder) Summary :
Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to
==================================================================================================
 input_3 (InputLayer)           [(None, 40)]         0           []
 input_2 (InputLayer)           [(None, 2048)]       0           []
 embedding (Embedding)          (None, 40, 300)      2213400     ['input_3[0][0]']
 dense (Dense)                  (None, 300)          614700      ['input_2[0][0]']
 lstm (LSTM)                    (None, 40, 256)      570368      ['embedding[0][0]']
 repeat_vector (RepeatVector)   (None, 40, 300)      0           ['dense[0][0]']
 time_distributed (TimeDistribu (None, 40, 300)      77100       ['lstm[0][0]']
 ted)
 concatenate_2 (Concatenate)    (None, 40, 600)      0           ['repeat_vector[0][0]',
                                                                  'time_distributed[0][0]']
 bidirectional (Bidirectional)  (None, 512)          1755136     ['concatenate_2[0][0]']
 dense_2 (Dense)                (None, 7378)         3784914     ['bidirectional[0][0]']
==================================================================================================
Total params: 9,015,618
Trainable params: 9,015,618
Non-trainable params: 0
__________________________________________________________________________________________________

None
steps_train: 94, steps_val: 16
Batch Size: 64
Total Number of Epochs = 20
train_val.py:86: UserWarning: Model.fit_generator is deprecated and will be removed in a future version. Please use Model.fit, which supports generators.
verbose=1)
Epoch 1/20
Traceback (most recent call last):
File "train_val.py", line 86, in
verbose=1)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 2030, in fit_generator
initial_epoch=initial_epoch)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Matrix size-incompatible: In[0]: [3732,1000], In[1]: [2048,300]
[[node model_1/dense/Relu
(defined at /usr/local/lib/python3.7/dist-packages/keras/backend.py:4867)
]] [Op:__inference_train_function_569695]

Errors may have originated from an input operation.
Input Source operations connected to node model_1/dense/Relu:
In[0] model_1/dense/BiasAdd (defined at /usr/local/lib/python3.7/dist-packages/keras/layers/core/dense.py:210)

Operation defined at: (most recent call last)

File "train_val.py", line 86, in
verbose=1)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 2030, in fit_generator
initial_epoch=initial_epoch)

File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1216, in fit
tmp_logs = self.train_function(iterator)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 878, in train_function
return step_function(self, iterator)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 867, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))

File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 860, in run_step
outputs = model.train_step(data)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 808, in train_step
y_pred = self(x, training=True)

File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py", line 1083, in call
outputs = call_fn(inputs, *args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 92, in error_handler
return fn(*args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py", line 452, in call
inputs, training=training, mask=mask)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py", line 589, in _run_internal_graph
outputs = node.layer(*args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py", line 1083, in call
outputs = call_fn(inputs, *args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 92, in error_handler
return fn(*args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/layers/core/dense.py", line 213, in call
outputs = self.activation(outputs)

File "/usr/local/lib/python3.7/dist-packages/keras/activations.py", line 311, in relu
return backend.relu(x, alpha=alpha, max_value=max_value, threshold=threshold)

File "/usr/local/lib/python3.7/dist-packages/keras/backend.py", line 4867, in relu
x = tf.nn.relu(x)

2022-01-14 04:35:25.329300: W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]

@hibah321

I've got the same error. Can you help if you've resolved it?

@Anant-mishra1729

Got the same error. Help needed!

@kunal0230

Matrix Size Mismatch:
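The shape mismatch [3732,1000] vs. [2048,300] is raised in the decoder's first Dense layer (model_1/dense), which takes the image features from input_2 and projects them from 2048 to 300 dimensions. The features actually being fed in have 1000 columns, which is the size of InceptionV3's ImageNet classification output, so they were probably extracted from the final classification layer rather than the 2048-dimensional pooling layer. Double-check that the feature extractor outputs [None, 2048] for each image. If the saved features have the wrong size, regenerate them from the pooling layer so they match the decoder's input.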
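
One way to double-check the extractor's output size is to build InceptionV3 both ways and compare the shapes. This is a minimal sketch, assuming TensorFlow 2.x. `weights=None` is used only to skip the weight download, and the helper name `build_extractors` is made up for illustration. It is not taken from this project's code, whose feature-extraction step may be set up differently.

```python
import tensorflow as tf


def build_extractors():
    """Return (classifier, pooled) InceptionV3 variants for shape comparison."""
    # Full network: ends in the 1000-way ImageNet classification layer.
    classifier = tf.keras.applications.InceptionV3(weights=None, include_top=True)
    # Headless network with global average pooling: ends in a 2048-d vector.
    pooled = tf.keras.applications.InceptionV3(
        weights=None, include_top=False, pooling="avg"
    )
    return classifier, pooled


classifier, pooled = build_extractors()
print("include_top=True :", classifier.output_shape)
print("pooling='avg'    :", pooled.output_shape)
```

The first variant ends in a 1000-wide layer, which matches the 1000 columns in the error. The second ends in the 2048-wide vector that input_2 expects.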

Callback and Model Saving Issue:
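Since fit_generator is deprecated, switch to fit, which accepts data generators directly. Note that the traceback shows training stopping during epoch 1 on the shape error, so no checkpoint is ever written; the model not being saved is a consequence of that crash. Once the shapes are fixed, make sure ModelCheckpoint and any other callbacks are configured for your TensorFlow version, especially the file path and the monitored metric.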
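
As a rough illustration of calling fit with a generator and a ModelCheckpoint callback, here is a small self-contained sketch. Everything in it is a stand-in: the toy model, the random-data generator `toy_batches`, the helper `train_with_checkpoint`, and the file name `best_model.h5` are assumptions for the example, not this project's model, data pipeline, or settings.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def toy_batches(batch_size=8, n_features=4):
    """Endless generator of random (inputs, targets) batches; stand-in data."""
    rng = np.random.default_rng(0)
    while True:
        x = rng.random((batch_size, n_features)).astype("float32")
        y = rng.random((batch_size, 1)).astype("float32")
        yield x, y


def train_with_checkpoint(checkpoint_path, epochs=2):
    # Toy model, only to keep the example small.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

    # Save the full model whenever the validation loss improves.
    checkpoint = tf.keras.callbacks.ModelCheckpoint(
        checkpoint_path, monitor="val_loss", save_best_only=True
    )
    # fit() takes generators; step counts are needed because they never end.
    model.fit(
        toy_batches(),
        steps_per_epoch=5,
        validation_data=toy_batches(),
        validation_steps=2,
        epochs=epochs,
        callbacks=[checkpoint],
        verbose=0,
    )
    return model


checkpoint_path = os.path.join(tempfile.mkdtemp(), "best_model.h5")
train_with_checkpoint(checkpoint_path)
print("checkpoint written:", os.path.exists(checkpoint_path))
```

Monitoring `val_loss` only works when validation data is passed to fit, which is why the sketch supplies a validation generator and `validation_steps`.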

CUDA Warning (No GPU Detected):
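The CUDA_ERROR_NO_DEVICE message means TensorFlow found no GPU, so training is running on the CPU. It makes training slower but is not the cause of the crash. If a GPU should be available, use tf.config.list_physical_devices('GPU') to verify that TensorFlow can see it, or check your environment settings (for example, the runtime type and driver installation).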
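
The GPU check can be wrapped in a small helper like the one below. This is only a sketch: the function name `visible_gpus` is made up for illustration, and the guarded import is there so the snippet also runs where TensorFlow is not installed.

```python
def visible_gpus():
    """Return names of GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment")
elif gpus:
    print("GPUs visible to TensorFlow:", gpus)
else:
    print("No GPU visible; training will run on the CPU")
```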
