error on training #14

urmikakasi · 2022-01-14T04:51:51Z

This is the output from training. The model is not getting saved due to a callback issue.

4:6:34: Using Inceptionv3 model
{}: Generating image features using inceptionv3 model...
2022-01-14 04:06:34.692740: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/inception_v3/inception_v3_weights_tf_dim_ordering_tf_kernels.h5
96116736/96112376 [==============================] - 1s 0us/step
96124928/96112376 [==============================] - 1s 0us/step
100% 8091/8091 [28:33<00:00, 4.72it/s]
4:35:12: Completed & Saved features for 8091 images successfully
4:35:12: Parsing captions file...
4:35:12: Parsed captions: 40460
4:35:12: Parsed & Saved successfully
4:35:12: Available images for training: 6000
4:35:12: Available captions for training: 30000
4:35:13: Available images for validation: 1000
4:35:13: Available captions for validation: 5000
RNN Model (Decoder) Summary :
Model: "model_1"

Layer (type)                        Output Shape      Param #   Connected to
input_3 (InputLayer)                [(None, 40)]      0         []
input_2 (InputLayer)                [(None, 2048)]    0         []
embedding (Embedding)               (None, 40, 300)   2213400   ['input_3[0][0]']
dense (Dense)                       (None, 300)       614700    ['input_2[0][0]']
lstm (LSTM)                         (None, 40, 256)   570368    ['embedding[0][0]']
repeat_vector (RepeatVector)        (None, 40, 300)   0         ['dense[0][0]']
time_distributed (TimeDistributed)  (None, 40, 300)   77100     ['lstm[0][0]']
concatenate_2 (Concatenate)         (None, 40, 600)   0         ['repeat_vector[0][0]', 'time_distributed[0][0]']
bidirectional (Bidirectional)       (None, 512)       1755136   ['concatenate_2[0][0]']
dense_2 (Dense)                     (None, 7378)      3784914   ['bidirectional[0][0]']

==================================================================================================
Total params: 9,015,618
Trainable params: 9,015,618
Non-trainable params: 0

None
steps_train: 94, steps_val: 16
Batch Size: 64
Total Number of Epochs = 20
train_val.py:86: UserWarning: Model.fit_generator is deprecated and will be removed in a future version. Please use Model.fit, which supports generators.
verbose=1)
Epoch 1/20
Traceback (most recent call last):
File "train_val.py", line 86, in <module>
verbose=1)
File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 2030, in fit_generator
initial_epoch=initial_epoch)
File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Matrix size-incompatible: In[0]: [3732,1000], In[1]: [2048,300]
[[node model_1/dense/Relu
(defined at /usr/local/lib/python3.7/dist-packages/keras/backend.py:4867)
]] [Op:__inference_train_function_569695]

Errors may have originated from an input operation.
Input Source operations connected to node model_1/dense/Relu:
In[0] model_1/dense/BiasAdd (defined at /usr/local/lib/python3.7/dist-packages/keras/layers/core/dense.py:210)

Operation defined at: (most recent call last)

File "train_val.py", line 86, in <module>
verbose=1)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 2030, in fit_generator
initial_epoch=initial_epoch)

File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1216, in fit
tmp_logs = self.train_function(iterator)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 878, in train_function
return step_function(self, iterator)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 867, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))

File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 860, in run_step
outputs = model.train_step(data)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 808, in train_step
y_pred = self(x, training=True)

File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py", line 1083, in call
outputs = call_fn(inputs, *args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 92, in error_handler
return fn(*args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py", line 452, in call
inputs, training=training, mask=mask)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py", line 589, in _run_internal_graph
outputs = node.layer(*args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py", line 1083, in call
outputs = call_fn(inputs, *args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 92, in error_handler
return fn(*args, **kwargs)

File "/usr/local/lib/python3.7/dist-packages/keras/layers/core/dense.py", line 213, in call
outputs = self.activation(outputs)

File "/usr/local/lib/python3.7/dist-packages/keras/activations.py", line 311, in relu
return backend.relu(x, alpha=alpha, max_value=max_value, threshold=threshold)

File "/usr/local/lib/python3.7/dist-packages/keras/backend.py", line 4867, in relu
x = tf.nn.relu(x)

2022-01-14 04:35:25.329300: W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]


hibah321 · 2022-10-13T16:51:28Z

I've got the same error. Can you help if you've resolved it?

Anant-mishra1729 · 2022-11-14T07:59:37Z

Got the same error. Help needed!

kunal0230 · 2024-11-05T10:32:01Z

Matrix Size Mismatch:
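The shape mismatch [3732,1000] vs. [2048,300] is raised in the decoder's first Dense layer (model_1/dense), which receives the image features. Its kernel expects 2048 values per image (input_2 in the summary), but the batch it received has 1000 columns. A width of 1000 matches the size of InceptionV3's ImageNet classification output, so the saved features were likely taken from the final prediction layer rather than the 2048-wide pooling layer. Double-check that the feature extractor outputs [None, 2048] for each image and regenerate the saved features if it does not. The 3732 rows also do not correspond to the batch size of 64, so the generator's batching is worth checking as well.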
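To catch this before training starts, the shapes of one generator batch can be compared against the decoder inputs listed in the model summary ((None, 2048) for image features, (None, 40) for caption sequences). This is a minimal sketch, not code from this repository: check_batch_shapes is a hypothetical helper, and the caption shape used in the example is assumed.

```python
def check_batch_shapes(feature_shape, caption_shape, feature_dim=2048, caption_len=40):
    """Compare one batch's shapes with the decoder inputs from the model summary.

    Raises ValueError listing every mismatch; returns None when the batch fits.
    Hypothetical helper: pass it `features.shape` and `captions.shape`.
    """
    problems = []
    if len(feature_shape) != 2 or feature_shape[1] != feature_dim:
        problems.append(
            f"image features have shape {tuple(feature_shape)}, "
            f"expected (batch, {feature_dim})"
        )
    if len(caption_shape) != 2 or caption_shape[1] != caption_len:
        problems.append(
            f"caption sequences have shape {tuple(caption_shape)}, "
            f"expected (batch, {caption_len})"
        )
    # Both inputs must describe the same number of samples.
    if feature_shape and caption_shape and feature_shape[0] != caption_shape[0]:
        problems.append(
            f"batch sizes differ: {feature_shape[0]} vs {caption_shape[0]}"
        )
    if problems:
        raise ValueError("; ".join(problems))


# Feature shape taken from the traceback; the caption shape is assumed.
try:
    check_batch_shapes((3732, 1000), (64, 40))
except ValueError as err:
    print("bad batch:", err)

# A batch that matches the summary passes silently.
check_batch_shapes((64, 2048), (64, 40))
```

Calling this on the first batch yielded by the training generator makes the mismatch visible without waiting for the first training step to fail.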

Callback and Model Saving Issue:
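Since fit_generator is deprecated, switch to fit, which accepts data generators directly. Note that the traceback fails during the first training step of epoch 1, so no checkpoint could have been written yet; the missing saved model is a consequence of the shape error rather than of the callback itself. Once training runs, make sure ModelCheckpoint and any other callbacks are configured for the TensorFlow version in use.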
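A rough sketch of what the switch to fit with a checkpoint callback could look like. The names steps_for and fit_with_checkpoint, and the checkpoint file name, are hypothetical and not taken from train_val.py. The step formula (sample count divided by batch size, rounded up) is an assumption that happens to reproduce the logged steps_train: 94 and steps_val: 16 for 6000 and 1000 images at batch size 64.

```python
import math


def steps_for(num_samples, batch_size):
    """Number of generator batches needed to cover num_samples once."""
    return math.ceil(num_samples / batch_size)


def fit_with_checkpoint(model, train_gen, val_gen, n_train, n_val,
                        batch_size=64, epochs=20,
                        checkpoint_path="model_checkpoint.h5"):
    """Train with Model.fit (not fit_generator) and keep the best model.

    model is assumed to be a compiled Keras model; train_gen and val_gen are
    assumed to be generators yielding (inputs, targets) batches indefinitely.
    """
    # Imported here so the helper above stays usable without TensorFlow.
    from tensorflow.keras.callbacks import ModelCheckpoint

    checkpoint = ModelCheckpoint(
        checkpoint_path,
        monitor="val_loss",      # requires validation data to be passed below
        save_best_only=True,
        verbose=1,
    )
    return model.fit(
        train_gen,
        steps_per_epoch=steps_for(n_train, batch_size),
        validation_data=val_gen,
        validation_steps=steps_for(n_val, batch_size),
        epochs=epochs,
        callbacks=[checkpoint],
        verbose=1,
    )


print(steps_for(6000, 64), steps_for(1000, 64))
```

The checkpoint is only written at the end of an epoch, so it will not appear until the shape error in the first training step is fixed.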

CUDA Warning (No GPU Detected):
This message means TensorFlow found no CUDA device, so training runs on the CPU instead of a GPU. It is unrelated to the shape error above. If a GPU is available, use tf.config.list_physical_devices('GPU') to verify its accessibility, or check your environment settings to ensure TensorFlow can detect it.
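The check can be wrapped in a few lines. This is a small sketch; visible_gpus is a hypothetical helper name, and the import guard is only there so the snippet also runs on a machine without TensorFlow.

```python
def visible_gpus():
    """Return the GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.config.list_physical_devices("GPU")


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment")
elif gpus:
    print(f"TensorFlow sees {len(gpus)} GPU(s): {gpus}")
else:
    print("No GPU visible to TensorFlow; training will run on the CPU")
```

An empty list here is consistent with the CUDA_ERROR_NO_DEVICE message in the log.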
