Some clarifications related to baseline results (LAMOL-m and VQG) #6

Deepayan137 opened this issue Sep 2, 2024 · 2 comments

@Deepayan137

Dear Authors,

Firstly, I would like to commend you on your outstanding work. I am currently exploring the baseline methods LAMOL-m and VQG discussed in your paper and have a few questions that I hope you could clarify:

  1. Were both LAMOL-m and VQG applied using the same architecture as SGP?
  2. For the VQG method, was the training conducted using information maximization as detailed in the VQG paper, or was it trained using cross-entropy loss by conditioning the generation on past images and answers?
  3. Could you elaborate on the function of object feature regression in the LAMOL method?
  4. Is the code for these baselines available? Additionally, could you share the task-wise performance metrics for both LAMOL and VQG, specifically for the task order oarlks?

Your insights would greatly assist in advancing my current research.

Thank you for your time and assistance.

@StanLei52
Collaborator

Hi there,

Thank you for your interest.

  1. For LAMOL-m, we followed the architecture in the original LAMOL, which is the same as the SRM model in our paper. For VQG, we used the model from the VQG paper for question generation (replay), given the saved images and answers, and applied the UniVQA model for the VQA tasks.
  2. IIRC, we followed the implementation in VQG, but with an image and an answer as inputs and the corresponding question as output (see Fig. 1 in the VQG paper; a rough sketch of this setup is given after this list).
  3. Basically, to adapt LAMOL to VQA tasks, we feed the object features from the images, together with the questions and answers, into the LAMOL model. During training, we applied an LM loss on the QA tokens and an MSE loss on the visual object features. We added the object feature regression because we found that it helps the model generate better QA pairs during replay (compared to using only the LM loss on QA tokens). The likely reason is that this regression task forces the LAMOL model to better understand the image information (see the second sketch after this list).
  4. I am not going to release VQG and LAMOL-m due to limited bandwidth :( You may refer to their original repos for the implementation if you are interested. For the task-wise performance, I may need some time to locate the log file and will update you through this thread.
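
As a rough illustration of point 2, here is a minimal, self-contained sketch of a VQG-style replay generator trained with token-level cross-entropy on the question, conditioned on a stored image feature and answer. The toy model, dimensions, and names below are placeholders for illustration only, not the actual implementation from the VQG paper or this repo:

```python
import torch
import torch.nn as nn

# Toy shapes -- all hypothetical placeholder values.
IMG_DIM, HID, VOCAB, Q_LEN, A_LEN = 512, 256, 1000, 12, 4

class ToyVQG(nn.Module):
    """Toy stand-in for the VQG replay generator: it conditions on a saved
    image feature and answer tokens and is trained to generate the
    corresponding question with token-level cross-entropy."""
    def __init__(self):
        super().__init__()
        self.img_proj = nn.Linear(IMG_DIM, HID)   # image feature -> hidden
        self.tok_emb = nn.Embedding(VOCAB, HID)   # answer/question tokens -> hidden
        self.decoder = nn.GRU(HID, HID, batch_first=True)  # stand-in for the real decoder
        self.head = nn.Linear(HID, VOCAB)

    def forward(self, img_feat, ans_tokens, q_tokens):
        # Prefix = image feature + answer tokens; teacher-forced question follows.
        prefix = torch.cat([self.img_proj(img_feat).unsqueeze(1),
                            self.tok_emb(ans_tokens)], dim=1)
        x = torch.cat([prefix, self.tok_emb(q_tokens)], dim=1)
        h, _ = self.decoder(x)
        # Hidden states whose next-token targets are the question tokens.
        return self.head(h[:, prefix.size(1) - 1:-1])

model = ToyVQG()
img_feat = torch.randn(2, IMG_DIM)                 # saved image feature
ans_tokens = torch.randint(0, VOCAB, (2, A_LEN))   # saved answer
q_tokens = torch.randint(0, VOCAB, (2, Q_LEN))     # question to be replayed

logits = model(img_feat, ans_tokens, q_tokens)
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), q_tokens.reshape(-1))
loss.backward()
```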
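
And for point 3, the sketch below shows one way the combined objective (LM loss on the QA tokens plus MSE regression on the input object features) might be wired up. Again, the toy backbone and all dimensions are hypothetical placeholders, not the actual LAMOL-m implementation:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: 36 detected objects with 2048-d features,
# a QA token sequence, and a small vocabulary -- all placeholder values.
NUM_OBJ, OBJ_DIM, HID, VOCAB, SEQ_LEN = 36, 2048, 256, 1000, 20

class ToyLAMOLm(nn.Module):
    """Toy stand-in for a LAMOL-style LM adapted to VQA: it consumes projected
    object features plus QA token embeddings and produces (a) next-token
    logits for the QA tokens and (b) a regression of the object features."""
    def __init__(self):
        super().__init__()
        self.obj_proj = nn.Linear(OBJ_DIM, HID)     # object features -> hidden
        self.tok_emb = nn.Embedding(VOCAB, HID)     # QA tokens -> hidden
        self.backbone = nn.GRU(HID, HID, batch_first=True)  # stand-in for the transformer
        self.lm_head = nn.Linear(HID, VOCAB)        # predicts QA tokens
        self.obj_head = nn.Linear(HID, OBJ_DIM)     # regresses object features

    def forward(self, obj_feats, qa_tokens):
        x = torch.cat([self.obj_proj(obj_feats), self.tok_emb(qa_tokens)], dim=1)
        h, _ = self.backbone(x)
        h_obj, h_qa = h[:, :NUM_OBJ], h[:, NUM_OBJ:]
        return self.lm_head(h_qa), self.obj_head(h_obj)

model = ToyLAMOLm()
obj_feats = torch.randn(2, NUM_OBJ, OBJ_DIM)         # detected object features
qa_tokens = torch.randint(0, VOCAB, (2, SEQ_LEN))    # question + answer token ids

logits, obj_pred = model(obj_feats, qa_tokens)

# LM loss on the QA tokens (shifted next-token prediction) ...
lm_loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, VOCAB), qa_tokens[:, 1:].reshape(-1))
# ... plus MSE regression of the visual object features.
mse_loss = nn.functional.mse_loss(obj_pred, obj_feats)

loss = lm_loss + mse_loss   # equal weighting is a guess; the actual balance may differ
loss.backward()
```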

If you have further questions, feel free to lmk.

@Deepayan137
Author

Deepayan137 commented Sep 7, 2024

Thank you for your prompt response. I greatly appreciate your assistance. To further our research, it would be immensely helpful if you could provide the task-wise performance data for both LAMOL-m and VQG.

Currently, we are adapting a variant of your methodology to the function benchmark. However, our architecture is not suited to the scene-text task, so we exclude that component. Consequently, we need to analyze the average performance across the first five tasks of the oarlks sequence specifically, which is why I am requesting the detailed task-wise performance metrics.

Your support in providing this data would be invaluable to our ongoing project.

Thank you once again for your cooperation.
