Some clarifications related to baseline results (LAMOL-m and VQG) #6

Deepayan137 opened this issue Sep 2, 2024 · 2 comments

@Deepayan137

Dear Authors,

Firstly, I would like to commend you on your outstanding work. I am currently exploring the baseline methods LAMOL-m and VQG discussed in your paper and have a few questions that I hope you could clarify:

  1. Were both LAMOL-m and VQG applied using the same architecture as SGP?
  2. For the VQG method, was the training conducted using information maximization as detailed in the VQG paper, or was it trained using cross-entropy loss by conditioning the generation on past images and answers?
  3. Could you elaborate on the function of object feature regression in the LAMOL method?
  4. Is the code for these baselines available? Additionally, could you share the task-wise performance metrics for both LAMOL and VQG, specifically for the task order oarlks?

Your insights would greatly assist in advancing my current research.

Thank you for your time and assistance.

@StanLei52
Collaborator

Hi there,

Thank you for your interest.

  1. For LAMOL-m, we followed the architecture in the original LAMOL, which is the same as the SRM model in our paper. For VQG, we used the model from the VQG paper for question generation (replay), given the saved images and answers, and applied the UniVQA model for the VQA tasks.
  2. IIRC, we followed the implementation in VQG, but with an image and an answer as inputs and the corresponding question as output (see Fig. 1 in the VQG paper; a rough sketch of this setup is given after this list).
  3. Basically, to adapt LAMOL to VQA tasks, we feed the object features from the images, together with the questions and answers, into the LAMOL model. During training, we applied an LM loss on the QA tokens and an MSE loss on the visual object features. We added the object feature regression because we found that it helps the model generate better QA pairs during replay (compared to using only the LM loss on QA tokens). The likely reason is that this regression task forces the LAMOL model to better understand the image information (see the second sketch after this list).
  4. I am not going to release VQG and LAMOL-m due to limited bandwidth :( You may refer to their original repos for the implementation if you are interested. For the task-wise performance, I may need some time to locate the log file and will update you through this thread.
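
As a rough illustration of point 2, here is a minimal, self-contained sketch of a VQG-style replay generator trained with token-level cross-entropy on the question, conditioned on a stored image feature and answer. The toy model, dimensions, and names below are placeholders for illustration only, not the actual implementation from the VQG paper or this repo:

```python
import torch
import torch.nn as nn

# Toy shapes -- all hypothetical placeholder values.
IMG_DIM, HID, VOCAB, Q_LEN, A_LEN = 512, 256, 1000, 12, 4

class ToyVQG(nn.Module):
    """Toy stand-in for the VQG replay generator: it conditions on a saved
    image feature and answer tokens and is trained to generate the
    corresponding question with token-level cross-entropy."""
    def __init__(self):
        super().__init__()
        self.img_proj = nn.Linear(IMG_DIM, HID)   # image feature -> hidden
        self.tok_emb = nn.Embedding(VOCAB, HID)   # answer/question tokens -> hidden
        self.decoder = nn.GRU(HID, HID, batch_first=True)  # stand-in for the real decoder
        self.head = nn.Linear(HID, VOCAB)

    def forward(self, img_feat, ans_tokens, q_tokens):
        # Prefix = image feature + answer tokens; teacher-forced question follows.
        prefix = torch.cat([self.img_proj(img_feat).unsqueeze(1),
                            self.tok_emb(ans_tokens)], dim=1)
        x = torch.cat([prefix, self.tok_emb(q_tokens)], dim=1)
        h, _ = self.decoder(x)
        # Hidden states whose next-token targets are the question tokens.
        return self.head(h[:, prefix.size(1) - 1:-1])

model = ToyVQG()
img_feat = torch.randn(2, IMG_DIM)                 # saved image feature
ans_tokens = torch.randint(0, VOCAB, (2, A_LEN))   # saved answer
q_tokens = torch.randint(0, VOCAB, (2, Q_LEN))     # question to be replayed

logits = model(img_feat, ans_tokens, q_tokens)
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), q_tokens.reshape(-1))
loss.backward()
```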
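
And for point 3, the sketch below shows one way the combined objective (LM loss on the QA tokens plus MSE regression on the input object features) might be wired up. Again, the toy backbone and all dimensions are hypothetical placeholders, not the actual LAMOL-m implementation:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: 36 detected objects with 2048-d features,
# a QA token sequence, and a small vocabulary -- all placeholder values.
NUM_OBJ, OBJ_DIM, HID, VOCAB, SEQ_LEN = 36, 2048, 256, 1000, 20

class ToyLAMOLm(nn.Module):
    """Toy stand-in for a LAMOL-style LM adapted to VQA: it consumes projected
    object features plus QA token embeddings and produces (a) next-token
    logits for the QA tokens and (b) a regression of the object features."""
    def __init__(self):
        super().__init__()
        self.obj_proj = nn.Linear(OBJ_DIM, HID)     # object features -> hidden
        self.tok_emb = nn.Embedding(VOCAB, HID)     # QA tokens -> hidden
        self.backbone = nn.GRU(HID, HID, batch_first=True)  # stand-in for the transformer
        self.lm_head = nn.Linear(HID, VOCAB)        # predicts QA tokens
        self.obj_head = nn.Linear(HID, OBJ_DIM)     # regresses object features

    def forward(self, obj_feats, qa_tokens):
        x = torch.cat([self.obj_proj(obj_feats), self.tok_emb(qa_tokens)], dim=1)
        h, _ = self.backbone(x)
        h_obj, h_qa = h[:, :NUM_OBJ], h[:, NUM_OBJ:]
        return self.lm_head(h_qa), self.obj_head(h_obj)

model = ToyLAMOLm()
obj_feats = torch.randn(2, NUM_OBJ, OBJ_DIM)         # detected object features
qa_tokens = torch.randint(0, VOCAB, (2, SEQ_LEN))    # question + answer token ids

logits, obj_pred = model(obj_feats, qa_tokens)

# LM loss on the QA tokens (shifted next-token prediction) ...
lm_loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, VOCAB), qa_tokens[:, 1:].reshape(-1))
# ... plus MSE regression of the visual object features.
mse_loss = nn.functional.mse_loss(obj_pred, obj_feats)

loss = lm_loss + mse_loss   # equal weighting is a guess; the actual balance may differ
loss.backward()
```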

If you have further questions, feel free to lmk.

@Deepayan137
Author

Deepayan137 commented Sep 7, 2024

Thank you for your prompt response. I greatly appreciate your assistance. To further our research, it would be immensely helpful if you could provide the task-wise performance data for both LAMOL-m and VQG.

Currently, we are adapting a variant of your methodology to the function benchmark. However, our architecture is not suited to the scene-text task, so we exclude that component. Consequently, we need to analyze the average performance across the first five tasks of the oarlks sequence specifically, which is why I am requesting the detailed task-wise performance metrics.

Your support in providing this data would be invaluable to our ongoing project.

Thank you once again for your cooperation.
