
Core50 result #1

Open
jiaolifengmi opened this issue Oct 21, 2022 · 16 comments

@jiaolifengmi

I ran the following command:

compare_all.py --experiment=CORe50 --n-seeds=10 --seed=11 --single-epochs --batch=1 --fc-layers=2 --z-dim=200 --fc-units=1024 --lr=0.0001 --c=10 --lambda=10 --omega-max=0.1 --ar1-c=1. --dg-prop=0. --bir-c=0.01 --si-dg-prop=0.6

Running the CORe50 experiment with the code provided in this repository, the result is inconsistent with the one reported in the article (BI-R, Table 2).

@GMvandeVen (Owner)

Hi jiaolifengmi, thank you for the feedback. Could you provide a bit more detail? Then I'll look into it. For example, what results do you get when you run this command?

@jiaolifengmi (Author)

Following the command in command.sh, I ran:

compare_all.py --experiment=CORe50 --n-seeds=10 --seed=11 --single-epochs --batch=1 --fc-layers=2 --z-dim=200 --fc-units=1024 --lr=0.0001 --c=10 --lambda=10 --omega-max=0.1 --ar1-c=1. --dg-prop=0. --bir-c=0.01 --si-dg-prop=0.6

The result for BI-R is not the 60.40 (± 1.04) reported in Table 2 of the article.
My results across all seeds are [0.48647837 0.56542501 0.50891047 0.55057535 0.52782375 0.53963085 0.46688718 0.56751322 0.438259 0.52388038], with an average of 0.5175383566172511.

@jiaolifengmi (Author)

Hi GMvandeVen, I'm sorry to bother you again, but I still want to ask what is causing the inconsistent results on the CORe50 dataset. Is there a problem with the way I am running the experiment?

@GMvandeVen (Owner)

Hi jiaolifengmi, I'm sorry for the slow reply. I still need to look at this in more detail. I already checked whether perhaps I made a typo in the results table, but that doesn't seem to be the case. I'll let you know once I've figured it out.

@jiaolifengmi (Author)

Hi GMvandeVen, thank you again for your reply; I look forward to your results.

@GMvandeVen (Owner) commented Nov 8, 2022

EDIT: the issue described in the answer below is not what caused these inconsistencies. Please see a later answer (#1 (comment)) for an explanation and fix of this issue.


Hi jiaolifengmi, sorry it took a while to figure it out, but I expect that the difference between your results for BI-R and the results reported in the paper is due to the use of a different version of a pre-trained ResNet18 to extract the CORe50 features.
The pre-trained ResNet18 is selected in this line of code:

feature_extractor = resnet18(pretrained=True)

With the pre-trained ResNet18 from the version of torchvision that I used for the experiments reported in the paper, I consistently get results similar to those reported in the paper. However, if I use the pre-trained ResNet18 from the latest version of torchvision, I indeed get results similar to the ones you mention.

I have noticed before that the performance of generative replay of latent features is quite sensitive to the feature extractor that is used (presumably because a generative model is easier to learn for some features than for others), although I am a bit surprised that the difference can be this large, especially as both models are ResNet18s pre-trained on ImageNet. At least this variability in the performance of BI-R does not change the interpretation of the experiments reported in the paper.
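
For illustration, here is a minimal sketch of how the pre-trained weights could be pinned explicitly in newer torchvision versions. It assumes that ResNet18_Weights.IMAGENET1K_V1 corresponds to the legacy pretrained=True checkpoint (which is what torchvision documents), and it is not guaranteed to reproduce the exact features used for the paper:

from torchvision.models import resnet18, ResNet18_Weights

# Newer torchvision deprecates `pretrained=True`; selecting a specific weights enum
# avoids silently picking up different weights across library versions.
feature_extractor = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
feature_extractor.eval()  # run in inference mode when extracting features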

Hope this helps!

@jiaolifengmi (Author)

Hi GMvandeVen, thank you again for your reply, and sorry to bother you once more. Have you also looked at the performance of BI-R on the CIFAR-100 dataset? I ran it with the provided code, but I did not get the results reported in the paper, even though the CIFAR-100 features are extracted with the provided pre-trained model. Also, are two of the parameters in this command wrong, or are their settings reversed? (--bir-c / --si-dg-prop)

Following the command in command.sh, I ran:

compare_all.py --experiment=CIFAR100 --pre-convE --hidden --iters=5000 --n-seeds=10 --seed=11 --c=1. --lambda=100. --omega-max=0.01 --ar1-c=100 --dg-prop=0.7 --bir-c=0.6 --si-dg-prop=100000000

The result for BI-R is not the 21.51 (± 0.25) reported in Table 2 of the article.
My results across all seeds are [0.1342 0.172 0.1502 0.1511 0.1454 0.1565 0.1451 0.128 0.1045 0.0892], with an average of 0.13765.

@GMvandeVen (Owner)

Yes, you are right, that is a mistake. The values of --bir-c and --si-dg-prop should be reversed. Sorry about that! I just changed it in the code. Does this fix it?
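
With those two values swapped, the corrected invocation would presumably read as follows (only --bir-c and --si-dg-prop differ from the command quoted above; this is an inference from the comment, not a command taken from the repository):

compare_all.py --experiment=CIFAR100 --pre-convE --hidden --iters=5000 --n-seeds=10 --seed=11 --c=1. --lambda=100. --omega-max=0.01 --ar1-c=100 --dg-prop=0.7 --bir-c=100000000 --si-dg-prop=0.6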

@jiaolifengmi (Author)

However, these two parameters do not affect the BI-R results on CIFAR-100; they only affect the BI-R+SI results. And even with the two values swapped, the BI-R+SI results on CIFAR-100 are still not consistent with those in the paper. Is this also because of the pre-trained model?

@GMvandeVen (Owner)

Sorry, I didn't read your entire question correctly. Hmm, no, if you use the provided pre-trained convolutional layers then those should be the same as the ones I used, so that can't explain such a difference. I'm starting to think that perhaps there is a mistake in the code for BI-R; maybe it got introduced when I cleaned up the code. I will try to investigate. Sorry about these issues!

@GMvandeVen (Owner)

In the meantime, for BI-R you could also use this repository: https://github.com/GMvandeVen/brain-inspired-replay
I realize it might be inconvenient as there are some differences between the two repositories, but that code for BI-R should be working correctly. (If not, please let me know!)

@jiaolifengmi (Author)

I'm sorry, maybe I didn't make it clear. The problem is that when running BI-R with the provided code, the results are inconsistent with those in the paper on both CORe50 and CIFAR-100. For the inconsistency on CORe50, your explanation was that the version of the pre-trained model differs. But since the CIFAR-100 features are extracted with the model you provide, it is strange that I also get results inconsistent with the paper there. I hope you can verify the results of the BI-R algorithm on these two datasets again. Thank you again!

@GMvandeVen (Owner) commented Nov 10, 2022

Based on the description of your results, I now think it is possible there is a mistake in the code for BI-R in this repository. That mistake might also explain the difference on the CORe50 dataset (it is indeed surprising that a different version of an ImageNet pre-trained model would make such a difference). Using my own, uncleaned version of this code I get results similar to those reported in the paper, so perhaps a mistake got introduced when cleaning the code. I will try to investigate.

The same mistake is therefore unlikely to be present in the implementation of BI-R in this repository: https://github.com/GMvandeVen/brain-inspired-replay, especially as that repository has already been used and tested more thoroughly by others.

@GMvandeVen (Owner)

Hi jiaolifengmi, my apologies that it took so long, but I have finally found the error that caused the replay methods to run incorrectly when using the compare_all.py script. The error was that in this script all replay methods were erroneously combined with the method "CWR-plus", because the args.cwr_plus flag was never turned off. I have now fixed this by adding this line:

args.cwr_plus = False

This should fix the inconsistencies you encountered. I did some quick checks and it seems to work fine for me now. Please let me know if you still encounter any issues.
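
To illustrate the kind of fix (an illustrative sketch with hypothetical names, not the repository's exact code): when one script reuses a single argument namespace to configure several methods in turn, a flag enabled for an earlier method has to be reset explicitly before the next one is configured, e.g.:

for method in methods_to_compare:    # hypothetical list of method names
    args.cwr_plus = False            # reset the flag a previous method may have left on
    if method == "CWR+":
        args.cwr_plus = True         # only enable CWR-plus for its own run
    run_experiment(args, method)     # hypothetical runner function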

Many thanks for raising this issue!

@jiaolifengmi (Author) commented Nov 27, 2022 via email

@GMvandeVen (Owner) commented Nov 30, 2022

Hi jiaolifengmi, thank you for pointing that out. You are right that there still seems to be an issue with the BI-R and BI-R+SI results if you run this script. I really should have tested this script better, my apologies. I have tried to find what is wrong, but so far I haven't been able to find it; when cleaning up the code for this repository I must have changed something.

If you instead use the code in the brain-inspired-replay repository linked above, you can get results for BI-R and BI-R+SI on the class-incremental version of Split CIFAR-100 that are consistent with the results reported in the paper (using the provided pre-trained convolutional layers, which are the same ones as in this repository). You can do this with the following command:

./compare_CIFAR100.py --scenario=class --seed=11 --n-seeds=10

I will keep trying to find out what is wrong with the implementation of BI-R in this repository, and I will let you know if I find it. For now, I have added a note to the README to explain this issue. Sorry for the inconvenience!
