
Questions about difference between code and paper #16

Closed
chrissyguo opened this issue Jun 1, 2021 · 8 comments
Labels
in-depth Deep and valuable discussion paper Question from paper

Comments

@chrissyguo

Question 1:
In def forward_single(self, x) of mmdet/dense_heads/MIAOD_retina_head.py:

y_head_cls = y_head_f_mil.softmax(2) * y_head_cls_term2.sigmoid().max(2, keepdim=True)[0].softmax(1)

This code (with max(2, keepdim=True)[0]) does not seem to be in accordance with equation (5) in the paper:

[Screenshot: equation (5) from the paper]

What is the function of max(2, keepdim=True)[0]? And why is softmax(1) applied after all the other computation? According to my understanding, the code should be:

y_head_cls = (y_head_f_mil.softmax(2)) * (y_head_cls_term2.sigmoid().softmax(1))

Question 2:
For uncertainty calculation definition in the paper:

The instance uncertainty is defined as the prediction discrepancy of f_1 and f_2.

and the equation for the discrepancy is:

[Screenshot: the discrepancy equation from the paper]

After reweighting, the equation becomes:

[Screenshot: the reweighted discrepancy equation from the paper]

I thought the uncertainty code for loss computation and the uncertainty code for data selection should be the same.
However, the code in def l_wave_dis of mmdet/dense_heads/MIAOD_head.py:

l_det_cls_all = (abs(y_head_f_1_single - y_head_f_2_single) * w_i.reshape(-1, self.C)).mean(dim=1).sum() * self.param_lambda

directly computes the discrepancy by subtraction (Manhattan distance?), while the code in def calculate_uncertainty of mmdet/apis/test.py:

loss_l2_p = (y_head_f_1 - y_head_f_2).pow(2)

uses pow(2) to square the difference (Euclidean distance?). I would like to know why you made these design choices in the code.

Thank you so much!

@yuantn
Owner

yuantn commented Jun 3, 2021

Thanks for your questions.

Answer 1:
The function of max(2, keepdim=True)[0] is to highlight the class with the highest score (i.e., to reduce a [2, 68400 (anchor number), 20 (class number)] tensor to a [2, 68400, 1] tensor). By using this maximum, we focus only on the classes and bounding boxes that are most likely to be predicted as foreground, which is the goal of multiple instance learning. In fact, it provides an auxiliary effect for the first term in this equation.

The effect of the softmax(1) function is the same as that of the first term. Its effect is:

[Screenshot: the softmax equation]
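To make the shape logic concrete, here is a minimal NumPy sketch of the y_head_cls computation (not the actual PyTorch code from the repository); the anchor count is a toy value standing in for 68400, and the softmax helper is a hand-rolled stand-in for the tensor method:

```python
import numpy as np

def softmax(x, axis):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
B, N, C = 2, 5, 20  # batch, anchors (68400 in the real model), classes
y_head_f_mil = rng.normal(size=(B, N, C))
y_head_cls_term2 = rng.normal(size=(B, N, C))

sig = 1 / (1 + np.exp(-y_head_cls_term2))      # sigmoid()
top = sig.max(axis=2, keepdims=True)           # max(2, keepdim=True)[0] -> (B, N, 1)
anchor_weight = softmax(top, axis=1)           # softmax(1): normalize across anchors
y_head_cls = softmax(y_head_f_mil, axis=2) * anchor_weight  # broadcast -> (B, N, C)
print(y_head_cls.shape)  # (2, 5, 20)
```

The max over the class axis collapses each anchor to its single best class score, and the softmax over the anchor axis then turns those scores into a weight distribution over anchors, which broadcasts back over all classes.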

Answer 2:

There are indeed differences between the l_wave_dis in the paper and that in the code.

Computing the discrepancy directly by subtraction (called the Manhattan distance in digital image processing, but the L1 loss in deep learning) and by squaring (called the Euclidean distance in digital image processing, but the L2 loss in deep learning) are two ways to calculate the loss. Their effects are basically the same, with only minor differences.

In addition, as shown in our previous experiments (which are not provided now), there is not much difference in performance between these two types of loss. So we did not unify them between the paper and the code before.

I will re-run the code using the L1 loss and attach the performance in this issue for comparison, to support my claim above. Please wait a few days.
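For reference, the two discrepancy computations can be sketched side by side in NumPy (toy shapes and values; lam is a hypothetical stand-in for self.param_lambda, and the arrays stand in for the sigmoid outputs of the two classifiers):

```python
import numpy as np

rng = np.random.default_rng(0)
N, C = 8, 20              # toy instance and class counts
y1 = rng.random((N, C))   # classifier f_1 outputs (toy values)
y2 = rng.random((N, C))   # classifier f_2 outputs (toy values)
w_i = rng.random((N, C))  # reweighting term
lam = 0.5                 # hypothetical value for self.param_lambda

# L1-style discrepancy, mirroring l_wave_dis (training loss)
l1_loss = (np.abs(y1 - y2) * w_i).mean(axis=1).sum() * lam

# L2-style discrepancy, mirroring calculate_uncertainty (data selection)
l2_map = (y1 - y2) ** 2

print(l1_loss >= 0, (l2_map >= 0).all())  # both non-negative
```

Both penalize disagreement between f_1 and f_2; the squared version simply grows faster for large discrepancies, which is why their rankings of uncertain images end up close in practice.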

@yuantn
Owner

yuantn commented Jun 9, 2021

Here are the results using L1 loss in Answer 2 and those in the paper:

| Proportion (%) of Labeled Images | 5.0 | 7.5 | 10.0 | 12.5 | 15.0 | 17.5 | 20.0 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| mAP (%) using L1 loss | 46.84 | 59.36 | 63.22 | 67.36 | 69.33 | 70.90 | 71.48 |
| mAP (%) in the paper | 47.18 | 58.41 | 64.02 | 67.72 | 69.79 | 71.07 | 72.27 |
| Difference in mAP (%) | -0.34 | 0.95 | -0.80 | -0.36 | -0.46 | -0.17 | -0.79 |

Here is the output log: Google Drive | Baidu Drive (Extraction Code: 27ql)

@liaorongfan

In tools/trains.py the code minimizes the uncertainty first and then maximizes it, which is not in line with the paper. I was wondering whether there were some special concerns.

@yuantn
Owner

yuantn commented Jun 11, 2021

> In tools/trains.py the code minimizes the uncertainty first and then maximizes it, which is not in line with the paper. I was wondering whether there were some special concerns.

Please kindly refer to here for an explanation.

@liaorongfan

@yuantn No offence meant, just curious. Sorry!

@yuantn
Owner

yuantn commented Jun 12, 2021

> @yuantn No offence meant, just curious. Sorry!

No need to be so cautious... I'm OK! 😁

I answered like that just because someone had asked the same question before.

Feel free to ask any questions you want, but it would be better to open another issue if your question differs from the topic of the current one.

@liaorongfan

Thanks a lot!

@yuantn yuantn added in-depth Deep and valuable discussion paper Question from paper labels Jun 21, 2021
@chrissyguo
Author

Thank you so much for answering!
