
Using Depth Embeddings in NyuV2 Zero-Shot Classification #107

Open
Leeinsu1 opened this issue Jan 25, 2024 · 4 comments
@Leeinsu1

Thank you for your exceptional work and the code you've provided.
I have a question regarding the use of depth embeddings for NYUv2 zero-shot classification.
For the conversion from depth to disparity, I am using a focal length of 518.857901 and a baseline of 0.075.
However, the accuracy I am achieving is only 45%, which is 10% lower than what is reported in the paper.

Could you possibly advise on any additional steps that might be necessary?
Currently, my pipeline consists of converting depth to disparity, resizing, center cropping, and normalizing.
For the normalization step, I am using a mean of 0.0418 and a standard deviation of 0.0295.
Additionally, I tried applying DepthNorm after converting to disparity, but it did not yield the desired results.
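
Concretely, my current pipeline looks roughly like the sketch below (the helper names are my own, and I assume the depth map is a single-channel tensor in metres):

```python
import torch
import torchvision.transforms as T

FOCAL = 518.857901  # NYUv2 focal length in pixels
BASELINE = 0.075    # baseline value I am currently using

def depth_to_disparity(depth: torch.Tensor) -> torch.Tensor:
    # depth: (1, H, W) tensor; clamp to avoid division by zero
    return FOCAL * BASELINE / depth.clamp(min=1e-6)

transform = T.Compose([
    T.Resize(224),       # resize the shorter side to 224
    T.CenterCrop(224),   # center crop to 224x224
    T.Normalize(mean=[0.0418], std=[0.0295]),
])

def prepare_depth(depth: torch.Tensor) -> torch.Tensor:
    return transform(depth_to_disparity(depth))
```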

For the 10th class, I am trying both approaches: labeling it as 'others', and simply selecting the class with the highest cosine similarity among the 18 classes specified in the paper.
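
For the classification step itself, I compute cosine similarity between the depth embedding and the text embeddings of the class prompts and take the argmax, roughly as in this sketch (function and variable names are mine):

```python
import torch
import torch.nn.functional as F

def zero_shot_predict(depth_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    # depth_emb: (N, D) depth embeddings; text_emb: (C, D) class-prompt embeddings
    depth_emb = F.normalize(depth_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    sims = depth_emb @ text_emb.T   # cosine similarity once both are unit-norm
    return sims.argmax(dim=-1)      # predicted class index per sample
```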

Your guidance on this matter would be greatly appreciated.
Thank you.

@zhang-ziang

@Leeinsu1 I encountered a similar problem. Could you please share the code you used so we can discuss it? :)

@jbrownkramer

jbrownkramer commented Mar 1, 2024

I am trying to get embeddings for depth images, but I am also struggling since I have to guess at the normalization process.

@Leeinsu1 have you tried using a baseline of 75? If you look at the example disparity file from the omnivore repo, you'll see that the average value is around 16, which suggests a disparity formula like 518.857901 * 75 / d, where d is depth in mm. I think you might then want to apply DepthNorm before normalizing with mean 0.0418 and std 0.0295, since that matches the Omnivore pipeline.

That said, the mean of the disparity after DepthNorm, as defined above, is probably about 10x bigger than 0.0418, so I don't know where that value came from.

https://github.com/facebookresearch/omnivore/blob/1d55abdc8dfc7bd5cbf69316841ab804d0acf1ca/inference_tutorial.ipynb#L560
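
To make that concrete, here is a rough sketch of what I mean (helper names are mine, and the depth_norm step is only my reading of Omnivore's DepthNorm, not copied from its implementation):

```python
import torch

FOCAL = 518.857901  # NYUv2 focal length in pixels

def depth_to_disparity_mm(depth_mm: torch.Tensor) -> torch.Tensor:
    # a baseline of 75 with depth in mm gives disparities around 16 on average,
    # matching the example disparity file in the Omnivore repo
    return FOCAL * 75.0 / depth_mm.clamp(min=1e-6)

def depth_norm(disparity: torch.Tensor, max_depth: float = 75.0) -> torch.Tensor:
    # my guess at what DepthNorm does: clamp, then scale into [0, 1]
    return disparity.clamp(max=max_depth) / max_depth

def prepare(depth_mm: torch.Tensor) -> torch.Tensor:
    disparity = depth_to_disparity_mm(depth_mm)
    x = depth_norm(disparity)        # DepthNorm before the final normalization
    return (x - 0.0418) / 0.0295     # normalize with mean 0.0418 and std 0.0295
```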

@StanLei52

Hi there, I recommend you check out our project ViT-Lens. For the depth experiments, we obtained better performance than ImageBind on the same test data. Hope that helps.

@jbrownkramer

@StanLei52 Oh, that looks great! I looked at your paper and code. It seems to follow the same data normalization pipeline as Omnivore and ImageBind. One missing piece of information is the scale in the conversion from depth to disparity. The ViT-Lens code starts by loading pre-computed disparity maps, so that info is not present.

Do you know if disparity is 518.857901 * 75 / depth or 518.857901 * .075 / depth or something else?
