Yes, the current code only supports click-based 3D bounding box outputs, but we will release an update next week that adds support for purely language-guided 3D visual grounding. At the moment the code does not officially support the 3D Visual Grounding task, which requires an extra grounding head to achieve accurate grounding results. We have tried simply outputting the object's 3D bounding box in text or location-token format for the 3D VG cases, and found that it does not work well.
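For reference, here is a minimal sketch of what such an extra grounding head could look like: a small MLP that scores candidate objects against the referring expression and regresses a 3D box, rather than emitting the box as text tokens. All names and dimensions below (`GroundingHead`, `feat_dim`, the 6-value box parameterization) are illustrative assumptions, not part of this repository's code.

```python
# Illustrative sketch only -- not this repository's implementation.
# Assumes fused per-candidate features (language + 3D object features) of size `feat_dim`.
import torch
import torch.nn as nn

class GroundingHead(nn.Module):
    """Hypothetical grounding head: scores each candidate object against the
    referring expression and regresses a 3D box (center x, y, z + size w, h, d)."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 1),          # confidence per candidate object
        )
        self.box = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 6),          # (cx, cy, cz, w, h, d)
        )

    def forward(self, fused_feats: torch.Tensor):
        # fused_feats: (batch, num_candidates, feat_dim)
        logits = self.score(fused_feats).squeeze(-1)   # (batch, num_candidates)
        boxes = self.box(fused_feats)                  # (batch, num_candidates, 6)
        return logits, boxes

# Typical training signal for this kind of head: cross-entropy over candidates
# for the referred object plus an L1/IoU loss on its box; ScanRefer evaluation
# then reports Acc@0.25 and Acc@0.5 IoU.
```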
Hello. Could you please advise me on how to properly train a model for 3D VG on ScanRefer: model, losses, dataset, metrics?
If I understood correctly, your current model can predict bounding boxes only as text, and only with an additional click on the object.