-
Notifications
You must be signed in to change notification settings - Fork 829
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add GroundingSAM (GroundingDino + SAM) #1720
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
ea4fa79
to
a991ec2
Compare
a991ec2
to
3624e91
Compare
View / edit / reply to this conversation on ReviewNB eaidova commented on 2024-02-26T15:44:27Z Line #1. %pip install -q "torch==2.1.0" "torchvision==0.16.0" --extra-index-url https://download.pytorch.org/whl/cpu this code is not compatible with torch 2.2.0 and torchvision 0.17.0?
timm, transformers also have dependencies on torch and can unexplicitly install packages, please also install them with --extra
do you really need pycocotools and supervision for running notebook (looks like some training/evaluation dependencies)? pavel-esir commented on 2024-02-27T10:38:34Z Torch 2.2 tvision 0.17 works as well, i will remove manual install of torch and will rely on timm with extra index url.
About pycocotools it also seemed strange to me, but without it i end up having import error, because it's needed in visualizer https://github.com/wenyi5608/GroundingDINO/blob/main/groundingdino/util/visualizer.py#L19 which is imported in file where model torch nn.Module is defined
Supervision is needed for simpler annotation, but it can be done manually. I will do that and will remove dependency pavel-esir commented on 2024-03-01T10:53:03Z i have drawn masks manually in PIL but it turned out supervision is still imported inside for post processing i will rewrite this part and will remove supervision dependency pavel-esir commented on 2024-03-06T23:49:04Z Returned back supervision. With it code looks simpler. |
View / edit / reply to this conversation on ReviewNB eaidova commented on 2024-02-26T15:44:28Z Line #9. text_token_mask = torch.tensor([[ can this mask be obtained using some tool or using random? I think for real user it is hard to understand how to create and use it pavel-esir commented on 2024-03-06T23:49:34Z done, significantly enhanced providing example input |
View / edit / reply to this conversation on ReviewNB eaidova commented on 2024-02-26T15:44:29Z please provide some explanation for this part. Which steps required to prepare data and postprocess results. There is a big function, but it maybe hard to understand what stated behind that if somebody wants to replicate this process pavel-esir commented on 2024-03-06T23:51:07Z i will provide description for this function as well as description with running combined GroundingDINO + SAM/EfficientSAM pavel-esir commented on 2024-03-07T13:32:33Z slightly refactored this function, added comments with explanations, and type annotations |
View / edit / reply to this conversation on ReviewNB eaidova commented on 2024-02-26T15:44:29Z Line #1. def predict_torch(predictor, image, transformed_boxes): name of function confusing, do you really use pytorch for prediction? pavel-esir commented on 2024-02-26T16:20:23Z very obsolete name, i'll rename 🤦 |
View / edit / reply to this conversation on ReviewNB eaidova commented on 2024-02-26T15:44:30Z not clear, then what is above if it is run grounding sam? :)
I think more text needed, something:
Now, you can try apply grounding sam on own images using interactive demo. The code below provides helper functions used in demonstration pavel-esir commented on 2024-03-07T14:01:44Z added clarification for this part |
View / edit / reply to this conversation on ReviewNB eaidova commented on 2024-02-26T15:44:31Z please put here little instruction how to launch (which data should be provided, what is advanced options responsibility) pavel-esir commented on 2024-03-07T13:29:39Z agreed and added clarifications |
View / edit / reply to this conversation on ReviewNB eaidova commented on 2024-02-26T15:44:32Z Line #4. input_image = gr.Image(source='upload', type="pil", value=f"{repo_dir}/assets/demo1.jpg", tool="sketch") I think better to use gr. Examples for providing default images and labels https://www.gradio.app/docs/examples pavel-esir commented on 2024-03-06T22:57:59Z replaced with gr.Interface and and specified examples arg, this gives UI similar to gr.Examples |
very obsolete name, i'll rename 🤦 View entire conversation on ReviewNB |
Torch 2.2 tvision 0.17 works as well, i will remove manual install of torch and will rely on timm with extra index url.
About pycocotools it also seemed strange to me, but without it i end up having import error, because it's needed in visualizer https://github.com/wenyi5608/GroundingDINO/blob/main/groundingdino/util/visualizer.py#L19 which is imported in file where model torch nn.Module is defined
Supervision is needed for simpler annotation, but it can be done manually. I will do that and will remove dependency View entire conversation on ReviewNB |
View / edit / reply to this conversation on ReviewNB aleksandr-mokrov commented on 2024-02-27T23:58:27Z Remove outputs with no useful information and with internal paths pavel-esir commented on 2024-03-07T13:28:32Z removed |
View / edit / reply to this conversation on ReviewNB aleksandr-mokrov commented on 2024-02-27T23:58:27Z Line #2. from groundingdino.models import build_model Why are the import inside functions? All these function always are called. pavel-esir commented on 2024-03-06T23:01:09Z moved them outside |
View / edit / reply to this conversation on ReviewNB aleksandr-mokrov commented on 2024-02-27T23:58:28Z Cleanup pavel-esir commented on 2024-03-06T23:01:56Z done |
7269f44
to
43b4154
Compare
View / edit / reply to this conversation on ReviewNB apaniukov commented on 2024-02-28T13:08:50Z Line #7. token_type_ids = torch.tensor([[0, 0, 0, 0, 0, 0]]) The same is for >>> pt_grounding_dino_model.tokenizer([caption], return_tensors="pt", return_special_tokens_mask=True)
_, counts = torch.unique_consecutive(text_token_mask, return_counts=True)
pavel-esir commented on 2024-03-06T23:03:23Z thanks a lot. Will apply this pavel-esir commented on 2024-03-06T23:19:59Z done |
i will provide description for this function as well as description with running combined GroundingDINO + SAM/EfficientSAM View entire conversation on ReviewNB |
removed View entire conversation on ReviewNB |
removed View entire conversation on ReviewNB |
agreed and added clarifications View entire conversation on ReviewNB |
slightly refactored this function, added comments with explanations, and type annotatetions View entire conversation on ReviewNB |
added clarification for this part View entire conversation on ReviewNB |
Applied comments. @eaidova @apaniukov @aleksandr-mokrov please take a look |
compilation on mac fails. i will disable testing on mac |
View / edit / reply to this conversation on ReviewNB eaidova commented on 2024-03-08T05:07:08Z Line #2. ov_compiled_grounded_dino = core.compile_model(ov_dino_model, device.upper()) no need in upper pavel-esir commented on 2024-03-08T07:39:55Z done |
View / edit / reply to this conversation on ReviewNB eaidova commented on 2024-03-08T05:12:00Z I think this text does not describe cell content, maybe better to add it into code as comment with additional description for other steps like disabling gradients and model tracing as well?
pavel-esir commented on 2024-03-08T07:40:03Z done |
@pavel-esir could you please check correctness of notebook meta for docs? I'm not sure that visual question answering is correct task for this notebook, also not sure that it is possible to run it using binder (binder has only 2GB RAM, are you sure that it is enough for running your notebook? do you try to run it?) Please also add image for preview |
done View entire conversation on ReviewNB |
done View entire conversation on ReviewNB |
@added link to image, removed visual question answering, applied.
checked, indeed it consumes too much ram > 5Gb, removed binder link |
update image for GroundedSAM #1720
ticket: CVS-131290