Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GroundingSAM (GroundingDino + SAM) #1720

Merged
merged 22 commits into from
Mar 8, 2024

Conversation

pavel-esir
Copy link
Contributor

@pavel-esir pavel-esir commented Feb 16, 2024

ticket: CVS-131290

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@pavel-esir pavel-esir force-pushed the add_grounding_sam branch 2 times, most recently from ea4fa79 to a991ec2 Compare February 23, 2024 10:56
@pavel-esir pavel-esir marked this pull request as ready for review February 23, 2024 10:56
call GroundingDINO in OpenVINO

publish draft PR

GroundingDINO works from FE

added resize for GroundingDino

ready for review
@andrei-kochin andrei-kochin requested review from a team, apaniukov, itrushkin and aleksandr-mokrov and removed request for a team February 26, 2024 10:36
Copy link

review-notebook-app bot commented Feb 26, 2024

View / edit / reply to this conversation on ReviewNB

eaidova commented on 2024-02-26T15:44:27Z
----------------------------------------------------------------

Line #1.    %pip install -q "torch==2.1.0" "torchvision==0.16.0" --extra-index-url https://download.pytorch.org/whl/cpu

this code is not compatible with torch 2.2.0 and torchvision 0.17.0?

timm, transformers also have dependencies on torch and can unexplicitly install packages, please also install them with --extra

do you really need pycocotools and supervision for running notebook (looks like some training/evaluation dependencies)?


pavel-esir commented on 2024-02-27T10:38:34Z
----------------------------------------------------------------

Torch 2.2 tvision 0.17 works as well, i will remove manual install of torch and will rely on timm with extra index url.

About pycocotools it also seemed strange to me, but without it i end up having import error, because it's needed in visualizer

https://github.com/wenyi5608/GroundingDINO/blob/main/groundingdino/util/visualizer.py#L19

which is imported in file where model torch nn.Module is defined

https://github.com/wenyi5608/GroundingDINO/blob/main/groundingdino/models/GroundingDINO/groundingdino.py#L37

Supervision is needed for simpler annotation, but it can be done manually. I will do that and will remove dependency

pavel-esir commented on 2024-03-01T10:53:03Z
----------------------------------------------------------------

i have drawn masks manually in PIL but it turned out supervision is still imported inside for post processing groundingdino.util.inference.Model

i will rewrite this part and will remove supervision dependency

pavel-esir commented on 2024-03-06T23:49:04Z
----------------------------------------------------------------

Returned back supervision. With it code looks simpler.

Copy link

review-notebook-app bot commented Feb 26, 2024

View / edit / reply to this conversation on ReviewNB

eaidova commented on 2024-02-26T15:44:28Z
----------------------------------------------------------------

Line #9.        text_token_mask = torch.tensor([[

can this mask be obtained using some tool or using random? I think for real user it is hard to understand how to create and use it


pavel-esir commented on 2024-03-06T23:49:34Z
----------------------------------------------------------------

done, significantly enhanced providing example input

Copy link

review-notebook-app bot commented Feb 26, 2024

View / edit / reply to this conversation on ReviewNB

eaidova commented on 2024-02-26T15:44:29Z
----------------------------------------------------------------

please provide some explanation for this part. Which steps required to prepare data and postprocess results. There is a big function, but it maybe hard to understand what stated behind that if somebody wants to replicate this process


pavel-esir commented on 2024-03-06T23:51:07Z
----------------------------------------------------------------

i will provide description for this function as well as description with running combined GroundingDINO + SAM/EfficientSAM

pavel-esir commented on 2024-03-07T13:32:33Z
----------------------------------------------------------------

slightly refactored this function, added comments with explanations, and type annotations

Copy link

review-notebook-app bot commented Feb 26, 2024

View / edit / reply to this conversation on ReviewNB

eaidova commented on 2024-02-26T15:44:29Z
----------------------------------------------------------------

Line #1.    def predict_torch(predictor, image, transformed_boxes):

name of function confusing, do you really use pytorch for prediction?


pavel-esir commented on 2024-02-26T16:20:23Z
----------------------------------------------------------------

very obsolete name, i'll rename 🤦

Copy link

review-notebook-app bot commented Feb 26, 2024

View / edit / reply to this conversation on ReviewNB

eaidova commented on 2024-02-26T15:44:30Z
----------------------------------------------------------------

not clear, then what is above if it is run grounding sam? :)

I think more text needed, something:

Now, you can try apply grounding sam on own images using interactive demo. The code below provides helper functions used in demonstration


pavel-esir commented on 2024-03-07T14:01:44Z
----------------------------------------------------------------

added clarification for this part

Copy link

review-notebook-app bot commented Feb 26, 2024

View / edit / reply to this conversation on ReviewNB

eaidova commented on 2024-02-26T15:44:31Z
----------------------------------------------------------------

please put here little instruction how to launch (which data should be provided, what is advanced options responsibility)


pavel-esir commented on 2024-03-07T13:29:39Z
----------------------------------------------------------------

agreed and added clarifications

Copy link

review-notebook-app bot commented Feb 26, 2024

View / edit / reply to this conversation on ReviewNB

eaidova commented on 2024-02-26T15:44:32Z
----------------------------------------------------------------

Line #4.                input_image = gr.Image(source='upload', type="pil", value=f"{repo_dir}/assets/demo1.jpg", tool="sketch")

I think better to use gr. Examples for providing default images and labels

https://www.gradio.app/docs/examples


pavel-esir commented on 2024-03-06T22:57:59Z
----------------------------------------------------------------

replaced with gr.Interface and and specified examples arg, this gives UI similar to gr.Examples

Copy link
Contributor Author

very obsolete name, i'll rename 🤦


View entire conversation on ReviewNB

Copy link
Contributor Author

Torch 2.2 tvision 0.17 works as well, i will remove manual install of torch and will rely on timm with extra index url.

About pycocotools it also seemed strange to me, but without it i end up having import error, because it's needed in visualizer

https://github.com/wenyi5608/GroundingDINO/blob/main/groundingdino/util/visualizer.py#L19

which is imported in file where model torch nn.Module is defined

https://github.com/wenyi5608/GroundingDINO/blob/main/groundingdino/models/GroundingDINO/groundingdino.py#L37

Supervision is needed for simpler annotation, but it can be done manually. I will do that and will remove dependency


View entire conversation on ReviewNB

Copy link

review-notebook-app bot commented Feb 27, 2024

View / edit / reply to this conversation on ReviewNB

aleksandr-mokrov commented on 2024-02-27T23:58:27Z
----------------------------------------------------------------

Remove outputs with no useful information and with internal paths


pavel-esir commented on 2024-03-07T13:28:32Z
----------------------------------------------------------------

removed

Copy link

review-notebook-app bot commented Feb 27, 2024

View / edit / reply to this conversation on ReviewNB

aleksandr-mokrov commented on 2024-02-27T23:58:27Z
----------------------------------------------------------------

Line #2.        from groundingdino.models import build_model

Why are the import inside functions? All these function always are called.


pavel-esir commented on 2024-03-06T23:01:09Z
----------------------------------------------------------------

moved them outside

Copy link

review-notebook-app bot commented Feb 27, 2024

View / edit / reply to this conversation on ReviewNB

aleksandr-mokrov commented on 2024-02-27T23:58:28Z
----------------------------------------------------------------

Cleanup


pavel-esir commented on 2024-03-06T23:01:56Z
----------------------------------------------------------------

done

Copy link

review-notebook-app bot commented Feb 28, 2024

View / edit / reply to this conversation on ReviewNB

apaniukov commented on 2024-02-28T13:08:50Z
----------------------------------------------------------------

Line #7.        token_type_ids = torch.tensor([[0, 0, 0, 0, 0, 0]])

The same is for attention_mask and token_type_ids, they are already returned by tokenizer:

>>> pt_grounding_dino_model.tokenizer([caption], return_tensors="pt", return_special_tokens_mask=True)

{'input_ids': tensor([[ 101, 1996, 2770, 3899, 1012, 102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1]]), 'special_tokens_mask': tensor([[1, 0, 0, 0, 0, 1]])}

special_token_mask can be transformed into text_token_mask like this:

_, counts = torch.unique_consecutive(text_token_mask, return_counts=True)
text_token_mask = torch.block_diag(*[torch.ones(size, size) for size in counts])



pavel-esir commented on 2024-03-06T23:03:23Z
----------------------------------------------------------------

thanks a lot. Will apply this

pavel-esir commented on 2024-03-06T23:19:59Z
----------------------------------------------------------------

done

Copy link
Contributor Author

i will provide description for this function as well as description with running combined GroundingDINO + SAM/EfficientSAM


View entire conversation on ReviewNB

Copy link
Contributor Author

removed


View entire conversation on ReviewNB

Copy link
Contributor Author

removed


View entire conversation on ReviewNB

Copy link
Contributor Author

agreed and added clarifications


View entire conversation on ReviewNB

Copy link
Contributor Author

slightly refactored this function, added comments with explanations, and type annotatetions


View entire conversation on ReviewNB

Copy link
Contributor Author

added clarification for this part


View entire conversation on ReviewNB

@pavel-esir
Copy link
Contributor Author

Applied comments. @eaidova @apaniukov @aleksandr-mokrov please take a look

@pavel-esir
Copy link
Contributor Author

compilation on mac fails. i will disable testing on mac

Copy link

review-notebook-app bot commented Mar 8, 2024

View / edit / reply to this conversation on ReviewNB

eaidova commented on 2024-03-08T05:07:08Z
----------------------------------------------------------------

Line #2.    ov_compiled_grounded_dino = core.compile_model(ov_dino_model, device.upper())

no need in upper


pavel-esir commented on 2024-03-08T07:39:55Z
----------------------------------------------------------------

done

Copy link

review-notebook-app bot commented Mar 8, 2024

View / edit / reply to this conversation on ReviewNB

eaidova commented on 2024-03-08T05:12:00Z
----------------------------------------------------------------

I think this text does not describe cell content, maybe better to add it into code as comment with additional description for other steps like disabling gradients and model tracing as well?


pavel-esir commented on 2024-03-08T07:40:03Z
----------------------------------------------------------------

done

@eaidova
Copy link
Collaborator

eaidova commented Mar 8, 2024

@pavel-esir could you please check correctness of notebook meta for docs?

image

I'm not sure that visual question answering is correct task for this notebook, also not sure that it is possible to run it using binder (binder has only 2GB RAM, are you sure that it is enough for running your notebook? do you try to run it?)

Please also add image for preview

Copy link
Contributor Author

done


View entire conversation on ReviewNB

Copy link
Contributor Author

done


View entire conversation on ReviewNB

@pavel-esir
Copy link
Contributor Author

@added link to image, removed visual question answering, applied.

not sure that it is possible to run it using binder (binder has only 2GB RAM, are you sure that it is enough for running your notebook? do you try to run it?)

checked, indeed it consumes too much ram > 5Gb, removed binder link

@eaidova eaidova merged commit 376bade into openvinotoolkit:main Mar 8, 2024
13 of 15 checks passed
eaidova pushed a commit that referenced this pull request Mar 8, 2024
update image for GroundedSAM
#1720
@pavel-esir pavel-esir deleted the add_grounding_sam branch March 11, 2024 07:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants