Epic: Image Classifier #1

nelsonic · 2023-05-25T09:41:45Z (edited by LuchoTurtle)

Once we are uploading images (dwyl/imgup#51),
we want to classify the images and suggest meta tags to describe them so that they become "searchable".
That means pulling any text out of images using OCR,
and attempting to find any detail in images that can be useful.

We aren't going to build our own models from scratch, but we are going to ...

Todo

- Research the available models and services/APIs we can send an image to so that it gets classified.
- Research available OCR services or models.
  - If there is an Open Source OCR model we can run on our own infra, e.g. €20/month on Fly.io, please share!!
- Images that are uploaded from a Camera or Smart Phone contain metadata including camera type/model, location (where the photo was taken), ISO, Shutter, Focal Length, Original Resolution, etc. We want to capture this and feed it into the classifier: Feat: Store Metadata and Image Classification/Info #3
- The objective of the classifier is to attempt to describe the image and return a few keywords.
- If it makes more sense to have this as a standalone app (separate from imgup), then feel free to create a new repo! Then just send the data to the standalone app and receive JSON data in response. 💭
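To make the last few points concrete, here is a minimal Python sketch of the JSON a standalone classifier could send back to imgup. Everything in it is hypothetical: the field names (`caption`, `keywords`, `camera`, `location`), the tiny stop-word list, and the assumption that the EXIF GPS values have already been read (e.g. with Pillow) into plain `(degrees, minutes, seconds, ref)` tuples. The caption itself would come from whichever model we end up picking.

```python
import json
import re

# Hypothetical stop-word list; a real service would use a much fuller one.
STOP_WORDS = {"a", "an", "and", "at", "in", "is", "of", "on", "the", "to", "with"}


def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and West longitudes are negative.
    return -value if ref in ("S", "W") else value


def extract_keywords(caption, limit=5):
    """Return up to `limit` unique lower-case words that are not stop words."""
    keywords = []
    for word in re.findall(r"[a-z]+", caption.lower()):
        if word not in STOP_WORDS and word not in keywords:
            keywords.append(word)
    return keywords[:limit]


def build_response(caption, metadata):
    """Assemble the (hypothetical) JSON payload sent back to imgup."""
    response = {"caption": caption, "keywords": extract_keywords(caption)}
    if "model" in metadata:
        response["camera"] = metadata["model"]
    gps = metadata.get("gps")
    if gps:
        response["location"] = {
            "lat": dms_to_decimal(*gps["lat"]),
            "lon": dms_to_decimal(*gps["lon"]),
        }
    return json.dumps(response)
```

For a caption like "A messy kitchen with dirty dishes", the keywords would be `messy`, `kitchen`, `dirty` and `dishes`; the naive stop-word filter is only a placeholder for whatever keyword step the classifier ends up using.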

@LuchoTurtle please leave comments with your research. 🙏

Context

We want to be able to upload images in our App and have them become an item of content.
e.g. I take a photo of a messy kitchen and it becomes "Tidy The Kitchen" with a small thumbnail of the image.
If I tap on the thumbnail I see the image full-screen. But the text is the important part.

The reason we want to have a "Visual Todo List" is that it becomes easy for people who don't yet read (think toddlers) or people who don't read well (people who only have basic literacy) to follow instructions.


LuchoTurtle · 2023-06-19T09:02:24Z

Stumbled upon these two, which might be relevant to revisit at a later stage:
https://github.com/bentoml/OpenLLM
https://github.com/showlab/Image2Paragraph

nelsonic · 2023-06-19T09:08:56Z

Yeah, saw OpenLLM on HN this morning:

https://news.ycombinator.com/item?id=36388219
Looks good. BentoML is what OpenAI could have been but they chose to go closed (MSFT) ... 🙄

LuchoTurtle · 2023-09-12T18:09:20Z

I've thought about the best way of doing this and found a fair number of resources that I think can get us close to what we want.

Image Captioning models

Most common LLMs, such as the open-source Llama 2 or the proprietary Claude 2, only accept text input. I took a gander at https://github.com/bentoml/OpenLLM, as mentioned in the comment above. However, it's not really useful to us, as these LLMs do not understand image inputs (though some of them may be able to work with vector representations of images). Therefore, we have to forgo these more "mainstream" LLMs for this use case.

There are, however, computer vision models we can definitely use. I started my dive at https://github.com/salesforce/LAVIS#image-captioning, which led me to BLIP-2, a zero-shot image-to-text generation model that we can use for image captioning.

I'm not going to explain how BLIP-2 works, but you can find more info about it at https://huggingface.co/blog/blip-2. The good thing about it is that it's available in Hugging Face Transformers, which makes it easy to download and run BLIP-2 as a pre-trained model, even if it's just for testing purposes.

You can find a demo at https://huggingface.co/spaces/Salesforce/BLIP2.

LangChain 🦜

I'd heard about LangChain several times over the past few months: it makes it easy to create LLM-based applications by chaining different models together to produce the output a given use case needs. The fact that it can easily be deployed to Fly.io is a big plus.

I was thinking of using BLIP-2 and chaining it to an open-source LLM like Llama 2 (or similar) to get a more descriptive caption of the image, so we could extract keywords from it afterwards.
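The BLIP-2 → LLM → keywords idea can be sketched in plain Python without LangChain itself: `chain` simply feeds each step's output into the next. The two model steps are hypothetical stubs that return fixed strings (no model is loaded), and the keyword step is deliberately naive; this illustrates the chaining idea, not LangChain's API.

```python
import re
from functools import reduce


def chain(*steps):
    """Compose steps left to right: each step's output is the next step's input."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def caption_image(image_url):
    # Stub: a real step would run BLIP-2 on the image at `image_url`.
    return "a kitchen counter covered in dirty dishes"


def expand_caption(caption):
    # Stub: a real step would prompt an LLM (e.g. Llama 2) to elaborate on the caption.
    return f"The photo shows {caption}. The kitchen needs tidying."


def to_keywords(text):
    """Naive keyword step: unique lower-case words minus a few stop words."""
    stop_words = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}
    keywords = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in stop_words and word not in keywords:
            keywords.append(word)
    return keywords


# Hypothetical end-to-end pipeline: image URL -> caption -> description -> keywords.
describe = chain(caption_image, expand_caption, to_keywords)
```

Calling `describe("https://example.com/kitchen.jpg")` would return words such as `kitchen`, `counter` and `dishes`. Swapping a stub for a real model call changes one step without touching the others, which is the main appeal of the chaining approach.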

`Image2Paragraph`

However, I realised that I was essentially rebuilding Image2Paragraph, which does something similar but adds two more models: GRIT (dense, region-level captions) and Segment Anything (segmentation), which provide contextual descriptions of the image. The outputs of all three models (BLIP-2, GRIT, Segment Anything) are then fed to an LLM (GPT, in this case) to generate a paragraph describing the image.
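The fusion step can be pictured as prompt assembly: each model's output is turned into text and handed to the LLM. The sketch below is hypothetical and is not Image2Paragraph's actual code: the function name, the prompt wording and the `(description, (x, y, w, h))` box format are assumptions, and since Segment Anything outputs masks rather than words, the region labels are assumed to come from some extra labelling step.

```python
def build_paragraph_prompt(caption, dense_captions, region_labels):
    """Merge the outputs of three vision models into a single LLM prompt.

    caption:        one global caption (the BLIP-2 role)
    dense_captions: (description, (x, y, w, h)) pairs (the GRIT role)
    region_labels:  text labels for segmented regions (assumed; SAM itself
                    returns masks, so a labelling step is assumed upstream)
    """
    lines = [
        "Describe the image in one paragraph using the information below.",
        f"Image caption: {caption}",
    ]
    if dense_captions:
        lines.append("Objects and their bounding boxes:")
        for description, (x, y, w, h) in dense_captions:
            lines.append(f"- {description} at x={x}, y={y}, w={w}, h={h}")
    if region_labels:
        lines.append("Segmented regions: " + ", ".join(region_labels))
    return "\n".join(lines)
```

The resulting string would be sent to whichever LLM generates the final paragraph; keeping the assembly in one function makes it easy to compare prompt variants.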

Here's how the pipeline works: [pipeline diagram not reproduced]

So what to use?

You should give Image2Paragraph a whirl (I already tried the Hugging Face Space at https://huggingface.co/spaces/Awiny/Image2Paragraph, but it's not working), but I don't see a clear way of using it as a service that receives an image URL, outputs the paragraph, and can be deployed on Fly.io. If it can only run on localhost, there's no point in pursuing it.

So I wonder whether using only BLIP-2, or the vit-gpt2-image-captioning model from Hugging Face, would be easier and more "doable" for what we want.

(The latter seems like a highly plausible option using Transformers. See https://ankur3107.github.io/blogs/the-illustrated-image-captioning-using-transformers/).
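For a sense of how little code the Transformers route needs, here is a hedged Python sketch that captions an image URL with the `nlpconnect/vit-gpt2-image-captioning` checkpoint through the `image-to-text` pipeline. It assumes `transformers`, `torch` and `Pillow` are installed and that the model is downloaded on first use; the helper names (`load_captioner`, `first_caption`, `caption_url`) are hypothetical, and the captioner can be injected so the wrapping logic can be exercised without the model.

```python
def load_captioner():
    """Build a Hugging Face image-to-text pipeline (downloads the model on first use)."""
    # Imported lazily so the helpers below work without transformers installed.
    from transformers import pipeline

    return pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")


def first_caption(outputs):
    """The image-to-text pipeline returns a list of {"generated_text": ...} dicts."""
    if not outputs:
        return ""
    return outputs[0]["generated_text"].strip()


def caption_url(image_url, captioner=None):
    """Caption the image at `image_url`; the pipeline accepts URLs, file paths or PIL images."""
    captioner = captioner or load_captioner()
    return first_caption(captioner(image_url))
```

Wrapped in a small HTTP endpoint, `caption_url` would cover the "receive an image URL, return text" requirement, which is the part that was unclear for Image2Paragraph.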

nelsonic · 2023-09-13T01:59:56Z

Good research/summary. Thanks. 👌

LuchoTurtle · 2023-09-19T08:39:26Z

As @nelsonic suggested, we can give https://github.com/elixir-image/image a whirl, as well.

nelsonic · 2023-09-19T08:44:45Z

@LuchoTurtle I've lowered the priority on this issue to reflect the fact that it's a very "nice to have" feature but isn't "core" to the experience of our App for the time being. We need to focus on the WYSIWYG editor and getting the "core" functionality done and then shipping the Flutter App to the App Store ASAP. ⏳

Ref: dwyl/product-roadmap#40. We need to work on the Flutter App as our exclusive focus until we have feature parity with the Elixir/Phoenix MVP. I want to be using the Flutter App on my phone ASAP. 🙏

nelsonic · 2023-09-19T08:47:31Z

Having said that, when you take "breaks" from the Flutter work and want to research image classification, please do it. I know that AI/ML is an area of interest/focus for you, so definitely research and capture what you learn. 🔍 🧑‍💻 ✍️ ✅

nelsonic · 2023-09-19T08:49:05Z

It will be an awesome enhancement to add image recognition to the images people upload in the Flutter App.
But if we don't yet have a Flutter App deployed to the App Store dwyl/app#342 or Google Play dwyl/app#346 we are a "Default Dead" company.

nelsonic · 2023-10-25T08:44:13Z

@LuchoTurtle given that we are BLOCKED on both the iOS App Store dwyl/app#342 (comment) and Google Play dwyl/app#346, both assigned to @iteles 🔥,
please take a look at this issue today.
We should create a new repo for it: https://github.com/dwyl/image-classifier 🆕 ✅
Feel free to use Python for it if you think you can do it faster. 🐍
Otherwise if you can use Elixir, it will be easier for us to maintain longer-term. 💧

nelsonic added the enhancement New feature or enhancement of existing functionality label May 25, 2023

nelsonic assigned LuchoTurtle May 25, 2023

LuchoTurtle mentioned this issue Oct 1, 2023

Building AI Apps with Elixir - Charlie Holtz - ElixirConf 2023 dwyl/learn-elixir#212

Open

nelsonic mentioned this issue Oct 2, 2023

EPIC: tidy App Minimum Viable Features for Life Organising Tool dwyl/tidy#1

Open

5 tasks

ndrean mentioned this issue Oct 17, 2023

Testing Image-To-Text dwyl/imgup#131

Open

nelsonic transferred this issue from dwyl/imgup Oct 25, 2023

LuchoTurtle added a commit that referenced this issue Oct 30, 2023

feat: Initial commit. #1

105b434

LuchoTurtle added a commit that referenced this issue Oct 30, 2023

chore: Adding rest of initial setup and README. #1

c8acd88

LuchoTurtle added a commit that referenced this issue Oct 30, 2023

chore: Starting out. #1

c355232

LuchoTurtle added a commit that referenced this issue Oct 30, 2023

chore: Initial setup. #1

6fcc31c

LuchoTurtle added a commit that referenced this issue Oct 30, 2023

chore: Adding LiveView. #1

da92870

LuchoTurtle added a commit that referenced this issue Oct 30, 2023

feat: Automatically consuming entries. #1

35de4c9

LuchoTurtle mentioned this issue Oct 30, 2023

[PR] Creating basic image classifier with Elixir #2

Merged

LuchoTurtle moved this from 🔖 Ready for Development to ⏳Awaiting Review in dwyl app kanban Nov 8, 2023

nelsonic added a commit that referenced this issue Nov 13, 2023

fix typos and minor updates #1

c1c4dcf

nelsonic closed this as completed in #2 Nov 13, 2023

github-project-automation bot moved this from ⏳Awaiting Review to ✅ Done in dwyl app kanban Nov 13, 2023
