Image support inside complex types #1767

Open
isaacbmiller opened this issue Nov 6, 2024 · 6 comments · May be fixed by #1801

@isaacbmiller
Collaborator

Currently, you can only pass a single image per field in a signature.

For example, this works:

import dspy

class ImageSignature(dspy.Signature):
    image1: dspy.Image = dspy.InputField()
    image2: dspy.Image = dspy.InputField()

But more complex types involving images won't:

from typing import Dict, List

import dspy

class ImageSignature(dspy.Signature):
    images: List[dspy.Image] = dspy.InputField()

class ImageSignature(dspy.Signature):
    labeled_images: Dict[str, dspy.Image] = dspy.InputField()

This is due to how images are compiled into OpenAI-compatible messages: inside chat_adapter.py, we build a large list of content blocks, giving fields that contain an image_url special treatment:

{
    "content": [
        {
            "type": "text",
            "text": "...",
        },
        {
            "type": "image_url",
            "image_url": {"url": "..."},  # url is either an actual URL or the base64 data
        },
    ]
}
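As the comment above notes, the "url" in an image_url block can be a regular http(s) URL or the image data itself, encoded as a base64 data URL. A minimal standard-library sketch of building both kinds of block; the helper names image_block and url_block and the default MIME type are illustrative assumptions, not part of DSPy's API:

```python
import base64


def image_block(data: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes in an OpenAI-style image_url content block.

    The bytes are base64-encoded into a data URL of the form
    data:<mime_type>;base64,<payload>.
    """
    payload = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{payload}"},
    }


def url_block(url: str) -> dict:
    """Wrap an existing http(s) URL in the same block shape."""
    return {"type": "image_url", "image_url": {"url": url}}
```

For example, image_block(b"abc") yields a block whose URL is "data:image/png;base64,YWJj". The MIME type must match the actual image format; it is not inferred from the bytes here.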

I do some fairly naive parsing inside ChatAdapter, and there is definitely a more elegant solution here.
#1763 addresses the List case, but I want a more generalized solution.
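One possible direction for a generalized solution, sketched under assumptions: walk each field value recursively and, wherever an image appears (at any nesting depth inside lists, tuples, or dicts), emit a text label with its path followed by an image_url block, merging adjacent text blocks. The Image dataclass is a hypothetical stand-in for dspy.Image, and fields_to_content, flatten, and merge_text are illustrative names; this is not how ChatAdapter currently works, and Pydantic models or other containers are not handled.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Image:
    # Hypothetical stand-in for dspy.Image: holds a URL or base64 data URL.
    url: str


def flatten(value: Any, path: str, out: list) -> None:
    """Depth-first walk that emits a text label before each image block."""
    if isinstance(value, Image):
        out.append({"type": "text", "text": f"{path}:"})
        out.append({"type": "image_url", "image_url": {"url": value.url}})
    elif isinstance(value, dict):
        for key, item in value.items():
            flatten(item, f"{path}[{key!r}]", out)
    elif isinstance(value, (list, tuple)):
        for i, item in enumerate(value):
            flatten(item, f"{path}[{i}]", out)
    else:
        # Non-image leaves are rendered as plain text.
        out.append({"type": "text", "text": f"{path}: {value}"})


def merge_text(blocks: list) -> list:
    """Join consecutive text blocks so the message stays compact."""
    merged = []
    for block in blocks:
        if block["type"] == "text" and merged and merged[-1]["type"] == "text":
            merged[-1] = {"type": "text", "text": merged[-1]["text"] + "\n" + block["text"]}
        else:
            merged.append(block)
    return merged


def fields_to_content(fields: dict) -> list:
    """Turn a mapping of field names to values into interleaved content blocks."""
    blocks: list = []
    for name, value in fields.items():
        flatten(value, name, blocks)
    return merge_text(blocks)
```

With fields_to_content({"images": [Image("x"), Image("y")]}), each image is preceded by its label ("images[0]:", "images[1]:"), so the List and Dict cases fall out of the same recursion.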

cc @okhat

isaacbmiller self-assigned this Nov 6, 2024
@thomasahle
Collaborator

This is how I did it in fewshot:

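import json
from typing import Any

from pydantic import BaseModel

# map_images and gpt_format_image are helpers from the fewshot codebase (not shown here).


def format_input_simple(pydantic_object: BaseModel, img_formatter=None) -> dict[str, Any]:
    if img_formatter is None:
        img_formatter = gpt_format_image

    image_map = {}

    def replace_image_with_id(obj: Any) -> Any:
        # Give each image a sequential placeholder ID and remember its base64 data.
        image_id = f"[image {len(image_map) + 1}]"
        image_map[image_id] = obj.base64()
        return image_id

    # Serialize the input with every image swapped for its placeholder ID.
    dict_obj = map_images(pydantic_object, replace_image_with_id)
    processed = json.dumps(dict_obj)

    content = [{"type": "text", "text": processed}]
    # Append each (ID, image) pair after the JSON text.
    for image_id, image in image_map.items():
        content.append({"type": "text", "text": image_id + ":"})
        content.append(img_formatter(image))

    return {"role": "user", "content": content}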

Basically, when I serialize the input object to JSON, I replace every image with an ID.
Then, at the end of the message, I send the list of (ID, image) pairs.

It works reasonably well.
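The snippet above relies on map_images and gpt_format_image from fewshot, which aren't shown. A self-contained sketch of the same ID-placeholder idea, under stated assumptions: Image is a hypothetical class holding a URL or base64 data URL, map_images here is a hypothetical reimplementation that only walks dicts, lists, and tuples (the fewshot version operates on Pydantic objects and may differ), and format_with_image_ids is an illustrative name.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Image:
    # Hypothetical image type; holds a URL or base64 data URL.
    url: str


def map_images(obj: Any, fn: Callable[[Image], Any]) -> Any:
    """Return a copy of obj with every Image replaced by fn(image)."""
    if isinstance(obj, Image):
        return fn(obj)
    if isinstance(obj, dict):
        return {k: map_images(v, fn) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [map_images(v, fn) for v in obj]
    return obj


def format_with_image_ids(inputs: dict) -> dict:
    image_map: dict = {}

    def replace_image_with_id(img: Image) -> str:
        image_id = f"[image {len(image_map) + 1}]"
        image_map[image_id] = img.url
        return image_id

    # 1. One JSON string for the whole input, with placeholder IDs for images.
    content = [{"type": "text", "text": json.dumps(map_images(inputs, replace_image_with_id))}]
    # 2. Each (ID, image) pair appended after the JSON.
    for image_id, url in image_map.items():
        content.append({"type": "text", "text": image_id + ":"})
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}
```

Because the JSON text is a single string, the image blocks cannot sit inside it; the placeholder IDs are what tie each trailing image back to its position in the structure.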

@rzr2kor

rzr2kor commented Nov 8, 2024

Hey, I was trying to perform VQA with an LLM using DSPy for optimized prompting, and I'm not able to pass a base64 image to the LLM via DSPy. Could you let me know how you were able to do it? I tried dspy.Image, but I get an error saying there is no module called dspy.Image. Thanks!

@okhat
Collaborator

okhat commented Nov 12, 2024

@rzr2kor Are you on the latest version of DSPy? Try: pip install -U dspy

@isaacbmiller
Collaborator Author

isaacbmiller commented Nov 12, 2024

> Then at the end of the message I send the list of (ID, img) pairs.

@thomasahle Did you find that this worked better than interleaving the {"type": "image_url", "image_url": ...} blocks into your actual text content, or was it just a design decision?

@glesperance
Contributor

glesperance commented Nov 13, 2024

With images supported in complex types, it seems we could unlock MIPROv2 with the few-shot-aware proposer enabled, since DescribeProgram / DescribeModule could then be modified to receive a program_example that contains images.

@thomasahle
Collaborator

> > Then at the end of the message I send the list of (ID, img) pairs.
>
> @thomasahle Did you find that this worked better than interleaving the {"type": "image_url", "image_url": ...} blocks into your actual text content, or was it just a design decision?

I couldn't put it in "the actual context", since that was just one big JSON string.

5 participants