Imagine you have an image (demo.png) and you want to generate Python code that processes this image and saves a new version of it (phi-3-vision.jpg).
The code above automates this process by:
- Setting up the environment and necessary configurations.
- Creating a prompt that instructs the model to generate the required Python code.
- Sending the prompt to the model and collecting the generated code.
- Extracting and running the generated code.
- Displaying the original and processed images.
This approach has the model write the image-processing code for you, so a single prompt takes the place of code you would otherwise write by hand.
Let's break down what the entire code does step by step:
- Install Required Package:

      !pip install langchain_nvidia_ai_endpoints -U

  This command installs the langchain_nvidia_ai_endpoints package, ensuring it's the latest version.

- Import Necessary Modules:

      from langchain_nvidia_ai_endpoints import ChatNVIDIA
      import getpass
      import os
      import base64

  These imports bring in the necessary modules for interacting with the NVIDIA AI endpoints, handling passwords securely, interacting with the operating system, and encoding/decoding data in base64 format.
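Before running the import step, it can help to confirm that the package is visible to the current interpreter. The sketch below is an illustration rather than part of the original sample: `is_installed` is a hypothetical helper built on the standard library's `importlib.util.find_spec`, which locates a module without importing it.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical usage: report a missing package early instead of failing at the import line.
if not is_installed("langchain_nvidia_ai_endpoints"):
    print("langchain_nvidia_ai_endpoints is missing; run the pip install step first.")
```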
- Set Up API Key:

      if not os.getenv("NVIDIA_API_KEY"):
          os.environ["NVIDIA_API_KEY"] = getpass.getpass("Enter your NVIDIA API key: ")

  This code checks if the NVIDIA_API_KEY environment variable is set. If not, it prompts the user to enter their API key securely.

- Define Model and Image Path:

      model = 'microsoft/phi-3-vision-128k-instruct'
      chat = ChatNVIDIA(model=model)
      img_path = './imgs/demo.png'

  This sets the model to be used, creates an instance of ChatNVIDIA with the specified model, and defines the path to the image file.

- Create Text Prompt:

      text = "Please create Python code for image, and use plt to save the new picture under imgs/ and name it phi-3-vision.jpg."

  This defines a text prompt instructing the model to generate Python code for processing an image.
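The check-then-prompt pattern used for the API key can be wrapped in a small function, which makes it testable without an interactive terminal. This is a minimal sketch under stated assumptions: `ensure_api_key` and its `env` and `ask` parameters are hypothetical names introduced here, not part of the original sample or of any library.

```python
import getpass
import os


def ensure_api_key(env=os.environ, ask=getpass.getpass, name="NVIDIA_API_KEY"):
    """Return the key from env, prompting (without echo) only when it is unset or empty."""
    key = env.get(name)
    if not key:
        key = ask(f"Enter your {name}: ").strip()
        if not key:
            raise ValueError(f"{name} must not be empty")
        env[name] = key  # later code can read it from the environment
    return key


# Hypothetical usage in a notebook (prompts only if the variable is unset):
# ensure_api_key()
```

Passing `env` and `ask` as parameters is only a design choice for this sketch; it lets the function be exercised with a plain dictionary and a stub prompt.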
- Encode Image in Base64:

      with open(img_path, "rb") as f:
          image_b64 = base64.b64encode(f.read()).decode()
      image = f'<img src="data:image/png;base64,{image_b64}" />'

  This code reads the image file, encodes it in base64, and creates an HTML image tag with the encoded data.
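The encoding step hardcodes the image/png MIME type. As an illustration, the same tag can be built by a helper that derives the type from the file name using the standard `mimetypes` module. `image_tag` is a hypothetical name, and whether the endpoint accepts formats other than PNG is not stated in the source, so treat this as a sketch only.

```python
import base64
import mimetypes


def image_tag(data: bytes, filename: str) -> str:
    """Wrap raw image bytes in an <img> tag whose src is a base64 data URI."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {filename}")
    encoded = base64.b64encode(data).decode("ascii")
    return f'<img src="data:{mime};base64,{encoded}" />'


# Stand-in bytes; in the walkthrough these would come from reading img_path.
print(image_tag(b"abc", "demo.png"))
```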
- Combine Text and Image into Prompt:

      prompt = f"{text} {image}"

  This combines the text prompt and the HTML image tag into a single string.
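Because the image travels inline inside the prompt string, the prompt grows with the image size. Base64 turns every 3 input bytes into 4 output characters, so the encoded length can be estimated before building the prompt. The helper name `base64_length` is hypothetical, and the source does not state any size limit for the endpoint, so the sketch only computes the length.

```python
import base64


def base64_length(num_bytes: int) -> int:
    """Length of padded base64 output: each group of up to 3 bytes becomes 4 characters."""
    return 4 * ((num_bytes + 2) // 3)


raw = bytes(1000)  # stand-in for the image bytes
# The estimate matches what the encoder actually produces.
assert base64_length(len(raw)) == len(base64.b64encode(raw))
print(base64_length(len(raw)))  # 1336
```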
- Generate Code Using ChatNVIDIA:

      code = ""
      for chunk in chat.stream(prompt):
          print(chunk.content, end="")
          code += chunk.content

  This code sends the prompt to the ChatNVIDIA model and collects the generated code in chunks, printing each chunk and appending it to the code string.

- Extract Python Code from Generated Content:

      begin = code.index('```python') + 9
      code = code[begin:]
      end = code.index('```')
      code = code[:end]

  This extracts the actual Python code from the generated content by removing the markdown formatting.
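The index-based extraction raises a ValueError from `str.index` when the reply contains no Python fence. A regular expression is one possible alternative that makes the "no code found" case explicit. This is a sketch, not the original author's method; `extract_code` is a hypothetical helper, and the fence string is assembled at runtime only so the example nests cleanly inside markdown.

```python
import re

FENCE = "`" * 3  # three backticks, assembled here so this example can sit inside a fence
PATTERN = re.compile(
    FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def extract_code(reply: str):
    """Return the body of the first fenced block, or None if the reply has none."""
    match = PATTERN.search(reply)
    return match.group(1).strip() if match else None


reply = "Here is the code:\n" + FENCE + "python\nprint(1)\n" + FENCE + "\nDone."
print(extract_code(reply))
```

Returning None leaves the decision to the caller, for example re-prompting the model instead of running an empty string.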
- Run the Generated Code:

      import subprocess
      result = subprocess.run(["python", "-c", code], capture_output=True)

  This runs the extracted Python code as a subprocess and captures its output.
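The captured output is only useful if it is inspected, and model-generated code can hang or fail. The sketch below shows one way to run a snippet with the current interpreter, a timeout, and decoded output. `run_snippet` and the 60-second default are assumptions made for illustration, not part of the original sample.

```python
import subprocess
import sys


def run_snippet(source: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run Python source in a child interpreter and capture decoded output."""
    return subprocess.run(
        [sys.executable, "-c", source],  # same interpreter that runs this code
        capture_output=True,
        text=True,  # stdout and stderr as str instead of bytes
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
    )


result = run_snippet("print('hello')")
if result.returncode != 0:
    print(result.stderr)  # surface the traceback instead of failing silently
else:
    print(result.stdout, end="")
```

Using `sys.executable` rather than the bare name "python" avoids picking up a different interpreter that lacks the packages installed in the notebook environment. Since the snippet comes from a model, running it in an isolated environment is a reasonable precaution.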
- Display Images:

      from IPython.display import Image, display
      display(Image(filename='./imgs/phi-3-vision.jpg'))
      display(Image(filename='./imgs/demo.png'))

  These lines display the images using the IPython.display module.
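If the generated code failed, the processed image will not exist and the display step will raise an error. One possible guard, sketched here with a hypothetical helper named `split_existing`, is to check which files are present before displaying them.

```python
from pathlib import Path


def split_existing(paths):
    """Partition paths into (found, missing) so missing outputs are reported, not raised."""
    found = [p for p in paths if Path(p).is_file()]
    missing = [p for p in paths if not Path(p).is_file()]
    return found, missing


found, missing = split_existing(["./imgs/phi-3-vision.jpg", "./imgs/demo.png"])
for path in missing:
    print(f"not found: {path}")
# In a notebook, each entry of `found` could then be passed to
# IPython.display.Image(filename=...) and display(), as in the display step.
```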