diff --git a/guides/05_chatbots/03_agents-and-tool-usage.md b/guides/05_chatbots/03_agents-and-tool-usage.md index 66b2caee756a7..ea74017230c6f 100644 --- a/guides/05_chatbots/03_agents-and-tool-usage.md +++ b/guides/05_chatbots/03_agents-and-tool-usage.md @@ -3,14 +3,14 @@ Tags: LLM, AGENTS, CHAT Related spaces: https://huggingface.co/spaces/gradio/agent_chatbot, https://huggingface.co/spaces/gradio/langchain-agent -The Gradio Chatbot can natively display intermediate thoughts and tool usage. This makes it perfect for creating UIs for LLM agents. This guide will show you how. +The Gradio Chatbot can natively display intermediate thoughts and tool usage. This makes it perfect for creating UIs for LLM agents and chain-of-thought (CoT) demos. This guide will show you how. ## The metadata key In addition to the `content` and `role` keys, the messages dictionary accepts a `metadata` key. At present, the `metadata` key accepts a dictionary with a single key called `title`. If you specify a `title` for the message, it will be displayed in a collapsible box. -Here is an example, were we display the agent's thought to use a weather API tool to answer the user query. +Here is an example, where we display the agent's thought to use a weather API tool to answer the user query. ```python with gr.Blocks() as demo: @@ -24,7 +24,9 @@ with gr.Blocks() as demo: ![simple-metadat-chatbot](https://github.com/freddyaboulton/freddyboulton/assets/41651716/3941783f-6835-4e5e-89a6-03f850d9abde) -## A real example using transformers.agents +## Building with Agents + +### A real example using transformers.agents We'll create a Gradio application simple agent that has access to a text-to-image tool. @@ -87,12 +89,11 @@ You can see the full demo code [here](https://huggingface.co/spaces/gradio/agent ![transformers_agent_code](https://github.com/freddyaboulton/freddyboulton/assets/41651716/c8d21336-e0e6-4878-88ea-e6fcfef3552d) -## A real example using langchain agents +### A real example using langchain agents We'll create a UI for langchain agent that has access to a search engine. -We'll begin with imports and setting up the langchain agent. Note that you'll need an .env file with -the following environment variables set - +We'll begin with imports and setting up the langchain agent. Note that you'll need an .env file with the following environment variables set - ``` SERPAPI_API_KEY= @@ -166,4 +167,139 @@ demo.launch() That's it! See our finished langchain demo [here](https://huggingface.co/spaces/gradio/langchain-agent). +## Building with Visibly Thinking LLMs + + +The Gradio Chatbot can natively display intermediate thoughts of a _thinking_ LLM. This makes it perfect for creating UIs that show how an AI model "thinks" while generating responses. Below guide will show you how to build a chatbot that displays Gemini AI's thought process in real-time. + + +### A real example using Gemini 2.0 Flash Thinking API + +Let's create a complete chatbot that shows its thoughts and responses in real-time. We'll use Google's Gemini API for accessing Gemini 2.0 Flash Thinking LLM and Gradio for the UI. + +We'll begin with imports and setting up the gemini client. Note that you'll need to [acquire a Google Gemini API key](https://aistudio.google.com/apikey) first - + +```python +import gradio as gr +from gradio import ChatMessage +from typing import Iterator +import google.generativeai as genai + +genai.configure(api_key="your-gemini-api-key") +model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-1219") +``` + +First, let's set up our streaming function that handles the model's output: + +```python +def stream_gemini_response(user_message: str, messages: list) -> Iterator[list]: + """ + Streams both thoughts and responses from the Gemini model. + """ + # Initialize response from Gemini + response = model.generate_content(user_message, stream=True) + + # Initialize buffers + thought_buffer = "" + response_buffer = "" + thinking_complete = False + + # Add initial thinking message + messages.append( + ChatMessage( + role="assistant", + content="", + metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"} + ) + ) + + for chunk in response: + parts = chunk.candidates[0].content.parts + current_chunk = parts[0].text + + if len(parts) == 2 and not thinking_complete: + # Complete thought and start response + thought_buffer += current_chunk + messages[-1] = ChatMessage( + role="assistant", + content=thought_buffer, + metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"} + ) + + # Add response message + messages.append( + ChatMessage( + role="assistant", + content=parts[1].text + ) + ) + thinking_complete = True + + elif thinking_complete: + # Continue streaming response + response_buffer += current_chunk + messages[-1] = ChatMessage( + role="assistant", + content=response_buffer + ) + + else: + # Continue streaming thoughts + thought_buffer += current_chunk + messages[-1] = ChatMessage( + role="assistant", + content=thought_buffer, + metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"} + ) + + yield messages +``` + +Then, let's create the Gradio interface: + +```python +with gr.Blocks() as demo: + gr.Markdown("# Chat with Gemini 2.0 Flash and See its Thoughts 💭") + + chatbot = gr.Chatbot( + type="messages", + label="Gemini2.0 'Thinking' Chatbot", + render_markdown=True, + ) + + input_box = gr.Textbox( + lines=1, + label="Chat Message", + placeholder="Type your message here and press Enter..." + ) + + # Set up event handlers + msg_store = gr.State("") # Store for preserving user message + + input_box.submit( + lambda msg: (msg, msg, ""), # Store message and clear input + inputs=[input_box], + outputs=[msg_store, input_box, input_box], + queue=False + ).then( + user_message, # Add user message to chat + inputs=[msg_store, chatbot], + outputs=[input_box, chatbot], + queue=False + ).then( + stream_gemini_response, # Generate and stream response + inputs=[msg_store, chatbot], + outputs=chatbot + ) + +demo.launch() +``` + +This creates a chatbot that: + +- Displays the model's thoughts in a collapsible section +- Streams the thoughts and final response in real-time +- Maintains a clean chat history + + That's it! You now have a chatbot that not only responds to users but also shows its thinking process, creating a more transparent and engaging interaction. See our finished Gemini 2.0 Flash Thinking demo [here](https://huggingface.co/spaces/ysharma/Gemini2-Flash-Thinking).