Feature Description
The thinking model API is admittedly a bit of a mess. In sync mode, it returns a response candidate whose content contains 2 parts: the first for the thought, the second for the actual response:
Count from 1 to 10
{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "My thinking process for responding to \"Count from 1 to 10\" is straightforward:\n\n1. **Identify the core request:** The user wants a numerical sequence starting at 1 and ending at 10.\n\n2. **Recall basic counting:** I have access to fundamental knowledge of numbers and their order.\n\n3. **Generate the sequence:** I produce the numbers in the specified order: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.\n\n4. **Determine the appropriate formatting:** A simple list separated by commas is the most natural and readable way to present the count.\n\n5. **Construct the response:** I combine the generated sequence into a coherent sentence."
          },
          {
            "text": "1, 2, 3, 4, 5, 6, 7, 8, 9, 10\n"
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP",
      [...]
    }
  ],
  [...]
}
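For illustration, if the SDK exposed the raw candidate, splitting the two parts on the consumer side could look something like this. This is just a sketch: the type names are made up, and it assumes the thought is always the first part.

```ts
// Hypothetical types mirroring the response shape above; not real SDK types.
interface Part {
  text: string;
}

interface Candidate {
  content: { parts: Part[]; role: string };
}

interface RawResponse {
  candidates: Candidate[];
}

// Assumption: in thinking mode there are exactly 2 parts, thought first.
function splitThoughtAndAnswer(response: RawResponse): { thought: string; answer: string } {
  const parts = response.candidates[0]?.content.parts ?? [];
  const [thought, ...rest] = parts;
  return {
    thought: thought?.text ?? "",
    answer: rest.map((p) => p.text).join(""),
  };
}
```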
In streaming mode, the API:
- generates candidates with 1 part while thinking;
- generates 1 candidate with 2 parts at the transition, one for the end of the thinking and one for the beginning of the response (I think it's always 2 parts);
- generates candidates with 1 part for the rest of the response:
Count from 1 to 100:
[... previous thinking with 1 part ...]
Parts from response:
[
  {
    text: " on a new line.\n\nEssentially, for such a basic request, there isn't much complex processing involved. It's direct application of the definition of counting. More complex requests would involve deeper analysis of constraints, potential edge cases, and more sophisticated algorithms. But for this, it's a simple retrieval",
  }
]
Parts from response:
[
  {
    text: " and presentation of a known sequence.",
  },
  {
    text: "0\n1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n14\n15\n16\n17\n18\n19\n20\n21\n",
  }
]
Parts from response:
[
  {
    text: "22\n23\n24\n25\n26\n27\n28\n29\n30\n31\n32\n33\n34\n35\n36\n37\n38\n39\n40\n41\n42\n4",
  }
]
[... the rest of the response ...]
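If the raw parts of each chunk were exposed, detecting the boundary would be straightforward. A rough sketch, assuming the dumps above are representative and the single 2-part chunk always marks the transition:

```ts
// Sketch of boundary detection over streamed chunks, assuming each chunk
// carried its raw parts array. Everything before the 2-part chunk is thought,
// everything after it is response.
let inThought = true;
let thought = "";
let answer = "";

function handleChunkParts(parts: { text: string }[]): void {
  if (inThought && parts.length === 2) {
    // Transition chunk: part 0 ends the thought, part 1 starts the answer.
    thought += parts[0].text;
    answer += parts[1].text;
    inThought = false;
  } else if (inThought) {
    thought += parts.map((p) => p.text).join("");
  } else {
    answer += parts.map((p) => p.text).join("");
  }
}
```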
This is of course a terrible API, and I hope they improve it in the future, but in the meantime, would it be possible to handle this?
Since this is just 1 model for now, I think it's best not to implement the logic in the SDK, but to expose enough data for the user to implement it themselves.
At the moment, onChunk (of streamText) receives a text-delta event with all parts of the response candidate concatenated together, so it's impossible to tell where the thought ends and the response begins.
Maybe it's possible to add the original candidate to the event that is passed to onChunk? The same goes for non-streaming generation: I don't use it, but I don't think we get access to any form of the response that keeps the candidate parts as an array, only the text of both parts concatenated.
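To make the ask concrete, something along these lines would be enough (entirely hypothetical: rawCandidate is the field being proposed, not an existing one, and the model id is just a placeholder):

```ts
import { google } from "@ai-sdk/google";
import { streamText } from "ai";

const result = streamText({
  model: google("gemini-2.0-flash-thinking-exp"), // whichever thinking model id applies
  prompt: "Count from 1 to 100",
  onChunk({ chunk }) {
    if (chunk.type === "text-delta") {
      // `rawCandidate` does not exist today: it is the proposed addition,
      // mirroring the provider's candidate with its parts array intact.
      const parts = (chunk as any).rawCandidate?.content?.parts;
      if (parts) {
        // apply the thought/response split sketched above
      }
    }
  },
});
```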
Use Cases
https://chat.congusto.ai, which is loved by some AI researchers and thus requires all the cool new features.
Additional context
No response