-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reset a failed tool call #5666
base: main
Are you sure you want to change the base?
Reset a failed tool call #5666
Conversation
@@ -334,6 +334,10 @@ async def _handle_message_action(self, action: MessageAction) -> None: | |||
def _reset(self) -> None: | |||
"""Resets the agent controller""" | |||
|
|||
# if the pending action has a tool call, reset the tool call | |||
if self._pending_action and hasattr(self._pending_action, 'tool_call_metadata'): | |||
self._pending_action.tool_call_metadata = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea why we want to do this? Why not directly remove the pending_action?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The failed tool calls are gathered in the pending queue in the agent otherwise, and never get sent to the LLM. They are still in the stream, but the agent never sees them, even though they appear real to the user (and to us, when sent in trajectories/logs etc).
Actually, I think we should give it an error observation. It appears to work, I'm still testing that. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "failed tool calls" means? Is it a tool call that somehow never get executed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, sorry. When there's a runtime error like one raised here, it's possible it never gets an observation. Like this:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh - then how would remove the tool call metadata here help though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I think you're right to question it - it should fail some asserts in the agent if it was removed. No, this is no good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here in reset
, we still have the self._pending_action
, and we know this action isn't actually going to finish normally. We can try to create on spot an error obs, linked to that action? 🤔
I'm not sure there are many alternatives right now, I can think of others but they won't work... Does the agent really need to support multiple pending tool calls, when in fact we're processing them by response (ModelResponse)? And we override the tool call to have one per ModelResponse. We're executing it, it doesn't actually wait for another?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can try to create on spot an error obs, linked to that action? 🤔
I think this is probably a better way! This also preserve enough info so we can look into details when it happens again
@xingyaoww This is ready for review. Not tested with UI yet, but headless it works fine and looks ok. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM! Just hoping to have a test -- maybe we can ask OpenHands to help here?
|
||
# make a new ErrorObservation with the tool call metadata | ||
if not found_observation: | ||
obs = ErrorObservation(content='The action has not been executed.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
obs = ErrorObservation(content='The action has not been executed.') | |
obs = ErrorObservation(content='The action has not been executed. Please try again.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we? The agent also adds "[Error occurred in processing last action]"
text += '\n[Error occurred in processing last action]' |
The agent is in STOPPED or in ERROR at this point. So with the UI, control is back to the user. What the agent should do might be different after the user message?
# make a new ErrorObservation with the tool call metadata | ||
if not found_observation: | ||
obs = ErrorObservation(content='The action has not been executed.') | ||
obs.tool_call_metadata = self._pending_action.tool_call_metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that means this ErrorObservation
will trigger the execution of the same action or no? Will be nice if we can add a test for this..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't, because
- next lines will make
self._pending_action = None
in the controller, so it won't keep waiting for an obs, - it's an obs with
tool_id
, so in the agent too, the pending tool call is now "complete"
On the contrary, leaving it pending is problematic 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
openhands-agent
added some nice unit tests! They show the behavior here
@openhands-agent-exp Read the diff of this PR. You do not need to fix anything, just add unit tests for this. IMPORTANT: you need to be careful, and read about each of the variables used in the reset() function in agent_controller.py, to understand their purpose, how to define them and how they work! You can read some related unit tests too, like those testing agent controller. |
Overview of Changes: Status: All requested changes have been successfully completed with no remaining issues. The implementation provides thorough test coverage and addresses all key aspects of the reset() functionality. |
…nHands into enyst/message-separator
End-user friendly description of the problem this fixes or functionality that this introduces
Give a summary of what the PR does, explaining any non-trivial design decisions
This PR proposes a quick fix to:
Link of any specific issues this addresses
To run this PR locally, use the following command: