Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reset a failed tool call #5666

Open
wants to merge 16 commits into
base: main
Choose a base branch
from
Open

Reset a failed tool call #5666

wants to merge 16 commits into from

Conversation

enyst
Copy link
Collaborator

@enyst enyst commented Dec 18, 2024

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

This PR proposes a quick fix to:

  • add an ErrorObs for a tool call pending when the agent is STOPPED or in ERROR. Otherwise these actions will not be in agent's history sent to the model
  • fix log tool calls in responses

Link of any specific issues this addresses


To run this PR locally, use the following command:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:e8bd498-nikolaik   --name openhands-app-e8bd498   docker.all-hands.dev/all-hands-ai/openhands:e8bd498

@enyst enyst marked this pull request as draft December 18, 2024 14:57
@enyst enyst marked this pull request as ready for review December 18, 2024 17:48
@@ -334,6 +334,10 @@ async def _handle_message_action(self, action: MessageAction) -> None:
def _reset(self) -> None:
"""Resets the agent controller"""

# if the pending action has a tool call, reset the tool call
if self._pending_action and hasattr(self._pending_action, 'tool_call_metadata'):
self._pending_action.tool_call_metadata = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea why we want to do this? Why not directly remove the pending_action?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failed tool calls are gathered in the pending queue in the agent otherwise, and never get sent to the LLM. They are still in the stream, but the agent never sees them, even though they appear real to the user (and to us, when sent in trajectories/logs etc).

Actually, I think we should give it an error observation. It appears to work, I'm still testing that. What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "failed tool calls" means? Is it a tool call that somehow never get executed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, sorry. When there's a runtime error like one raised here, it's possible it never gets an observation. Like this:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh - then how would remove the tool call metadata here help though?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think you're right to question it - it should fail some asserts in the agent if it was removed. No, this is no good.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here in reset, we still have the self._pending_action, and we know this action isn't actually going to finish normally. We can try to create on spot an error obs, linked to that action? 🤔

I'm not sure there are many alternatives right now, I can think of others but they won't work... Does the agent really need to support multiple pending tool calls, when in fact we're processing them by response (ModelResponse)? And we override the tool call to have one per ModelResponse. We're executing it, it doesn't actually wait for another?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can try to create on spot an error obs, linked to that action? 🤔

I think this is probably a better way! This also preserve enough info so we can look into details when it happens again

openhands/agenthub/codeact_agent/codeact_agent.py Outdated Show resolved Hide resolved
@enyst enyst marked this pull request as draft December 18, 2024 19:19
@enyst enyst marked this pull request as ready for review December 18, 2024 21:12
@enyst
Copy link
Collaborator Author

enyst commented Dec 18, 2024

@xingyaoww This is ready for review. Not tested with UI yet, but headless it works fine and looks ok.

@enyst enyst requested a review from xingyaoww December 19, 2024 14:25
Copy link
Collaborator

@xingyaoww xingyaoww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM! Just hoping to have a test -- maybe we can ask OpenHands to help here?


# make a new ErrorObservation with the tool call metadata
if not found_observation:
obs = ErrorObservation(content='The action has not been executed.')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
obs = ErrorObservation(content='The action has not been executed.')
obs = ErrorObservation(content='The action has not been executed. Please try again.')

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we? The agent also adds "[Error occurred in processing last action]"

text += '\n[Error occurred in processing last action]'

The agent is in STOPPED or in ERROR at this point. So with the UI, control is back to the user. What the agent should do might be different after the user message?

# make a new ErrorObservation with the tool call metadata
if not found_observation:
obs = ErrorObservation(content='The action has not been executed.')
obs.tool_call_metadata = self._pending_action.tool_call_metadata
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that means this ErrorObservation will trigger the execution of the same action or no? Will be nice if we can add a test for this..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't, because

  1. next lines will make self._pending_action = None in the controller, so it won't keep waiting for an obs,
  2. it's an obs with tool_id, so in the agent too, the pending tool call is now "complete"

On the contrary, leaving it pending is problematic 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

openhands-agent added some nice unit tests! They show the behavior here

openhands/llm/llm.py Outdated Show resolved Hide resolved
@enyst
Copy link
Collaborator Author

enyst commented Dec 20, 2024

@openhands-agent-exp Read the diff of this PR. You do not need to fix anything, just add unit tests for this.

IMPORTANT: you need to be careful, and read about each of the variables used in the reset() function in agent_controller.py, to understand their purpose, how to define them and how they work! You can read some related unit tests too, like those testing agent controller.

@openhands-agent
Copy link
Contributor

OpenHands started fixing the pr! You can monitor the progress here.

@openhands-agent
Copy link
Contributor

Overview of Changes:
✓ Successfully implemented comprehensive unit tests for reset() function in agent_controller.py
✓ Added test coverage for all major scenarios and edge cases
✓ Properly implemented mocking and test infrastructure
✓ Verified core functionality and error handling

Status: All requested changes have been successfully completed with no remaining issues. The implementation provides thorough test coverage and addresses all key aspects of the reset() functionality.

@enyst enyst added the lint-fix label Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants