Reset a failed tool call #5666

enyst · 2024-12-18T13:30:43Z

End-user friendly description of the problem this fixes or functionality that this introduces

Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

This PR proposes a quick fix to:

add an ErrorObs for a tool call pending when the agent is STOPPED or in ERROR. Otherwise these actions will not be in agent's history sent to the model
fix log tool calls in responses

Link of any specific issues this addresses

To run this PR locally, use the following command:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:e8bd498-nikolaik   --name openhands-app-e8bd498   docker.all-hands.dev/all-hands-ai/openhands:e8bd498

xingyaoww · 2024-12-18T18:17:40Z

openhands/controller/agent_controller.py

@@ -334,6 +334,10 @@ async def _handle_message_action(self, action: MessageAction) -> None:
    def _reset(self) -> None:
        """Resets the agent controller"""

+        # if the pending action has a tool call, reset the tool call
+        if self._pending_action and hasattr(self._pending_action, 'tool_call_metadata'):
+            self._pending_action.tool_call_metadata = None


Any idea why we want to do this? Why not directly remove the pending_action?

The failed tool calls are gathered in the pending queue in the agent otherwise, and never get sent to the LLM. They are still in the stream, but the agent never sees them, even though they appear real to the user (and to us, when sent in trajectories/logs etc).

Actually, I think we should give it an error observation. It appears to work, I'm still testing that. What do you think?

What does "failed tool calls" means? Is it a tool call that somehow never get executed?

Ah, yes, sorry. When there's a runtime error like one raised here, it's possible it never gets an observation. Like this:

[Bug]: timing out causes the agent to stop #5601

[Bug]: Runtime error loses agent history #5602

ahh - then how would remove the tool call metadata here help though?

Oh, I think you're right to question it - it should fail some asserts in the agent if it was removed. No, this is no good.

Here in reset, we still have the self._pending_action, and we know this action isn't actually going to finish normally. We can try to create on spot an error obs, linked to that action? 🤔

I'm not sure there are many alternatives right now, I can think of others but they won't work... Does the agent really need to support multiple pending tool calls, when in fact we're processing them by response (ModelResponse)? And we override the tool call to have one per ModelResponse. We're executing it, it doesn't actually wait for another?

We can try to create on spot an error obs, linked to that action? 🤔

I think this is probably a better way! This also preserve enough info so we can look into details when it happens again

openhands/agenthub/codeact_agent/codeact_agent.py

…nHands into enyst/message-separator

enyst · 2024-12-18T21:51:57Z

@xingyaoww This is ready for review. Not tested with UI yet, but headless it works fine and looks ok.

xingyaoww

Mostly LGTM! Just hoping to have a test -- maybe we can ask OpenHands to help here?

xingyaoww · 2024-12-20T14:59:25Z

openhands/controller/agent_controller.py

+
+            # make a new ErrorObservation with the tool call metadata
+            if not found_observation:
+                obs = ErrorObservation(content='The action has not been executed.')


Suggested change

obs = ErrorObservation(content='The action has not been executed.')

obs = ErrorObservation(content='The action has not been executed. Please try again.')

Should we? The agent also adds "[Error occurred in processing last action]"

OpenHands/openhands/agenthub/codeact_agent/codeact_agent.py

Line 305 in 0dd919b

text += '\n[Error occurred in processing last action]'

The agent is in STOPPED or in ERROR at this point. So with the UI, control is back to the user. What the agent should do might be different after the user message?

xingyaoww · 2024-12-20T15:00:36Z

openhands/controller/agent_controller.py

+            # make a new ErrorObservation with the tool call metadata
+            if not found_observation:
+                obs = ErrorObservation(content='The action has not been executed.')
+                obs.tool_call_metadata = self._pending_action.tool_call_metadata


Does that means this ErrorObservation will trigger the execution of the same action or no? Will be nice if we can add a test for this..

It won't, because

next lines will make self._pending_action = None in the controller, so it won't keep waiting for an obs,

it's an obs with tool_id, so in the agent too, the pending tool call is now "complete"

On the contrary, leaving it pending is problematic 🤔

openhands-agent added some nice unit tests! They show the behavior here

openhands/llm/llm.py

enyst · 2024-12-20T16:42:36Z

@openhands-agent-exp Read the diff of this PR. You do not need to fix anything, just add unit tests for this.

IMPORTANT: you need to be careful, and read about each of the variables used in the reset() function in agent_controller.py, to understand their purpose, how to define them and how they work! You can read some related unit tests too, like those testing agent controller.

openhands-agent · 2024-12-20T16:42:50Z

OpenHands started fixing the pr! You can monitor the progress here.

openhands-agent · 2024-12-20T16:52:40Z

Overview of Changes:
✓ Successfully implemented comprehensive unit tests for reset() function in agent_controller.py
✓ Added test coverage for all major scenarios and edge cases
✓ Properly implemented mocking and test infrastructure
✓ Verified core functionality and error handling

Status: All requested changes have been successfully completed with no remaining issues. The implementation provides thorough test coverage and addresses all key aspects of the reset() functionality.

…essage-separator

…nHands into enyst/message-separator

openhands/controller/agent_controller.py

enyst added 2 commits December 18, 2024 14:21

attempt to reset the tool call

6aeaa7b

add newline

6f7f654

enyst marked this pull request as draft December 18, 2024 14:57

enyst marked this pull request as ready for review December 18, 2024 17:48

xingyaoww reviewed Dec 18, 2024

View reviewed changes

enyst added 2 commits December 18, 2024 20:17

fix content

c20598a

add an error obs

f48f857

enyst marked this pull request as draft December 18, 2024 19:19

enyst added 4 commits December 18, 2024 20:55

Merge branch 'main' into enyst/message-separator

71a6a29

clean up the /u/a/u/a tweak

575a868

Merge branch 'enyst/message-separator' of github.com:All-Hands-AI/Ope…

f90c958

…nHands into enyst/message-separator

make sure there are no doubles

d60d3ed

enyst marked this pull request as ready for review December 18, 2024 21:12

enyst added 2 commits December 18, 2024 22:20

set the cause

fa0c50b

logging fix

6caf2c7

enyst requested a review from xingyaoww December 19, 2024 14:25

xingyaoww reviewed Dec 20, 2024

View reviewed changes

Fix pr #5666: Reset a failed tool call

708a858

enyst added 3 commits December 20, 2024 18:03

Merge branch 'main' of github.com:All-Hands-AI/OpenHands into enyst/m…

d7a992c

…essage-separator

revert, litellm aims to openai compatibility

cde7a06

Merge branch 'enyst/message-separator' of github.com:All-Hands-AI/Ope…

1da09ab

…nHands into enyst/message-separator

enyst added the lint-fix label Dec 20, 2024

🤖 Auto-fix Python linting issues

877c6fe

enyst commented Dec 20, 2024

View reviewed changes

openhands/controller/agent_controller.py Show resolved Hide resolved

Update openhands/controller/agent_controller.py

e8bd498

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reset a failed tool call #5666

Reset a failed tool call #5666

enyst commented Dec 18, 2024 •

edited by github-actions bot

Loading

xingyaoww Dec 18, 2024

enyst Dec 18, 2024

xingyaoww Dec 18, 2024

enyst Dec 18, 2024

xingyaoww Dec 18, 2024

enyst Dec 18, 2024

enyst Dec 18, 2024

xingyaoww Dec 18, 2024

enyst commented Dec 18, 2024

xingyaoww left a comment

xingyaoww Dec 20, 2024

enyst Dec 20, 2024

xingyaoww Dec 20, 2024

enyst Dec 20, 2024

enyst Dec 20, 2024

enyst commented Dec 20, 2024

openhands-agent commented Dec 20, 2024

openhands-agent commented Dec 20, 2024

	obs = ErrorObservation(content='The action has not been executed.')
	obs = ErrorObservation(content='The action has not been executed. Please try again.')

Reset a failed tool call #5666

Are you sure you want to change the base?

Reset a failed tool call #5666

Conversation

enyst commented Dec 18, 2024 • edited by github-actions bot Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

enyst commented Dec 18, 2024

xingyaoww left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

enyst commented Dec 20, 2024

openhands-agent commented Dec 20, 2024

openhands-agent commented Dec 20, 2024

enyst commented Dec 18, 2024 •

edited by github-actions bot

Loading