How can I handle LLM errors? #1282
Comments
thanks for the detailed post. this makes sense, being able to handle errors like this. from your handler, what would you like to do with the error? would it have a similar interface?
another related item: we have a `FallbackAdapter` that is designed to handle LLM-level errors (so a different provider can be used), but for these types of content errors (perhaps identifiable by a 400), it probably should not retry with another provider.
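For context, here is a minimal sketch of the kind of fallback described above. It assumes `FallbackAdapter` accepts a list of LLM instances and that `openai.LLM.with_azure()` is available in the installed plugin version; both are assumptions to verify against the current API.

```python
# Sketch only: wrap two providers so LLM-level failures fall through to the
# next one. Constructor arguments and helper names are assumptions; check the
# versions of livekit-agents / livekit-plugins-openai you have installed.
from livekit.agents import llm
from livekit.plugins import openai

fallback_llm = llm.FallbackAdapter(
    [
        openai.LLM.with_azure(),          # primary: Azure OpenAI (assumed helper)
        openai.LLM(model="gpt-4o-mini"),  # secondary: plain OpenAI
    ]
)
```

As noted above, this only helps with provider-level failures; a 400 moderation error would most likely fail the same way on the fallback provider.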
Thank you for your quick response! I think the handler should receive the raw `APIError`.
As to what happens inside the callback itself, I think it should always generate an alternative completion. The way this completion is generated depends heavily on the specific error received, but as a rule of thumb:
Building on my previous example (the Azure OpenAI moderation error), we could take the error and the current chat context and build an alternative completion, something like:

```python
def on_llm_error(
    error: APIError,
    llm: LLM,
    chat_ctx: ChatContext,
) -> LLMStream:
    # Handle the error depending on its type and content
    if isinstance(error, APIStatusError) and error.status_code == 400:
        # To handle a moderation error like this one, remove the last message in the
        # chat and insert a new "system" (or "developer", as per OpenAI's new naming)
        # message to let the LLM know about it and have it ask the user again.
        # This will trigger a new completion with the (now likely valid) user message.
        chat_ctx.messages = chat_ctx.messages[:-1]
        chat_ctx.append(
            text="The user has said something but you could not hear it, ask again.",
            role="system",
        )
    else:
        # Handle other cases too
        pass

    # Finally, request a new completion with the modified chat context
    return llm.chat(
        chat_ctx=chat_ctx,
        fnc_ctx=agent.fnc_ctx,  # assumes the agent (or its fnc_ctx) is reachable here
    )
```

Something like this would tick all the boxes for me: it's pretty simple and it follows the existing conventions (we could define a default handler that preserves today's behavior).
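To make the intended usage concrete, here is a hypothetical wiring of the handler into the agent. The `on_llm_error` parameter does not exist today, it is the parameter this proposal introduces, and the other constructor arguments are placeholders:

```python
# Hypothetical usage under this proposal: pass the handler the same way other
# callbacks (e.g. before_llm_cb) are passed to VoicePipelineAgent.
agent = VoicePipelineAgent(
    vad=vad,
    stt=stt,
    llm=llm_instance,
    tts=tts,
    on_llm_error=on_llm_error,  # proposed callback from the snippet above
)
```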
Implementation-wise, it looks like completion errors are handled in the `_main_task` method of `LLMStream`, so the new callback could be hooked in there while keeping the existing retry loop as the default:

```python
async def _main_task(self) -> None:
    for i in range(self._conn_options.max_retry + 1):
        try:
            return await self._run()
        except APIError as api_error:
            if self._on_llm_error is not None:
                # Delegate to the user-provided handler instead of retrying
                return self._on_llm_error(api_error, self._llm, self._chat_ctx)
            else:
                if self._conn_options.max_retry == 0:
                    raise
                elif i == self._conn_options.max_retry:
                    raise APIConnectionError(
                        f"failed to generate LLM completion after {self._conn_options.max_retry + 1} attempts",
                    ) from api_error
                else:
                    logger.warning(
                        f"failed to generate LLM completion, retrying in {self._conn_options.retry_interval}s",
                        exc_info=api_error,
                        extra={
                            "llm": self._llm._label,
                            "attempt": i + 1,
                        },
                    )
                await asyncio.sleep(self._conn_options.retry_interval)
```

Note that this is just off the top of my head and it might be a little buggy, but the core idea remains (keep the previous behavior whilst adding the new callback). Please let me know your thoughts!
Question
Is it currently possible to provide a custom error handler for LLM requests?
Context
I'm building a voice agent (`VoicePipelineAgent`) using the latest version of both `livekit-agents` and `livekit-plugins-azure`, and Azure as the only provider for all stages of the pipeline (LLM, TTS and STT). For reference:
Problem
We rely on the Azure OpenAI Service, so all inputs and outputs (to and from the LLM) are heavily moderated, and the chance of some random sentence being flagged as inappropriate is always non-zero, even with moderation set to a minimum. We suspect the root cause is the poor performance of OpenAI's moderation classifiers in non-English languages, which flag some messages as "sexual" or "self-harm" in a pretty standard conversation.
In practice, requesting a completion with an "inappropriate" (again, not actually inappropriate, just flagged as such) word or sentence in it results in a 400 status code, which gets handled by `livekit-agents` on this line in the `LLMStream` class. For reference, this is what it looks like in the logs:
This causes a chain of errors (because the same message is sent over and over again), which ultimately prevents the agent from producing any output and leaves the conversation in a broken state (the user expects a response that never arrives, and the agent just waits for a new human message because its turn is done).
After some research, it looks like the only error handling is done in the `_main_task` method of the `LLMStream` class, and it's just a simple retry loop with a delay between attempts.
Proposal
If this isn't already implemented, I would like to propose a new error handling mechanism.
Ideally, I would be able to provide a custom handler that gets called whenever the API returns an error. This would be provided as a parameter to the `VoicePipelineAgent`, in the same way other callbacks are provided.
The provided callback would likely receive the `LLMStream` implementation (for example, the OpenAI one when using OpenAI-compatible services), although I'm not sure this is the most intuitive interface or how it would work in practice. In any case, the proposed `on_llm_error` callback must be able to determine whether the agent makes the same completion request again, changes the message contents, or returns a default message (the same way it's currently done with tools, where a default message can be "said" using the `.say()` method).
If no `on_llm_error` callback is provided, the current `_main_task` behavior can be used instead, making this an opt-in feature.
What are your thoughts about this? I can work on this myself if it ends up being useful, once (if) we agree on the best approach.
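As an illustration of the "default message" option, a minimal hypothetical handler under this proposal could skip a new completion entirely and fall back to a canned spoken response. The `on_llm_error` hook itself is the proposed addition; `agent.say()` already exists on `VoicePipelineAgent`:

```python
# Hypothetical handler under this proposal: instead of retrying the completion,
# acknowledge the failure out loud and let the conversation continue.
async def on_llm_error(error: APIError, llm: LLM, chat_ctx: ChatContext) -> None:
    await agent.say(
        "Sorry, I didn't catch that. Could you say it again?",
        allow_interruptions=True,
    )
```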