How can I handle LLM errors? #1282

Open
rmonvfer opened this issue Dec 23, 2024 · 2 comments
Labels
question Further information is requested

Comments

@rmonvfer

Question

Is it currently possible to provide a custom error handler for LLM requests?

Context

I'm building a voice agent (VoicePipelineAgent) using the latest versions of livekit-agents and livekit-plugins-azure, with Azure as the only provider for all stages of the pipeline (LLM, TTS and STT).

For reference:

assistant = VoicePipelineAgent(
    vad=ctx.proc.userdata["vad"],
    stt=stt.STT(
        speech_key=settings.AZURE_SPEECH_KEY,
        speech_region=settings.AZURE_SPEECH_REGION,
        language="es-ES",
        languages=["es-ES"]
    ),
    llm=openai.LLM.with_azure(
        model="gpt-4o",
        azure_endpoint=settings.AZURE_OPENAI_ENDPOINT,
        azure_deployment=settings.AZURE_OPENAI_DEPLOYMENT,
        api_version=settings.AZURE_OPENAI_API_VERSION,
        api_key=settings.AZURE_OPENAI_API_KEY,
        temperature=settings.LLM_TEMPERATURE,
    ),
    tts=azure.TTS(
        speech_region=settings.AZURE_SPEECH_REGION,
        voice="es-ES-LiaNeural",
        language="es-ES"
    ),
    min_endpointing_delay=0.7,
    interrupt_speech_duration=1.2,
    max_nested_fnc_calls=3,
    preemptive_synthesis=True,
    chat_ctx=initial_ctx,
    fnc_ctx=AssistantToolContext(ctx, participant)
)

Problem

We rely on the Azure OpenAI Service, so all inputs and outputs (to and from the LLM) are heavily moderated, and the chance of some random sentence being flagged as inappropriate is always non-zero, even with moderation set to a minimum. We suspect the root cause is the poor performance of OpenAI's moderation classifiers in non-English languages, which flag some messages as "sexual" or "self-harm" in a perfectly standard conversation.

In practice, requesting a completion that contains an "inappropriate" word or sentence (again, not actually inappropriate, just flagged as such) results in a 400 status code, which gets handled by livekit-agents on this line in the LLMStream class.

For reference, this is how it looks in the logs:

Traceback (most recent call last):
  File ".venv/lib/python3.12/site-packages/livekit/agents/llm/llm.py", line 149, in _main_task
    return await self._run()
           ^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/livekit/plugins/openai/llm.py", line 767, in _run
    raise APIStatusError(
livekit.agents._exceptions.APIStatusError: Error code: 400 - {'error': {'message': "The response was filtered due to the prompt triggering Azure OpenAI's content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: https://go.microsoft.com/fwlink/?linkid=2198766", 'type': None, 'param': 'prompt', 'code': 'content_filter', 'status': 400, 'innererror': {'code': 'ResponsibleAIPolicyViolation', 'content_filter_result': {'hate': {'filtered': False, 'severity': 'safe'}, 'jailbreak': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': True, 'severity': 'medium'}, 'violence': {'filtered': False, 'severity': 'safe'}}}}}
failed to generate LLM completion, retrying in 5.0s

This causes a chain of errors (because the same message is sent over and over again), which ultimately prevents the agent from producing any output and leaves the conversation in a broken state (the user expects a response that never arrives, while the agent just waits for a new human message because its turn is done).

After some research, it looks like the only error handling happens in the _main_task method of the LLMStream class, and it's just a simple retry loop with a delay between attempts.
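
For anyone hitting the same thing, this particular failure can be detected from the exception itself. A rough sketch (assuming APIStatusError is exported from livekit.agents and exposes a status_code, which matches what I see in the current code; the body access is guarded because I'm not sure it's always populated):

from livekit.agents import APIStatusError

def is_azure_content_filter_error(error: Exception) -> bool:
    # Azure content-filter rejections surface as a 400 with error code "content_filter"
    if not isinstance(error, APIStatusError) or error.status_code != 400:
        return False
    body = getattr(error, "body", None)
    if not isinstance(body, dict):
        return False
    return body.get("error", {}).get("code") == "content_filter"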

Proposal

If this isn't already implemented, I would like to propose an implementation for a new error handling mechanism.

Ideally, I would be able to provide a custom handler that gets called whenever the API returns an error. This would be provided as a parameter to the VoicePipelineAgent in the same way other callbacks are provided.

VoicePipelineAgent(
    on_llm_error=...,
    ...
)

The provided callback would likely receive the LLMStream implementation (for example, the OpenAI one when using OpenAI-compatible services), although I'm not sure this is the most intuitive interface or how it would work in practice. In any case, the proposed on_llm_error callback must be able to determine whether the agent makes the same completion request again, changes the message contents, or returns a default message (the same way it's currently done with tools, where a default message can be "said" using the .say() method).
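
Just to make that concrete, one possible shape for the callback could be something like this (pure sketch, none of these names exist in the current API and the exact arguments are up for discussion):

from typing import Callable, Optional

from livekit.agents import APIError
from livekit.agents.llm import LLMStream

# One possible shape: the handler receives the error and the failed stream and returns
# a replacement stream (or None to fall back to the built-in retry behavior).
OnLLMErrorCallback = Callable[[APIError, LLMStream], Optional[LLMStream]]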

If no on_llm_error callback is provided, the current _main_task behavior can be used instead, making this an opt-in feature.

What are your thoughts? I can work on this myself if it ends up being useful, once (if) we agree on the best approach.

@rmonvfer rmonvfer added the question Further information is requested label Dec 23, 2024
@davidzhao
Member

thanks for the detailed post. this makes sense to be able to handle errors like this.

from your handler, what would you like to do with the error? would it have a similar interface to before_llm_cb so that we could initiate an alternative completion?

another related item: we have a FallbackAdapter that is designed to handle LLM-level errors (to be able to use a different provider). but for these types of content errors (perhaps identifiable with a 400), it probably should not retry with another provider.
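
for reference, wiring up the FallbackAdapter looks roughly like this (sketch from memory, double check the current constructor args before copying):

from livekit.agents.llm import FallbackAdapter
from livekit.plugins import openai

# try Azure OpenAI first, fall back to the public OpenAI endpoint on availability errors
# (credentials are read from the usual environment variables in this sketch)
fallback_llm = FallbackAdapter(
    [
        openai.LLM.with_azure(model="gpt-4o"),
        openai.LLM(model="gpt-4o-mini"),
    ]
)

# the adapter can then be passed anywhere a plain LLM is expected,
# e.g. VoicePipelineAgent(llm=fallback_llm, ...)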

@rmonvfer
Author

Thank you for your quick response! I think the handler should receive the raw APIError (as it's key in deciding what to do next) in addition to the LLM (instead of the VoicePipelineAgent directly) and the ChatContext (somewhat similar to before_llm_cb).

As to what happens inside the callback itself, I think it should always generate an alternative completion. How this completion is generated heavily depends on the specific error received, but as a rule of thumb:

  • Content errors require the last message (in practice, the ChatContext) to be modified in some way to avoid hitting the same "moderation" rule again.
  • Connection and timeout errors likely require a temporary switch to another provider, so a full-blown error handler might not make much sense for them (other than for consistency). If we end up deciding it does make sense, we should provide the FallbackAdapter to the callback instead of the LLM used in the agent, as this would allow generating an alternative completion straight away.

Building on my previous example (the Azure OpenAI moderation error), we could take the _default_before_llm_cb as the starting point and make some changes to get a simple example:

from livekit.agents import APIError, APIStatusError
from livekit.agents.llm import LLM, LLMStream, ChatContext


def on_llm_error(
    error: APIError,
    llm: LLM,
    chat_ctx: ChatContext
) -> LLMStream:
    # Handle the error depending on its type and content
    if isinstance(error, APIStatusError) and error.status_code == 400:
        # To handle a moderation error like this, remove the last message in the chat
        # and insert a new "system" (or "developer" as per OpenAI's new naming) message
        # to let the LLM know about this and ask the user for a new message.
        # This will trigger a new completion with the (now likely valid) chat context.
        chat_ctx.messages = chat_ctx.messages[:-1]
        chat_ctx.append(
            text="The user has said something but you could not hear it, ask again.",
            role="system"
        )
    else:
        # Handle other cases too
        pass

    # Finally, request a new completion with the modified chat context
    # (`agent` is assumed to be in scope here, e.g. captured via a closure)
    return llm.chat(
        chat_ctx=chat_ctx,
        fnc_ctx=agent.fnc_ctx,
    )

Something like this would tick all the boxes for me as it's pretty simple, follows the existing conventions (we can define a _default_on_llm_error the same way there is a _default_before_llm_cb), and it's very flexible (as shown in the example above).

Implementation-wise, it looks like completion errors are handled in the _main_task method of the base LLMStream class. Because the chat() method of the LLM class returns an LLMStream, I would add on_llm_error as a new constructor parameter to the LLM (which would pass it along to the LLMStream when creating one, see an example here), to the VoicePipelineAgent (just for consistency; behind the scenes, we would simply pass it to the constructor-provided LLM), and finally to the LLMStream itself, which would call it in the _main_task method:

async def _main_task(self) -> None:
    for i in range(self._conn_options.max_retry + 1):
        try:
            return await self._run()
        except APIError as api_error:
            if self._on_llm_error is not None:
                # Delegate to the user-provided handler instead of retrying blindly
                # (how the returned stream gets consumed is glossed over here).
                return self._on_llm_error(api_error, self._llm, self._chat_ctx)
            else:
                # No handler provided: keep the current retry behavior untouched.
                if self._conn_options.max_retry == 0:
                    raise
                elif i == self._conn_options.max_retry:
                    raise APIConnectionError(
                        f"failed to generate LLM completion after {self._conn_options.max_retry + 1} attempts",
                    ) from api_error
                else:
                    logger.warning(
                        f"failed to generate LLM completion, retrying in {self._conn_options.retry_interval}s",
                        exc_info=api_error,
                        extra={
                            "llm": self._llm._label,
                            "attempt": i + 1,
                        },
                    )

            await asyncio.sleep(self._conn_options.retry_interval)

Note that this is just off the top of my head and might be a little buggy, but the core idea remains (keep the previous behavior whilst adding the new callback).
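
For completeness, the plumbing I have in mind looks roughly like this; these are stand-in classes just to show where the new parameter travels, not the real livekit-agents signatures:

from typing import Any, Callable, Optional

OnLLMError = Callable[..., Any]  # placeholder for the callback type described above

class SketchLLMStream:
    def __init__(self, llm: "SketchLLM", *, on_llm_error: Optional[OnLLMError] = None) -> None:
        self._llm = llm
        self._on_llm_error = on_llm_error  # consulted from _main_task, as in the snippet above

class SketchLLM:
    def __init__(self, *, on_llm_error: Optional[OnLLMError] = None) -> None:
        # stored once, handed to every stream created by chat()
        self._on_llm_error = on_llm_error

    def chat(self) -> SketchLLMStream:
        return SketchLLMStream(self, on_llm_error=self._on_llm_error)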

Please let me know your thoughts!
