
DO NOT MERGE, wip audio evals #5616

Draft · wants to merge 19 commits into main
Conversation

@21ShisodeParth (Contributor) commented Dec 5, 2024

This PR aims to provide the audio_classify function. The use case is similar to llm_classify, but geared towards generating classifications on audio data.

Big things left to be taken care of:

  1. Testing: doesn't need to be extensive, since audio_classify is a wrapper around llm_classify, which is already heavily tested.
  2. What do we do if the user wants to pass in audio bytes or local paths to their audio files?
    The current thinking is an optional audio_format parameter that can be a str, Series, or list (if the same dataset mixes audio formats). Otherwise, if they only have an audio_url pointing to AWS or GCloud, the format info is typically present in the object metadata. (A rough usage sketch follows below.)
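
For context, a minimal sketch of the intended call pattern as the PR currently stands. The import paths, the model name, and the body of the fetcher are assumptions, not part of the diff:

```python
import pandas as pd

# Assumed import locations; these may not match the final module layout.
from phoenix.evals import OpenAIModel
from phoenix.evals.classify import audio_classify
from phoenix.evals.templates import Audio

df = pd.DataFrame({"audio_url": ["s3://bucket/call-1.wav", "s3://bucket/call-2.wav"]})

def fetch_audio(url: str) -> Audio:
    # Placeholder: download the object from blob storage and wrap the raw
    # bytes in the Audio dataclass introduced by this PR.
    raise NotImplementedError

results = audio_classify(
    dataframe=df,
    model=OpenAIModel(model="gpt-4o-audio-preview"),
    template="Classify the speaker's sentiment: {audio_url}",
    rails=["positive", "negative"],
    data_fetcher=fetch_audio,
)
```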

        if system_instruction:
            messages.insert(0, {"role": "system", "content": str(system_instruction)})
        return messages

    def verbose_generation_info(self) -> str:
        return f"OpenAI invocation parameters: {self.public_invocation_params}"

-    async def _async_generate(self, prompt: Union[str, MultimodalPrompt], **kwargs: Any) -> str:
+    async def _async_generate(self, prompt: Union[str, MultimodalPrompt], data_fetcher: Optional[Callable[[str], Audio]] = None, **kwargs: Any) -> str:
Contributor:

It's probably not a good idea to change the interface for _async_generate, as it's used widely across Phoenix. A kwarg shouldn't drastically change things, though, so let me think about this.
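
For what it's worth, a rough sketch of the kwarg-only approach, which would leave the existing signature untouched. This extends the method shown in the diff above and elides the rest of the body; it is not the PR's current implementation:

```python
async def _async_generate(self, prompt: Union[str, MultimodalPrompt], **kwargs: Any) -> str:
    # Signature unchanged: existing callers across Phoenix are unaffected.
    # The optional fetcher rides along in kwargs and is popped off before the
    # remaining kwargs are forwarded to the client.
    data_fetcher: Optional[Callable[[str], Audio]] = kwargs.pop("data_fetcher", None)
    messages = self._build_messages(prompt, data_fetcher=data_fetcher)
    ...
```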

@@ -279,27 +280,43 @@ def _init_rate_limiter(self) -> None:
)

    def _build_messages(
-        self, prompt: MultimodalPrompt, system_instruction: Optional[str] = None
+        self, prompt: MultimodalPrompt, data_fetcher: Optional[Callable[[str], Audio]] = None, system_instruction: Optional[str] = None
Contributor:

Also note that the type annotation on the callable does not match the docstring. I think it's best if users return a familiar type instead of an internal one.
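
One way to reconcile the annotation and the docstring would be to type the callable as returning plain bytes and do any wrapping internally. A sketch only, not what the PR currently does; the return annotation is an assumption:

```python
def _build_messages(
    self,
    prompt: MultimodalPrompt,
    data_fetcher: Optional[Callable[[str], bytes]] = None,  # bytes, not the internal Audio type
    system_instruction: Optional[str] = None,
) -> List[Dict[str, Any]]:
    ...
    # In the loop over prompt parts, the conversion into the internal audio
    # representation would happen here, so the Audio dataclass never appears
    # in the user-facing contract.
```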

            if part.content_type == PromptPartContentType.TEXT:
                messages.append({"role": "system", "content": part.content})
            elif part.content_type == PromptPartContentType.AUDIO:
                audio_object = data_fetcher(part.content)
Contributor:

I don't love the fact that we've plumbed an arbitrary callable down to this level; maybe I'm overthinking it.
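
One way to avoid the plumbing entirely, sketched loosely: resolve the audio at the audio_classify layer before anything reaches the model, so _build_messages only ever sees data. The helper name and the base64 encoding choice are assumptions:

```python
import base64
from typing import Callable

import pandas as pd

def _resolve_audio_column(
    dataframe: pd.DataFrame, column: str, data_fetcher: Callable[[str], bytes]
) -> pd.DataFrame:
    # Hypothetical pre-resolution step inside audio_classify: swap URLs for
    # base64-encoded payloads up front so the model layer never needs a callable.
    resolved = dataframe.copy()
    resolved[column] = resolved[column].map(
        lambda url: base64.b64encode(data_fetcher(url)).decode("utf-8")
    )
    return resolved
```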

            as the total runtime of each classification (in seconds).
    """
    if not isinstance(dataframe, pd.DataFrame):
        dataframe = pd.DataFrame(dataframe, columns=["audio_url"])
Contributor:

I don't think we should hard-code the column name like this. If users pass in an iterable of URLs, we should map the column names to the template variable names.
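
A sketch of that mapping, slotting into the branch shown above; the `variables` attribute on the template object is an assumption here:

```python
if not isinstance(dataframe, pd.DataFrame):
    # Name the column after the template's single variable instead of
    # hard-coding "audio_url", so prompt formatting lines up automatically.
    template_variables = list(template.variables)  # assumed attribute
    if len(template_variables) != 1:
        raise ValueError(
            "A plain iterable of audio URLs requires a template with exactly one variable."
        )
    dataframe = pd.DataFrame(dataframe, columns=[template_variables[0]])
```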

    model: BaseModel,
    template: Union[ClassificationTemplate, PromptTemplate, str],
    rails: List[str],
    data_fetcher: Optional[Callable[[str], Audio]],
Contributor:

I still don't think we should require the user to return our internal type; it's a structure they have to learn and import, which feels clunky.

"content": [
{
"type": "input_audio",
"input_audio": {"data": part.content, "format": audio_format.value},
Contributor:

Maybe we should try to infer the format from the file headers instead: best effort, falling back to something sensible if the inference doesn't work. We have the bytestring, so the headers should be there.
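
A best-effort sniffer along those lines could check the standard container signatures and fall back to a default; the function name and the choice of default are assumptions:

```python
def _infer_audio_format(audio_bytes: bytes, default: str = "wav") -> str:
    # Best-effort inference from the file header; falls back to `default`
    # if no known signature matches.
    if audio_bytes[:4] == b"RIFF" and audio_bytes[8:12] == b"WAVE":
        return "wav"
    if audio_bytes[:4] == b"fLaC":
        return "flac"
    if audio_bytes[:4] == b"OggS":
        return "ogg"
    if audio_bytes[:3] == b"ID3" or audio_bytes[:2] in (b"\xff\xfb", b"\xff\xf3", b"\xff\xf2"):
        return "mp3"
    return default
```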



@dataclass
class Audio:
Contributor:

It feels pretty hard for a user to know they have to wrap their output in this wrapper class.
