Merge pull request #3796 from Arize-ai/docs
chore: sync docs
mikeldking authored Jul 2, 2024
2 parents 521ebde + 51c5ea1 commit e5af8e4
Showing 7 changed files with 48 additions and 24 deletions.
6 changes: 1 addition & 5 deletions docs/datasets-and-experiments/how-to-datasets/README.md
@@ -1,13 +1,9 @@
# How-to: Datasets

{% hint style="info" %}
Datasets is still in pre-release
{% endhint %}

## How to create datasets

* [Create datasets from Pandas](creating-datasets.md#create-datasets-from-pandas)
* Create datasets from spans 
* [Create datasets from spans](creating-datasets.md#from-spans)
* [Create datasets using synthetic data](creating-datasets.md#synthetic-data)

## Exporting datasets
42 changes: 35 additions & 7 deletions docs/datasets-and-experiments/how-to-datasets/creating-datasets.md
@@ -1,12 +1,8 @@
# Creating Datasets

{% hint style="info" %}
Datasets is currently in pre-release
{% endhint %}

## From CSV

When manually creating a dataset (let's say collecting hypothetical questions and answers), the easiest way to start is by using a spreadsheet. Once you've collected the information, you can simply upload the CSV of your data to the Phoenix platform using the UI.
When manually creating a dataset (let's say collecting hypothetical questions and answers), the easiest way to start is by using a spreadsheet. Once you've collected the information, you can simply upload the CSV of your data to the Phoenix platform using the UI. You can also programmatically upload tabular data using Pandas, as [seen below](creating-datasets.md#from-pandas).
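
As a minimal sketch of the programmatic path (assuming a CSV with hypothetical `question` and `answer` columns; the `dataframe`, `input_keys`, and `output_keys` parameters mirror the Pandas example below):

```python
import pandas as pd
import phoenix as px

# Hypothetical CSV collected in a spreadsheet, with "question" and "answer" columns
df = pd.read_csv("questions_and_answers.csv")

dataset = px.Client().upload_dataset(
    dataset_name="qa-from-csv",   # hypothetical dataset name
    dataframe=df,
    input_keys=["question"],      # columns treated as example inputs
    output_keys=["answer"],       # columns treated as expected outputs
)
```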

## From Pandas

@@ -42,9 +38,26 @@ dataset = client.upload_dataset(
{% endtab %}
{% endtabs %}

## Syntetic Data
## From Objects

One of the quicket way of getting started is to produce synthetic queries using an LLM.
Sometimes you just want to upload datasets using plain objects, since CSVs and DataFrames can be too restrictive about the keys.

{% tabs %}
{% tab title="Python" %}
```python
import phoenix as px

ds = px.Client().upload_dataset(
    dataset_name="my-synthetic-dataset",
    inputs=[{"question": "hello"}, {"question": "good morning"}],
    outputs=[{"answer": "hi"}, {"answer": "good morning"}],
)
```
{% endtab %}
{% endtabs %}

## Synthetic Data

One of the quickest ways to get started is to produce synthetic queries using an LLM.

{% tabs %}
{% tab title="Python" %}
@@ -126,3 +139,18 @@ client.upload_dataset(
```
{% endtab %}
{% endtabs %}



## From Spans

If you have an application that is traced using instrumentation, you can quickly add any span or group of spans using the Phoenix UI.

To add a single span to a dataset, simply select the span in the trace details view. You should see an add to dataset button in the top right. From there you can select the dataset you would like to add it to and make any changes you need before saving the example.

<figure><img src="https://storage.googleapis.com/arize-assets/phoenix/assets/images/add_span_to_dataset.png" alt=""><figcaption><p>Add a specific span as a golden dataset or an example for further testing</p></figcaption></figure>

You can also use the filters on the spans table and select multiple spans to add to a specific dataset.

<figure><img src="https://storage.googleapis.com/arize-assets/phoenix/assets/images/add_llm_spans_for_ft.png" alt=""><figcaption><p>Add LLM spans for fine tuning to a dataset</p></figcaption></figure>
@@ -170,7 +170,7 @@ OpenAIInstrumentor().instrument()
Running an experiment is as easy as calling `run_experiment` with the components we defined above. The results of the experiment will show up in Phoenix:

```python
from phoenix.datasets.experiments import run_experiment
from phoenix.experiments import run_experiment

run_experiment(ds, task=task, evaluators=[no_error, has_results])
```
@@ -9,7 +9,7 @@ Datasets and Experiments are currently in pre-release
We provide LLM evaluators out of the box. These evaluators are vendor agnostic and can be instantiated with a Phoenix model wrapper:

```python
from phoenix.datasets.evaluators import HelpfulnessEvaluator
from phoenix.experiments.evaluators import HelpfulnessEvaluator
from phoenix.evals.models import OpenAIModel

helpfulness_evaluator = HelpfulnessEvaluator(model=OpenAIModel())
@@ -21,7 +21,7 @@

Code evaluators are functions that evaluate the output of your LLM task without using another LLM as a judge. An example might be checking whether a given output contains a link, which can be implemented as a regex match.
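
As a minimal sketch of that link check (assuming, as in the quickstart examples, that a plain function taking the task `output` can be passed to `evaluators`):

```python
import re


def contains_link(output: str) -> bool:
    # Code evaluator: True if the task output contains an http(s) URL
    return bool(re.search(r"https?://\S+", output))
```

Such a function could then be passed alongside other evaluators, e.g. `run_experiment(dataset, task=task, evaluators=[contains_link])`.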

`phoenix.datasets.evaluators` contains some pre-built code evaluators that can be passed to the `evaluators` parameter in experiments.
`phoenix.experiments.evaluators` contains some pre-built code evaluators that can be passed to the `evaluators` parameter in experiments.

{% tabs %}
{% tab title="Python" %}
@@ -86,7 +86,7 @@ For even more customization, use the `create_evaluator` decorator to further cus
{% tabs %}
{% tab title="Python" %}
```python
from phoenix.datasets.evaluators.utils import create_evaluator
from phoenix.experiments.evaluators import create_evaluator

# the decorator can be used to set display properties
# `name` corresponds to the metric name shown in the UI
10 changes: 5 additions & 5 deletions docs/datasets-and-experiments/quickstart-datasets.md
@@ -66,15 +66,15 @@ def task(example: Example) -> str:
Use pre-built evaluators to grade task output with code...

```python
from phoenix.datasets.evaluators import ContainsAnyKeyword
from phoenix.experiments.evaluators import ContainsAnyKeyword

contains_keyword = ContainsAnyKeyword(keywords=["Y Combinator", "YC"])
```

or LLMs.

```python
from phoenix.datasets.evaluators import ConcisenessEvaluator
from phoenix.experiments.evaluators import ConcisenessEvaluator
from phoenix.evals.models import OpenAIModel

model = OpenAIModel(model="gpt-4o")
@@ -99,7 +99,7 @@ def jaccard_similarity(output: str, expected: Dict[str, Any]) -> float:
or LLMs.

```python
from phoenix.datasets.evaluators import create_evaluator
from phoenix.experiments.evaluators import create_evaluator

eval_prompt_template = """
Given the QUESTION and REFERENCE_ANSWER, determine whether the ANSWER is accurate.
@@ -132,7 +132,7 @@ def accuracy(input: Dict[str, Any], output: str, expected: Dict[str, Any]) -> fl
Run an experiment and evaluate the results.

```python
from phoenix.datasets.experiments import run_experiment
from phoenix.experiments import run_experiment

experiment = run_experiment(
dataset,
@@ -145,7 +145,7 @@ experiment = run_experiment(
Run more evaluators after the fact.

```python
from phoenix.datasets.experiments import evaluate_experiment
from phoenix.experiments import evaluate_experiment

experiment = evaluate_experiment(experiment, evaluators=[contains_keyword, conciseness])
```
@@ -219,7 +219,7 @@ Run your first experiment and follow the link in the cell output to inspect the


```python
from phoenix.datasets.experiments import run_experiment
from phoenix.experiments import run_experiment

experiment_results = run_experiment(
dataset,
4 changes: 2 additions & 2 deletions docs/datasets-and-experiments/use-cases-datasets/text2sql.md
@@ -188,7 +188,7 @@ Now let's run the evaluation experiment.

```python
import phoenix as px
from phoenix.datasets.experiments import run_experiment
from phoenix.experiments import run_experiment


# Define the task to run text2sql on the input question
@@ -264,7 +264,7 @@ Amazing. It looks like we removed one of the errors, and got a result for the in

```python
from phoenix.datasets.evaluators.llm_evaluators import LLMCriteriaEvaluator
from phoenix.datasets.experiments import evaluate_experiment
from phoenix.experiments import evaluate_experiment
from phoenix.evals.models import OpenAIModel

llm_evaluator = LLMCriteriaEvaluator(