Merge pull request #3796 from Arize-ai/docs
chore: sync docs
mikeldking authored Jul 2, 2024
2 parents 521ebde + 51c5ea1 commit e5af8e4
Showing 7 changed files with 48 additions and 24 deletions.
6 changes: 1 addition & 5 deletions docs/datasets-and-experiments/how-to-datasets/README.md
@@ -1,13 +1,9 @@
# How-to: Datasets

{% hint style="info" %}
Datasets is still in pre-release
{% endhint %}

## How to create datasets

* [Create datasets from Pandas](creating-datasets.md#create-datasets-from-pandas)
* Create datasets from spans 
* [Create datasets from spans](creating-datasets.md#from-spans)
* [Create datasets using synthetic data](creating-datasets.md#synthetic-data)

## Exporting datasets
42 changes: 35 additions & 7 deletions docs/datasets-and-experiments/how-to-datasets/creating-datasets.md
@@ -1,12 +1,8 @@
# Creating Datasets

{% hint style="info" %}
Datasets is currently in pre-release
{% endhint %}

## From CSV

When manually creating a dataset (let's say collecting hypothetical questions and answers), the easiest way to start is by using a spreadsheet. Once you've collected the information, you can simply upload the CSV of your data to the Phoenix platform using the UI.
When manually creating a dataset (let's say collecting hypothetical questions and answers), the easiest way to start is by using a spreadsheet. Once you've collected the information, you can simply upload the CSV of your data to the Phoenix platform using the UI. You can also programmatically upload tabular data using Pandas, as [seen below](creating-datasets.md#from-pandas).
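
As a minimal sketch of the programmatic path (assuming a CSV with hypothetical `question` and `answer` columns; the `dataframe`, `input_keys`, and `output_keys` parameters mirror the Pandas example below):

```python
import pandas as pd
import phoenix as px

# Hypothetical CSV collected in a spreadsheet, with "question" and "answer" columns
df = pd.read_csv("questions_and_answers.csv")

dataset = px.Client().upload_dataset(
    dataset_name="qa-from-csv",   # hypothetical dataset name
    dataframe=df,
    input_keys=["question"],      # columns treated as example inputs
    output_keys=["answer"],       # columns treated as expected outputs
)
```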

## From Pandas

@@ -42,9 +38,26 @@ dataset = client.upload_dataset(
{% endtab %}
{% endtabs %}

## Syntetic Data
## From Objects

One of the quicket way of getting started is to produce synthetic queries using an LLM.
Sometimes you just want to upload datasets using plain objects, since CSVs and DataFrames can be too restrictive about the keys.

{% tabs %}
{% tab title="Python" %}
```python
import phoenix as px

ds = px.Client().upload_dataset(
    dataset_name="my-synthetic-dataset",
    inputs=[{"question": "hello"}, {"question": "good morning"}],
    outputs=[{"answer": "hi"}, {"answer": "good morning"}],
)
```
{% endtab %}
{% endtabs %}

## Synthetic Data

One of the quickest ways to get started is to produce synthetic queries using an LLM.

{% tabs %}
{% tab title="Python" %}
@@ -126,3 +139,18 @@ client.upload_dataset(
```
{% endtab %}
{% endtabs %}



## From Spans

If you have an application that is traced using instrumentation, you can quickly add any span or group of spans using the Phoenix UI.

To add a single span to a dataset, simply select the span in the trace details view. You should see an add to dataset button in the top right. From there you can select the dataset you would like to add it to and make any changes you need before saving the example.

<figure><img src="https://storage.googleapis.com/arize-assets/phoenix/assets/images/add_span_to_dataset.png" alt=""><figcaption><p>Add a specific span as a golden dataset or an example for further testing</p></figcaption></figure>

You can also use the filters on the spans table and select multiple spans to add to a specific dataset.

<figure><img src="https://storage.googleapis.com/arize-assets/phoenix/assets/images/add_llm_spans_for_ft.png" alt=""><figcaption><p>Add LLM spans for fine tuning to a dataset</p></figcaption></figure>
@@ -170,7 +170,7 @@ OpenAIInstrumentor().instrument()
Running an experiment is as easy as calling `run_experiment` with the components we defined above. The results of the experiment will show up in Phoenix:

```python
from phoenix.datasets.experiments import run_experiment
from phoenix.experiments import run_experiment

run_experiment(ds, task=task, evaluators=[no_error, has_results])
```
@@ -9,7 +9,7 @@ Datasets and Experiments are currently in pre-release
We provide LLM evaluators out of the box. These evaluators are vendor agnostic and can be instantiated with a Phoenix model wrapper:

```python
from phoenix.datasets.evaluators import HelpfulnessEvaluator
from phoenix.experiments.evaluators import HelpfulnessEvaluator
from phoenix.evals.models import OpenAIModel

helpfulness_evaluator = HelpfulnessEvaluator(model=OpenAIModel())
@@ -21,7 +21,7 @@

Code evaluators are functions that evaluate the output of your LLM task without using another LLM as a judge. An example might be checking whether a given output contains a link, which can be implemented as a regex match.
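
As a minimal sketch of that link check (assuming, as in the quickstart examples, that a plain function taking the task `output` can be passed to `evaluators`):

```python
import re


def contains_link(output: str) -> bool:
    # Code evaluator: True if the task output contains an http(s) URL
    return bool(re.search(r"https?://\S+", output))
```

Such a function could then be passed alongside other evaluators, e.g. `run_experiment(dataset, task=task, evaluators=[contains_link])`.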

`phoenix.datasets.evaluators` contains some pre-built code evaluators that can be passed to the `evaluators` parameter in experiments.
`phoenix.experiments.evaluators` contains some pre-built code evaluators that can be passed to the `evaluators` parameter in experiments.

{% tabs %}
{% tab title="Python" %}
@@ -86,7 +86,7 @@ For even more customization, use the `create_evaluator` decorator to further cus
{% tabs %}
{% tab title="Python" %}
```python
from phoenix.datasets.evaluators.utils import create_evaluator
from phoenix.experiments.evaluators import create_evaluator

# the decorator can be used to set display properties
# `name` corresponds to the metric name shown in the UI
10 changes: 5 additions & 5 deletions docs/datasets-and-experiments/quickstart-datasets.md
@@ -66,15 +66,15 @@ def task(example: Example) -> str:
Use pre-built evaluators to grade task output with code...

```python
from phoenix.datasets.evaluators import ContainsAnyKeyword
from phoenix.experiments.evaluators import ContainsAnyKeyword

contains_keyword = ContainsAnyKeyword(keywords=["Y Combinator", "YC"])
```

or LLMs.

```python
from phoenix.datasets.evaluators import ConcisenessEvaluator
from phoenix.experiments.evaluators import ConcisenessEvaluator
from phoenix.evals.models import OpenAIModel

model = OpenAIModel(model="gpt-4o")
@@ -99,7 +99,7 @@ def jaccard_similarity(output: str, expected: Dict[str, Any]) -> float:
or LLMs.

```python
from phoenix.datasets.evaluators import create_evaluator
from phoenix.experiments.evaluators import create_evaluator

eval_prompt_template = """
Given the QUESTION and REFERENCE_ANSWER, determine whether the ANSWER is accurate.
@@ -132,7 +132,7 @@ def accuracy(input: Dict[str, Any], output: str, expected: Dict[str, Any]) -> fl
Run an experiment and evaluate the results.

```python
from phoenix.datasets.experiments import run_experiment
from phoenix.experiments import run_experiment

experiment = run_experiment(
dataset,
@@ -145,7 +145,7 @@ experiment = run_experiment(
Run more evaluators after the fact.

```python
from phoenix.datasets.experiments import evaluate_experiment
from phoenix.experiments import evaluate_experiment

experiment = evaluate_experiment(experiment, evaluators=[contains_keyword, conciseness])
```
@@ -219,7 +219,7 @@ Run your first experiment and follow the link in the cell output to inspect the


```python
from phoenix.datasets.experiments import run_experiment
from phoenix.experiments import run_experiment

experiment_results = run_experiment(
dataset,
4 changes: 2 additions & 2 deletions docs/datasets-and-experiments/use-cases-datasets/text2sql.md
@@ -188,7 +188,7 @@ Now let's run the evaluation experiment.

```python
import phoenix as px
from phoenix.datasets.experiments import run_experiment
from phoenix.experiments import run_experiment


# Define the task to run text2sql on the input question
@@ -264,7 +264,7 @@ Amazing. It looks like we removed one of the errors, and got a result for the in

```python
from phoenix.datasets.evaluators.llm_evaluators import LLMCriteriaEvaluator
from phoenix.datasets.experiments import evaluate_experiment
from phoenix.experiments import evaluate_experiment
from phoenix.evals.models import OpenAIModel

llm_evaluator = LLMCriteriaEvaluator(