
Commit

docs: Updated README.md
Anush008 committed Feb 28, 2024
1 parent 41d40c4 commit 1a1f1c7
Showing 1 changed file with 4 additions and 5 deletions.
9 changes: 4 additions & 5 deletions README.md
@@ -43,7 +43,7 @@ from pyspark.sql import SparkSession

 spark = SparkSession.builder.config(
         "spark.jars",
-        "spark-2.0-jar-with-dependencies.jar", # specify the downloaded JAR file
+        "spark-2.0.jar", # specify the downloaded JAR file
     )
     .master("local[*]")
     .appName("qdrant")
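The builder chain in this hunk is shown only in part. For reference, a minimal runnable sketch of the same setup, assuming the 2.0 connector JAR has been downloaded into the working directory (the path is an assumption, not taken from the diff):

```python
from pyspark.sql import SparkSession

# Minimal sketch of the session setup above. The JAR path is an assumption;
# point it at wherever the downloaded connector JAR actually lives.
spark = (
    SparkSession.builder
    .config("spark.jars", "spark-2.0.jar")
    .master("local[*]")
    .appName("qdrant")
    .getOrCreate()
)
```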
@@ -58,7 +58,7 @@ To load data into Qdrant, a collection has to be created beforehand with the app
 <pyspark.sql.DataFrame>
     .write
     .format("io.qdrant.spark.Qdrant")
-    .option("qdrant_url", <QDRANT_URL>)
+    .option("qdrant_url", <QDRANT_GRPC_URL>)
     .option("collection_name", <QDRANT_COLLECTION_NAME>)
     .option("embedding_field", <EMBEDDING_FIELD_NAME>) # Expected to be a field of type ArrayType(FloatType)
     .option("schema", <pyspark.sql.DataFrame>.schema.json())
@@ -81,17 +81,16 @@ You can use the `qdrant-spark` connector as a library in Databricks to ingest da

 ## Datatype support 📋

-Qdrant supports all the Spark data types, and the appropriate types are mapped based on the provided `schema`.
+Qdrant supports all the Spark data types. The appropriate types are mapped based on the provided `schema`.

 ## Options and Spark types 🛠️

 | Option            | Description                                                                | DataType               | Required |
 | :---------------- | :------------------------------------------------------------------------ | :--------------------- | :------- |
-| `qdrant_url`      | REST URL of the Qdrant instance                                            | `StringType`           | ✅       |
+| `qdrant_url`      | GRPC URL of the Qdrant instance. Eg: <http://localhost:6334>               | `StringType`           | ✅       |
 | `collection_name` | Name of the collection to write data into                                  | `StringType`           | ✅       |
 | `embedding_field` | Name of the field holding the embeddings                                   | `ArrayType(FloatType)` | ✅       |
 | `schema`          | JSON string of the dataframe schema                                        | `StringType`           | ✅       |
 | `mode`            | Write mode of the dataframe. Supports "append".                            | `StringType`           | ✅       |
 | `id_field`        | Name of the field holding the point IDs. Default: Generates a random UUID  | `StringType`           | ❌       |
 | `batch_size`      | Max size of the upload batch. Default: 100                                 | `IntType`              | ❌       |
 | `retries`         | Number of upload retries. Default: 3                                       | `IntType`              | ❌       |
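Tying the datatype-support note and the options table above together, a hedged sketch of a write that uses nested Spark types and the optional settings. Every name, the URL, and the ID value below are illustrative assumptions, not values from the README:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import (
    ArrayType, FloatType, IntegerType, StringType, StructField, StructType,
)

spark = SparkSession.builder.getOrCreate()

# Mixed Spark types: plain strings, a nested struct, and a FloatType array for the vector.
schema = StructType([
    StructField("doc_id", StringType()),                 # reused as the point ID below
    StructField("title", StringType()),                  # plain payload field
    StructField("meta", StructType([                     # nested payload field
        StructField("source", StringType()),
        StructField("rank", IntegerType()),
    ])),
    StructField("embedding", ArrayType(FloatType())),    # vector field
])

df = spark.createDataFrame(
    [("1f0c9d5e-8b1a-4c6e-9f3a-2d7b8e4a5c10", "hello", ("docs", 1), [0.12, 0.34, 0.56, 0.78])],
    schema,
)

(
    df.write.format("io.qdrant.spark.Qdrant")
    .option("qdrant_url", "http://localhost:6334")   # gRPC URL, per the table above
    .option("collection_name", "my_collection")      # assumed to exist already
    .option("embedding_field", "embedding")
    .option("schema", df.schema.json())
    .option("id_field", "doc_id")                    # optional: point IDs from this column (UUIDs or unsigned ints)
    .option("batch_size", 64)                        # optional: smaller upload batches
    .option("retries", 5)                            # optional: extra upload retries
    .mode("append")
    .save()
)
```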