This repository contains all the material for the OpenSearch Neural Search Tutorial: everything you need to deploy a simple OpenSearch system that runs neural queries.
To use the existing material directly, without generating documents and models yourself, you only need:
- OpenSearch 2.11.0
To create the documents yourself, you also need:
- Python 3.10
The repository contains:
- documents: contains the convert_msmarco_data_to_opensearch_format.py Python script, which generates OpenSearch documents from MS MARCO data.
- msmarco_documents: contains the MS MARCO data
- opensearch_documents: contains the OpenSearch documents
- docker-compose.yml: the Docker Compose file used to run OpenSearch 2.11.0.
- OpenSearch.postman_collection.json: the Postman collection containing all the useful requests.
Set up your Docker host environment:
- macOS & Windows: In Docker Preferences > Resources, set RAM to at least 4 GB.
- Linux: Ensure vm.max_map_count is set to at least 262144 as per the documentation.
Verify that you meet all the installation requirements: https://opensearch.org/docs/latest/install-and-configure/install-opensearch/index/
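On Linux, the vm.max_map_count requirement above can be checked before starting the container. The following is a minimal sketch, assuming the setting is exposed at /proc/sys/vm/max_map_count (the usual procfs location); the function names are illustrative and not part of this repository.

```python
from pathlib import Path

# Minimum value stated in the setup notes above.
REQUIRED_MAX_MAP_COUNT = 262144


def parse_max_map_count(raw: str) -> int:
    """Parse the text content of the procfs entry into an integer."""
    return int(raw.strip())


def is_sufficient(value: int, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """Return True when the kernel setting meets the stated minimum."""
    return value >= required


def check_host(proc_path: str = "/proc/sys/vm/max_map_count") -> bool:
    """Read the current setting (Linux only) and compare it to the minimum."""
    return is_sufficient(parse_max_map_count(Path(proc_path).read_text()))
```

If check_host() returns False, the value can typically be raised with `sudo sysctl -w vm.max_map_count=262144`.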
You can skip this step if you want to use the provided material.
To generate the documents, run the following from the documents folder:
python convert_msmarco_data_to_opensearch_format.py
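To give an idea of what such a conversion involves, here is a rough sketch. It is not the repository's script: it assumes the input is tab-separated lines of the form id<TAB>text, that the output is the OpenSearch bulk (NDJSON) format, and that the text goes into a field called general_text (the field used by the queries in this tutorial). The index name neural_index is a made-up placeholder.

```python
import json
from typing import Iterable, Iterator


def to_bulk_lines(rows: Iterable[str], index_name: str = "neural_index") -> Iterator[str]:
    """Turn 'id<TAB>text' rows into bulk-API line pairs (action line, document line).

    Assumptions (not taken from the repository): the TSV layout, the
    hypothetical index name, and the 'general_text' target field.
    """
    for row in rows:
        doc_id, _, text = row.rstrip("\n").partition("\t")
        if not text:
            # Skip rows without a tab separator or with empty text.
            continue
        yield json.dumps({"index": {"_index": index_name, "_id": doc_id}})
        yield json.dumps({"general_text": text})
```

Each pair of output lines can be written to a file, one JSON object per line, ending with a newline as the bulk API expects.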
From the folder containing the docker-compose.yml file, start OpenSearch with:
docker-compose up
OpenSearch will be available at https://localhost:9200/
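Once the container is up, a quick request to the root endpoint confirms that the node answers. The sketch below uses only the Python standard library. It assumes the demo security configuration, that is, a self-signed certificate and the admin/admin credentials; both are assumptions about docker-compose.yml, so adjust them to your setup. Certificate verification is disabled here for local testing only.

```python
import base64
import json
import ssl
import urllib.request


def build_request(base_url: str = "https://localhost:9200/",
                  user: str = "admin", password: str = "admin") -> urllib.request.Request:
    """Build a GET request with HTTP Basic auth (credentials are assumed defaults)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(base_url, headers={"Authorization": f"Basic {token}"})


def insecure_context() -> ssl.SSLContext:
    """TLS context that skips certificate checks (local demo use only)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def cluster_info(base_url: str = "https://localhost:9200/") -> dict:
    """Return the JSON document served at the root endpoint."""
    with urllib.request.urlopen(build_request(base_url),
                                context=insecure_context(), timeout=10) as resp:
        return json.load(resp)
```

Calling cluster_info() against a running node should return a JSON object describing the cluster, including its version.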
{
  "_source": [
    "general_text"
  ],
  "query": {
    "neural": {
      "general_text_vector": {
        "query_text": "what is a bank transit number",
        "model_id": "loaded_neural_model_id",
        "k": 3
      }
    }
  }
}
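The neural query body above can also be built programmatically, which helps when the query text or the model id changes. This is a minimal sketch: build_neural_query is a hypothetical helper, and loaded_neural_model_id remains a placeholder for the id of the model deployed in your cluster.

```python
import json


def build_neural_query(query_text: str, model_id: str,
                       vector_field: str = "general_text_vector",
                       k: int = 3,
                       source_fields: tuple = ("general_text",)) -> dict:
    """Return a search body with a single neural clause on the vector field."""
    return {
        "_source": list(source_fields),
        "query": {
            "neural": {
                vector_field: {
                    "query_text": query_text,  # text embedded by the model at query time
                    "model_id": model_id,      # placeholder: use your deployed model id
                    "k": k,                    # number of nearest neighbours to retrieve
                }
            }
        },
    }


body = build_neural_query("what is a bank transit number", "loaded_neural_model_id")
print(json.dumps(body, indent=2))
```

The resulting dictionary serializes to the same JSON as the request shown above and can be sent to the _search endpoint of the index that holds the documents.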
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "color": "white"
        }
      },
      "must": {
        "neural": {
          "general_text_vector": {
            "query_text": "what is a bank transit number",
            "model_id": "loaded_neural_model_id",
            "k": 3
          }
        }
      }
    }
  }
}
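The filtered variant above wraps the neural clause in a bool query, with a term filter next to it. A small sketch of a builder for that shape follows; build_filtered_neural_query is a hypothetical helper, and the color field is taken from the example request, so it only works if your documents actually carry such a field.

```python
def build_filtered_neural_query(query_text: str, model_id: str,
                                filter_field: str, filter_value: str,
                                vector_field: str = "general_text_vector",
                                k: int = 3) -> dict:
    """Return a bool query: a term filter plus a neural clause under 'must'."""
    neural_clause = {
        "neural": {
            vector_field: {
                "query_text": query_text,
                "model_id": model_id,  # placeholder: use your deployed model id
                "k": k,
            }
        }
    }
    return {
        "query": {
            "bool": {
                # The term filter restricts matches without contributing to the score.
                "filter": {"term": {filter_field: filter_value}},
                "must": neural_clause,
            }
        }
    }


body = build_filtered_neural_query(
    "what is a bank transit number", "loaded_neural_model_id", "color", "white"
)
```

Keeping the filter as separate arguments makes it easy to try other fields or values without rewriting the neural part.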