
Releases: deepset-ai/haystack

v1.14.0rc1

20 Feb 19:02
Pre-release

⭐ Highlights

PromptNode enhancements

PromptNode now supports prompt logging when running pipelines in debug mode, batch execution via run_batch, and model-specific parameters via model_kwargs. More updates to PromptNode and PromptTemplate are coming soon!

Shaper

We're introducing Shaper, a helper node for PromptNode. Shaper renames and reshapes the values flowing through a pipeline so that PromptNode integrates seamlessly with the rest of Haystack. Its scope isn't limited to PromptNode, though: you can also use it on its own, which opens up a whole new world of possibilities.
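Conceptually, Shaper applies a named function to selected pipeline values and writes the results back under new names. Here is a minimal pure-Python sketch of that idea (illustrative only, not the actual Haystack API; all names below are made up):

```python
def shape(context: dict, func, inputs: dict, outputs: list) -> dict:
    """Apply `func` to values picked from `context` (renamed per `inputs`)
    and store the results under the names listed in `outputs`."""
    kwargs = {arg: context[key] for arg, key in inputs.items()}
    results = func(**kwargs)
    if len(outputs) == 1:
        results = (results,)
    return {**context, **dict(zip(outputs, results))}

# Example: turn a single query into a list, as a prompt template
# expecting a "questions" variable might require.
ctx = {"query": "What is the capital of Germany?"}
new_ctx = shape(ctx, func=lambda value: [value],
                inputs={"value": "query"}, outputs=["questions"])
print(new_ctx["questions"])  # → ['What is the capital of Germany?']
```

The real node works on the pipeline's invocation context and ships with a set of registered functions; the sketch only conveys the rename-and-transform pattern.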

IVF and Product Quantization support for OpenSearchDocumentStore

We've added support for IVF and IVF with Product Quantization indices to OpenSearchDocumentStore. You can train the IVF index either by calling the train_index method (as in FAISSDocumentStore) or by setting ivf_train_size when initializing OpenSearchDocumentStore, and take your search to the next level.
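To illustrate the idea behind IVF: vectors are assigned to the nearest of a small set of trained centroids, and a query scans only the lists of the closest centroids instead of the whole collection. A toy pure-Python sketch of that mechanism (illustrative only, not the OpenSearch implementation):

```python
import math

def build_ivf(vectors, centroids):
    """Assign each vector to its nearest centroid (the 'inverted lists')."""
    lists = {i: [] for i in range(len(centroids))}
    for vid, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))
        lists[nearest].append(vid)
    return lists

def search_ivf(query, vectors, centroids, lists, nprobe=1):
    """Scan only the `nprobe` closest centroid lists, then rank the candidates."""
    probed = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))[:nprobe]
    candidates = [vid for i in probed for vid in lists[i]]
    return min(candidates, key=lambda vid: math.dist(query, vectors[vid]))

vectors = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centroids = [(0.0, 0.0), (5.0, 5.0)]  # in practice, learned during index training
lists = build_ivf(vectors, centroids)
print(search_ivf((5.1, 5.0), vectors, centroids, lists))  # → 2 (the nearest stored vector)
```

This is why the index needs training first: the centroids must reflect the data distribution before vectors can be bucketed. Product Quantization additionally compresses the vectors inside each list.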

What's Changed

Breaking Changes

  • refactor: Updated rest_api schema for tables to be consistent with Document.to_dict by @sjrl in #3872
  • feat: Support multiple document_ids in Answer object (for generative QA) by @tstadel in #4062
  • feat: Update OpenAIAnswerGenerator defaults and with learnings from PromptNode by @sjrl in #4038
  • build: cache nltk models into the docker image by @mayankjobanputra in #4118
  • feat: Add IVF and Product Quantization support for OpenSearchDocumentStore by @bogdankostic in #3850

DocumentStores

  • refactor: use weaviate client to build BM25 query by @hsm207 in #3939
  • fix: fixed InMemoryDocumentStore.get_embedding_count to return correct number by @sjrl in #3980
  • fix: Add inner query for mysql compatibility by @julian-risch in #4068
  • feat: add support for custom headers by @hsm207 in #4040
  • feat: Add BM25 support for tables in InMemoryDocumentStore by @bogdankostic in #4090
  • refactor: InMemoryDocumentStore - manage documents without embedding & fix mypy errors by @anakin87 in #4113
  • refactor: complete the document stores test refactoring by @masci in #4125
  • feat: include testing facilities into haystack package by @masci in #4182

Documentation

  • Align with the docs install guide + correct lg by @agnieszka-m in #3950
  • docs: Update Crawler docstring for correct usage in Google colab by @silvanocerza in #3979
  • Docs: Update docstrings by @agnieszka-m in #4119
  • docs: Update Annotation Tool README.md by @bogdankostic in #4123
  • feat: Add model_kwargs option to PromptNode by @sjrl in #4151
  • fix: Remove logging statement of setting ID manually in Document by @bogdankostic in #4129
  • chore: Fixing PromptNode .prompt() docstring to include the PromptTemplate object as an option by @TuanaCelik in #4135
  • chore: de-couple the telemetry events for each tutorial from the dataset on AWS that is used by @TuanaCelik in #4155
  • Prompt node/run batch by @sjrl in #4072


v1.13.2

09 Feb 19:06

What's Changed


Full Changelog: v1.13.1...v1.13.2

v1.13.1

02 Feb 20:03

What's Changed

  • fix: document retrieval metrics for non-document_id document_relevance_criteria (#3885)
  • Update pyproject.toml (#4035)
  • feat: add Shaper (#3880)
  • fix: extend schema for prompt node results (#3891)
  • fix: removing code block in MarkdownConverter (#3960)
  • feat: add frontmatter to meta in MarkdownConverter (#3953)

Full Changelog: v1.13.0...v1.13.1

v1.13.0

27 Jan 13:43

⭐ Highlights

Stop words for PromptNode

Implements stop words at the PromptNode level (for all models). You can pass stop_words as a list parameter to PromptNode to stop LLM text generation as soon as any of the stop words is encountered. Stop words are not included in the response.
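The observable effect is easy to picture: the generated text ends before the first occurrence of any stop word. A simplified pure-Python sketch of that truncation behavior (the real node also integrates with each model's generation API to stop early where supported):

```python
def truncate_at_stop_words(text: str, stop_words: list) -> str:
    """Cut `text` at the earliest occurrence of any stop word."""
    cut = len(text)
    for word in stop_words:
        idx = text.find(word)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()

print(truncate_at_stop_words("Berlin is the capital. Observation: done", ["Observation:"]))
# → 'Berlin is the capital.'
```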

A dedicated GitHub repository for Haystack demos

The source code for Haystack's Explore the World demo has moved to a dedicated repository: https://github.com/deepset-ai/haystack-demos. Use this repository to check out the code, run it locally, fork it, customize it, and contribute!

New nodes: ImageToText and CsvTextConverter

This release sees two new nodes, both contributed by community members!

The first one is ImageToText (courtesy of our well-known @anakin87): an image captioning node that generates descriptions of image files and creates Haystack documents from them.

The second one is CsvTextConverter, from @Benvii: a small utility node that can load a CSV of FAQ question-answer pairs and correctly send them to your DocumentStore, making it super handy for FAQ matching pipelines.
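To sketch what an FAQ CSV conversion involves: each row becomes a document whose content is the question and whose metadata carries the answer, which is the shape FAQ-matching pipelines expect. A minimal pure-Python illustration (the real node's expected column names and Document schema may differ):

```python
import csv
import io

def faq_csv_to_docs(csv_text: str) -> list:
    """Turn rows of question,answer into document dicts for FAQ matching."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"content": row["question"], "meta": {"answer": row["answer"]}}
        for row in reader
    ]

data = "question,answer\nWhat is Haystack?,An open-source NLP framework.\n"
docs = faq_csv_to_docs(data)
print(docs[0]["content"])  # → 'What is Haystack?'
```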

Check out the docs to learn more about them and try them out!

Faster tokenization for GPT models with tiktoken

Haystack now supports faster tokenization with OpenAI's tiktoken library, which can dramatically speed up tokenization for GPT models. On unsupported platforms (Python < 3.8, arm64, and macOS), fallbacks are in place and the regular Hugging Face tokenizers are used. Thanks to @danielbichuetti for yet another amazing contribution!
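The fallback pattern is straightforward: prefer tiktoken when it can be imported, otherwise fall back to the Hugging Face tokenizer. A hedged sketch of that selection logic (the actual module structure and tokenizer wiring inside Haystack may differ):

```python
def get_tokenizer_backend() -> str:
    """Pick the fastest available tokenizer backend for GPT models."""
    try:
        import tiktoken  # fast BPE tokenizer; unavailable on some platforms
        return "tiktoken"
    except ImportError:
        # Fall back to the regular Hugging Face tokenizers
        return "huggingface"

print(get_tokenizer_backend())
```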

What's Changed

Breaking Changes

  • Migrating to use native Pytorch AMP by @sjrl in #2827
  • bug: consistent batch_size parameter names in distillation by @julian-risch in #3811
  • refactor: Move invocation_context from meta to own pipeline variable by @vblagoje in #3888

Pipeline

  • feat: Update cohere embedding models by @vblagoje in #3704
  • feat: add index parameter to TfidfRetriever by @anakin87 in #3666
  • feat: Use torch.inference_mode() for TableQA by @sjrl in #3731
  • feat: Enable text-embedding-ada-002 for EmbeddingRetriever by @vblagoje in #3721
  • refactor: improve monkey patch for SklearnQueryClassifier by @anakin87 in #3732
  • refactor: remove unused code in TfidfRetriever by @anakin87 in #3733
  • refactor: Remove duplicate code in TableReader by @sjrl in #3708
  • fix: Make InferenceProcessor thread safe by @bogdankostic in #3709
  • chore: adding template for prompt node by @TuanaCelik in #3738
  • fix: Fixed local reader model loading by @mayankjobanputra in #3663
  • fix: Fix predict_batch in TransformersReader for single nested Document list by @bogdankostic in #3748
  • feat: change PipelineConfigError to DocumentStoreError with more details by @julian-risch in #3783
  • bug: skip empty documents in reader by @julian-risch in #3773
  • fix: linefeeds in custom_query by @tstadel in #3813
  • fix: Convert table cells to strings for compatibility with TableReader by @sjrl in #3762
  • fix: Ensure eval mode for TableReader model for predictions by @sjrl in #3743
  • fix: gracefully handle FileExistsError during Preprocessor resource download by @wochinge in #3816
  • fix: make the crawler runnable and testable on Windows by @anakin87 in #3830
  • fix: ignore non-serializable params when hashing pipeline objects by @masci in #3842
  • feat: preprocessor raises warning when doc length exceeds threshold by @ZanSara in #3837
  • fix: remove string validation in YAML by @ZanSara in #3854
  • feat: Use truncate option for Cohere.embed by @sjrl in #3865
  • feat: ImageToText (caption generator) by @anakin87 in #3859
  • fix: Remove double super class init from ParsrConverter init by @silvanocerza in #3896
  • feat: store id_hash_keys in Document objects to make documents clonable by @ZanSara in #3697
  • feat: adding the ability to use Ray Serve async functionality by @zoltan-fedor in #3769
  • feat: support cl100k_base tokenization and increase performance for GPT2 by @danielbichuetti in #3897
  • fix: Fix number of concurrent requests in RequestLimiter by @bogdankostic in #3705
  • feat: Run commands inside docker container as a non root user by @vblagoje in #3702
  • fix: Removed overlooked torch scatter references by @sjrl in #3719
  • build: upgrade torch and let transformers pick the version by @julian-risch in #3727
  • feat: Expand LLM support with PromptModel, PromptNode, and PromptTemplate by @vblagoje in #3667
  • refactor: remove deprecated parameters from Summarizer by @anakin87 in #3740
  • refactor: Using with open() to read files by @sjrl in #3787
  • feat: Bump python to 3.10 for gpu docker image, use nvidia/cuda by @vblagoje in #3701
  • fix: pin protobuf version by @masci in #3789
  • fix(docker): Use IMAGE_NAME in api image by @FabianHertwig in #3786
  • bug: Fix launch_milvus() by cd'ing to milvus_dir by @t0r0id in #3795
  • refactor: Change PromptNode registered templates from per class to per instance by @vblagoje in #3810
  • bug: The PromptNode handles all parameters as lists without checking if they are in fact lists by @zoltan-fedor in #3820
  • feat: update the docker image for haystack-api service by @bilgeyucel in #3835
  • refactor: Simplify PromptTemplate substitution in PromptNode by @vblagoje in #3876
  • feat: PromptNode - implement stop words by @vblagoje in #3884
  • feat: Add retry with exponential back-off to PromptNode's OpenAI models by @vblagoje in #3886
  • chore: Add timeouts to external requests calls by @silvanocerza in #3895
  • feat: Add CsvTextConverter by @Benvii in #3587
  • refactor: Improve stop_words handling, add unit test cases by @vblagoje in #3918
  • refactor: Updated rest_api schema for tables to be consistent with Document.to_dict by @sjrl in #3872

UI / Demo

  • refactor: remove haystack demo along with deprecated Dockerfiles by @masci in #3829

...


v1.12.2

22 Dec 13:15

What's Changed

  • Fixing the query_batch method of the deepsetcloud document store by @zoltan-fedor in #3724
  • build: upgrade torch and let transformers pick the version by @julian-risch in #3727
  • fix: Removed overlooked torch scatter references by @sjrl in #3719

Full Changelog: v1.12.1...v1.12.2

v1.12.2rc1

22 Dec 11:24
Pre-release

What's Changed

  • Fixing the query_batch method of the deepsetcloud document store by @zoltan-fedor in #3724
  • build: upgrade torch and let transformers pick the version by @julian-risch in #3727
  • fix: Removed overlooked torch scatter references by @sjrl in #3719

Full Changelog: v1.12.1...v1.12.2rc1

v1.12.1

21 Dec 20:12

⭐ Highlights

Large Language Models with PromptNode

Introducing PromptNode, a new feature that brings the power of large language models (LLMs) to various NLP tasks. PromptNode is an easy-to-use, customizable node you can run on its own or in a pipeline. We've designed the API to be user-friendly and suitable for everyday experimentation, but also fully compatible with production-grade Haystack deployments.

By setting a prompt template for a PromptNode you define what task you want it to do. This way, you can have multiple PromptNodes in your pipeline, each performing a different task. But that's not all. You can also inject the output of one PromptNode into the input of another one.

Out of the box, we support both Google's Flan-T5 and OpenAI's GPT-3 models, and you can even mix and match them in your pipelines.

from haystack.nodes.prompt import PromptNode

# Initialize the node:
prompt_node = PromptNode("google/flan-t5-base")  # try also 'text-davinci-003' if you have an OpenAI key

prompt_node("What is the capital of Germany?")

This node can do a lot more than simply querying LLMs: it can manage prompt templates, run batches, share models among instances, be chained together in pipelines, and more. Check out its documentation for details!
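To give a flavor of the prompt template idea: a template defines the task, and the node fills it with your inputs at run time. A minimal pure-Python sketch of that substitution step (illustrative only, not the actual PromptTemplate API):

```python
def fill_prompt(template: str, **inputs) -> str:
    """Substitute named inputs into a task-defining prompt template."""
    return template.format(**inputs)

qa_template = "Answer the question.\nQuestion: {query}\nAnswer:"
print(fill_prompt(qa_template, query="What is the capital of Germany?"))
```

Because each template is just a task description with named slots, several PromptNodes with different templates can coexist in one pipeline, each performing its own task.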

Support for BM25Retriever in InMemoryDocumentStore

InMemoryDocumentStore has always been the go-to document store for small prototypes. With the addition of BM25 support, it officially becomes one of the document stores that support all Retrievers available in Haystack, just like FAISS and the Elasticsearch-like stores, but without the external dependencies. Don't use it for million-document production deployments, though: it's not the fastest document store out there.
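As a reminder of what BM25 ranking does under the hood, here is a toy pure-Python sketch of the core formula with the common defaults k1=1.5 and b=0.75 (illustrative only, not Haystack's implementation, which uses a proper BM25 library and tokenization):

```python
import math

def bm25_scores(query: str, docs: list, k1: float = 1.5, b: float = 0.75) -> list:
    """Score each document against the query with the classic BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(d) for d in tokenized) / len(tokenized)
    n = len(tokenized)
    scores = []
    for doc in tokenized:
        score = 0.0
        for term in query.lower().split():
            df = sum(term in d for d in tokenized)           # document frequency
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)  # smoothed IDF
            tf = doc.count(term)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

docs = ["Berlin is the capital of Germany", "Paris is the capital of France"]
print(bm25_scores("capital of Germany", docs))  # first document scores highest
```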

🏆 Honorable mention to @anakin87 for this outstanding contribution, among many many others! 🏆

Haystack is always open to external contributions, and every little bit is appreciated. Don't know where to start? Have a look at the Contributors Guidelines.

Extended support for Cohere and OpenAI embeddings

We enabled EmbeddingRetriever to use the latest Cohere multilingual embedding models and OpenAI embedding models.

Simply use the model's full name (along with your API key) in EmbeddingRetriever to get them:

from haystack.nodes import EmbeddingRetriever

# Cohere
retriever = EmbeddingRetriever(embedding_model="multilingual-22-12", batch_size=16, api_key=api_key)
# OpenAI
retriever = EmbeddingRetriever(embedding_model="text-embedding-ada-002", batch_size=32, api_key=api_key, max_seq_len=8191)

Speeding up dense searches in batch mode (Elasticsearch and OpenSearch)

Whenever you need to execute multiple dense searches at once, ElasticsearchDocumentStore and OpenSearchDocumentStore can now do it in parallel. This not only speeds up run_batch and eval_batch for dense pipelines when used with those document stores but also significantly speeds up multi-embedding retrieval pipelines like, for example, MostSimilarDocumentsPipeline.

In our measurements, this yielded a speed-up of up to 49% on a realistic dataset.

Under the hood, our newly introduced query_by_embedding_batch document store function uses msearch to unleash the full power of your Elasticsearch/OpenSearch cluster.
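For context, Elasticsearch's _msearch endpoint accepts many searches in a single request as alternating header and body lines (newline-delimited JSON). A simplified sketch of packing a batch of queries into such a payload (illustrative; not Haystack's exact request bodies):

```python
import json

def build_msearch_payload(index: str, bodies: list) -> str:
    """Pack multiple search bodies into one newline-delimited _msearch payload."""
    lines = []
    for body in bodies:
        lines.append(json.dumps({"index": index}))  # per-search header line
        lines.append(json.dumps(body))              # the search body itself
    return "\n".join(lines) + "\n"

payload = build_msearch_payload(
    "documents",
    [{"query": {"match": {"content": "berlin"}}, "size": 3},
     {"query": {"match": {"content": "paris"}}, "size": 3}],
)
print(payload.count("\n"))  # → 4 (two header lines + two body lines)
```

Sending one such request instead of N separate ones is where the batch-mode speed-up comes from.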

⚠️ Deprecated Docker images discontinued

1.12 is the last release shipping the old Docker images deepset/haystack-cpu, deepset/haystack-gpu, and their associated tags. We'll remove the corresponding deprecated Dockerfiles /Dockerfile, /Dockerfile-GPU, and /Dockerfile-GPU-minimal from the codebase after the release.

What's Changed

Pipeline

  • fix: ParsrConverter fails on pages without text by @anakin87 in #3605
  • fix: Convert eval metrics to python float by @tstadel in #3612
  • feat: add support for BM25Retriever in InMemoryDocumentStore by @anakin87 in #3561
  • chore: fix return type of aggregate_labels by @tstadel in #3617
  • refactor: change MultiModal retriever to be of type DenseRetriever by @mayankjobanputra in #3598
  • fix: Move entire forward pass of TableQA within torch.no_grad() by @sjrl in #3636
  • feat: add offsets_in_context to evaluation result by @julian-risch in #3640
  • bug: Use tqdm auto instead of plain tqdm by @vblagoje in #3672
  • fix: monkey patch for SklearnQueryClassifier by @anakin87 in #3678
  • feat: Update table reader tests to check the answer scores by @sjrl in #3641
  • feat: Adds all_terms_must_match parameter to BM25Retriever at runtime by @ugm2 in #3627
  • fix: fix PreProcessor split_by schema by @ZanSara in #3680
  • refactor: Generate JSON schema when missing by @masci in #3533
  • refactor: replace torch.no_grad with torch.inference_mode (where possible) by @anakin87 in #3601
  • Adjust get_type() method for pipelines by @vblagoje in #3657
  • refactor: improve Multilabel design by @anakin87 in #3658
  • feat: Update cohere embedding models by @vblagoje in #3704
  • feat: Enable text-embedding-ada-002 for EmbeddingRetriever by @vblagoje in #3721
  • feat: Expand LLM support with PromptModel, PromptNode, and PromptTemplate by @vblagoje in #3667


Full Changelog: v1.11.1...v1.12.1

v1.12.0

21 Dec 16:51

v1.12.0rc1

19 Dec 09:40
Pre-release

These release notes are identical to those of v1.12.1 above.
Full Changelog: v1.11.1...v1.12.0rc1

v1.11.1

06 Dec 18:11

What's Changed

  • fix: Pin faiss-cpu as 1.7.3 seems to have problems by @masci in #3603

Full Changelog: v1.11.0...v1.11.1