
Releases: deepset-ai/haystack

v2.8.1-rc2

18 Dec 21:18
167a48e
v2.8.1-rc2 Pre-release

Release Notes

v2.8.1-rc2

Bug Fixes

  • Fixes issues with deserialization of components in multi-threaded environments.

v2.8.1-rc1

Bug Fixes

  • Pin OpenAI client to >=1.56.1 to avoid issues related to changes in the httpx library.

v2.8.1-rc1

18 Dec 15:04
55e7fbf
v2.8.1-rc1 Pre-release

Release Notes

v2.8.1-rc1

Bug Fixes

  • Pin OpenAI client to >=1.56.1 to avoid issues related to changes in the httpx library.

v2.8.0

05 Dec 11:26

Release Notes

⬆️ Upgrade Notes

  • Remove is_greedy deprecated argument from @component decorator. Change the Variadic input of your Component to GreedyVariadic instead.

🚀 New Features

  • We've added a new DALLEImageGenerator component, bringing image generation with OpenAI's DALL-E to Haystack.
    • Easy to Use: Just a few lines of code to get started:
      from haystack.components.generators import DALLEImageGenerator 
      
      image_generator = DALLEImageGenerator() 
      response = image_generator.run("Show me a picture of a black cat.") 
      print(response) 
  • Add warning logs to the PDFMinerToDocument and PyPDFToDocument to indicate when a processed PDF file has no content. This can happen if the PDF file is a scanned image. Also added an explicit check and warning message to the DocumentSplitter that warns the user that empty Documents are skipped. This behavior was already occurring, but now it's clearer through logs that this is happening.
  • We have added a new MetaFieldGroupingRanker component that reorders documents by grouping them based on metadata keys. This can be useful for pre-processing Documents before feeding them to an LLM.
  • Added a new store_full_path parameter to the __init__ methods of the following converters:
    JSONConverter, CSVToDocument, DOCXToDocument, HTMLToDocument, MarkdownToDocument, PDFMinerToDocument, PPTXToDocument, TikaDocumentConverter, PyPDFToDocument, AzureOCRDocumentConverter and TextFileToDocument. The default value is True, which stores the full file path in the metadata of the output documents. When set to False, only the file name is stored.
  • When making function calls via OpenAPI, allow both switching SSL verification off and specifying a certificate authority to use for it.
  • Add TTFT (Time-to-First-Token) support for OpenAI generators. This captures the time taken to generate the first token from the model and can be used to analyze the latency of the application.
  • Added a new option to the required_variables parameter to the PromptBuilder and ChatPromptBuilder. By passing required_variables="*" you can automatically set all variables in the prompt to be required.
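
As a rough illustration of the store_full_path behavior described above, the following sketch mimics the two outcomes using only the standard library. The helper name file_path_meta and the file_path metadata key are assumptions made for this example; this is not the converters' internal code.

```python
from pathlib import PurePosixPath


def file_path_meta(source: str, store_full_path: bool = True) -> dict:
    """Build a metadata dict for a converted file (hypothetical helper).

    With store_full_path=True the whole path is kept; otherwise only the
    file name is stored, which avoids leaking directory structure.
    """
    path = PurePosixPath(source)
    return {"file_path": str(path) if store_full_path else path.name}


# Usage: the same source path, with and without the full path stored.
full = file_path_meta("/data/reports/q3.pdf")
name_only = file_path_meta("/data/reports/q3.pdf", store_full_path=False)
print(full, name_only)
```

Storing only the file name is the more privacy-friendly choice, which is presumably why the default is announced to change in a later release.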

⚡️ Enhancement Notes

  • Across the Haystack codebase, we have replaced the use of the ChatMessage data class constructor with specific class methods (ChatMessage.from_user, ChatMessage.from_assistant, etc.).
  • Added the Maximal Marginal Relevance (MMR) strategy to the SentenceTransformersDiversityRanker. MMR scores are calculated for each document based on their relevance to the query and diversity from already selected documents.
  • Introduces optional parameters in the ConditionalRouter component, enabling default/fallback routing behavior when certain inputs are not provided at runtime. This enhancement allows for more flexible pipeline configurations with graceful handling of missing parameters.
  • Added split by line to DocumentSplitter, which will split the document at newline characters (\n).
  • Change OpenAIDocumentEmbedder to keep running if a batch fails embedding. Now, if OpenAI returns an error, we log that error and keep processing the following batches.
  • Added new initialization parameters to the PyPDFToDocument component to customize the text extraction process from PDF files.
  • Replace usage of ChatMessage.content with ChatMessage.text across the codebase. This is done in preparation for the removal of content in Haystack 2.9.0.
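
The MMR strategy mentioned above can be sketched in plain Python. This is a minimal, standard-library illustration of the general technique, not the SentenceTransformersDiversityRanker implementation. The function names and the lambda_threshold parameter are assumptions for this example, and documents are represented by toy embedding vectors.

```python
import math
from typing import List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_order(
    query: Sequence[float],
    docs: List[Sequence[float]],
    lambda_threshold: float = 0.5,
    top_k: Optional[int] = None,
) -> List[int]:
    """Return document indices ordered greedily by an MMR-style score.

    score = lambda * relevance(query, doc)
            - (1 - lambda) * max similarity to any already selected doc
    """
    limit = len(docs) if top_k is None else min(top_k, len(docs))
    selected: List[int] = []
    remaining = list(range(len(docs)))
    while remaining and len(selected) < limit:
        best_idx, best_score = remaining[0], -math.inf
        for i in remaining:
            relevance = cosine(query, docs[i])
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            score = lambda_threshold * relevance - (1 - lambda_threshold) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Two near-duplicate documents (0 and 1) and one different document (2).
toy_docs = [(1.0, 0.9), (1.0, 0.8), (0.2, 1.0)]
print(mmr_order((1.0, 1.0), toy_docs, lambda_threshold=0.5))
```

With lambda_threshold at 1.0 the ordering reduces to pure relevance; lower values push near-duplicates of already selected documents further down.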

⚠️ Deprecation Notes

  • The default value of the store_full_path parameter in converters will change to False in Haystack 2.9.0 to enhance privacy.
  • In Haystack 2.9.0, the ChatMessage data class will be refactored to make it more flexible and future-proof. As part of this change, the content attribute will be removed. A new text property has been introduced to provide access to the textual value of the ChatMessage. To ensure a smooth transition, start using the text property now in place of content.
  • The converter parameter in the PyPDFToDocument component is deprecated and will be removed in Haystack 2.9.0. For in-depth customization of the conversion process, consider implementing a custom component. Additional high-level customization options will be added in the future.
  • The output of context_documents in SentenceWindowRetriever will change in the next release. Instead of a List[List[Document]], the output will be a List[Document], where the documents are ordered by split_idx_start.
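
Code that consumes context_documents today may want to prepare for the flattened shape announced above. The sketch below shows one way to flatten a List[List[Document]] and order it by split_idx_start. The Doc class is a hypothetical stand-in for Haystack's Document, and whether the future output also deduplicates documents is not stated in these notes, so no deduplication is done here.

```python
from dataclasses import dataclass, field
from itertools import chain
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a Document with content and metadata."""
    content: str
    meta: Dict[str, int] = field(default_factory=dict)


def flatten_context_documents(nested: List[List[Doc]]) -> List[Doc]:
    """Flatten nested document lists and sort them by meta['split_idx_start']."""
    flat = chain.from_iterable(nested)
    return sorted(flat, key=lambda doc: doc.meta["split_idx_start"])


nested_docs = [
    [Doc("second", {"split_idx_start": 10}), Doc("third", {"split_idx_start": 20})],
    [Doc("first", {"split_idx_start": 0})],
]
print([doc.content for doc in flatten_context_documents(nested_docs)])
```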

🐛 Bug Fixes

  • Fix DocumentCleaner not preserving all Document fields when run

  • Fix DocumentJoiner failing when run with an empty list of Documents

  • For the NLTKDocumentSplitter, we updated how chunks are made when splitting by word while respecting sentence boundaries. To avoid fully subsuming the previous chunk into the next one, we ignore the first sentence from that chunk when calculating sentence overlap; that is, we want to avoid cases such as Doc1 = [s1, s2], Doc2 = [s1, s2, s3].

  • Finished adding function support for the NLTKDocumentSplitter by updating the _split_into_units function and adding the splitting_function init parameter.

  • Add a specific to_dict method to the NLTKDocumentSplitter to overwrite the underlying one from DocumentSplitter. This is needed to properly save the settings of the component to YAML.

  • Fix OpenAIChatGenerator and OpenAIGenerator crashing when using a streaming_callback and generation_kwargs contain {"stream_options": {"include_usage": True}}.

  • Fix tracing Pipeline with cycles to correctly track components execution

  • When meta is passed into AnswerBuilder.run(), it is now merged into GeneratedAnswer meta

  • Fix DocumentSplitter to handle custom splitting_function without requiring split_length. Previously the splitting_function provided would not override other settings.
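
The NLTKDocumentSplitter overlap fix above can be pictured with a small sketch. It is an illustration under stated assumptions rather than the library's code: the helper name sentences_to_keep and the whitespace-based word count are hypothetical. The idea is to walk backwards through the previous chunk's sentences while skipping its first sentence, so the next chunk can never begin with the entire previous chunk.

```python
from typing import List


def sentences_to_keep(chunk_sentences: List[str], split_overlap: int) -> int:
    """Count trailing sentences of a chunk to carry into the next chunk.

    The first sentence is excluded from the walk, so at most
    len(chunk_sentences) - 1 sentences can ever be carried over.
    """
    kept = 0
    words = 0
    for sentence in reversed(chunk_sentences[1:]):
        words += len(sentence.split())
        if words > split_overlap:
            break
        kept += 1
    return kept


# Even with a generous overlap budget, only s2 is carried over, never s1 and s2.
print(sentences_to_keep(["s1 has words", "s2 has words"], split_overlap=100))
```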

v2.8.0-rc3

04 Dec 15:04
v2.8.0-rc3 Pre-release

Release Notes

⬆️ Upgrade Notes

  • Remove is_greedy deprecated argument from @component decorator. Change the Variadic input of your Component to GreedyVariadic instead.

🚀 New Features

  • We've added a new DALLEImageGenerator component, bringing image generation with OpenAI's DALL-E to Haystack.
    • Easy to Use: Just a few lines of code to get started:
      from haystack.components.generators import DALLEImageGenerator

      image_generator = DALLEImageGenerator()
      response = image_generator.run("Show me a picture of a black cat.")
      print(response)
  • Add warning logs to the PDFMinerToDocument and PyPDFToDocument to indicate when a processed PDF file has no content. This can happen if the PDF file is a scanned image. Also added an explicit check and warning message to the DocumentSplitter that warns the user that empty Documents are skipped. This behavior was already occurring, but now it's clearer through logs that this is happening.
  • We have added a new MetaFieldGroupingRanker component that reorders documents by grouping them based on metadata keys. This can be useful for pre-processing Documents before feeding them to an LLM.
  • Added a new store_full_path parameter to the __init__ methods of the following converters:
    JSONConverter, CSVToDocument, DOCXToDocument, HTMLToDocument, MarkdownToDocument, PDFMinerToDocument, PPTXToDocument, TikaDocumentConverter, PyPDFToDocument, AzureOCRDocumentConverter and TextFileToDocument. The default value is True, which stores the full file path in the metadata of the output documents. When set to False, only the file name is stored.
  • When making function calls via OpenAPI, allow both switching SSL verification off and specifying a certificate authority to use for it.
  • Add TTFT (Time-to-First-Token) support for OpenAI generators. This captures the time taken to generate the first token from the model and can be used to analyze the latency of the application.
  • Added a new option to the required_variables parameter to the PromptBuilder and ChatPromptBuilder. By passing required_variables="*" you can automatically set all variables in the prompt to be required.

⚡️ Enhancement Notes

  • Across the Haystack codebase, we have replaced the use of the ChatMessage data class constructor with specific class methods (ChatMessage.from_user, ChatMessage.from_assistant, etc.).
  • Added the Maximal Marginal Relevance (MMR) strategy to the SentenceTransformersDiversityRanker. MMR scores are calculated for each document based on their relevance to the query and diversity from already selected documents.
  • Introduces optional parameters in the ConditionalRouter component, enabling default/fallback routing behavior when certain inputs are not provided at runtime. This enhancement allows for more flexible pipeline configurations with graceful handling of missing parameters.
  • Added split by line to DocumentSplitter, which will split the document at newline characters (\n).
  • Change OpenAIDocumentEmbedder to keep running if a batch fails embedding. Now, if OpenAI returns an error, we log that error and keep processing the following batches.
  • Added new initialization parameters to the PyPDFToDocument component to customize the text extraction process from PDF files.
  • Replace usage of ChatMessage.content with ChatMessage.text across the codebase. This is done in preparation for the removal of content in Haystack 2.9.0.

⚠️ Deprecation Notes

  • The default value of the store_full_path parameter in converters will change to False in Haystack 2.9.0 to enhance privacy.
  • In Haystack 2.9.0, the ChatMessage data class will be refactored to make it more flexible and future-proof. As part of this change, the content attribute will be removed. A new text property has been introduced to provide access to the textual value of the ChatMessage. To ensure a smooth transition, start using the text property now in place of content.
  • The converter parameter in the PyPDFToDocument component is deprecated and will be removed in Haystack 2.9.0. For in-depth customization of the conversion process, consider implementing a custom component. Additional high-level customization options will be added in the future.
  • The output of context_documents will change in the next release. Instead of a List[List[Document]], the output will be a List[Document], where the documents are ordered by split_idx_start.

🐛 Bug Fixes

  • Fix DocumentCleaner not preserving all Document fields when run

  • Fix DocumentJoiner failing when run with an empty list of Documents

  • For the NLTKDocumentSplitter, we updated how chunks are made when splitting by word while respecting sentence boundaries. To avoid fully subsuming the previous chunk into the next one, we ignore the first sentence from that chunk when calculating sentence overlap; that is, we want to avoid cases such as Doc1 = [s1, s2], Doc2 = [s1, s2, s3].

  • Finished adding function support for the NLTKDocumentSplitter by updating the _split_into_units function and adding the splitting_function init parameter.

  • Add a specific to_dict method to the NLTKDocumentSplitter to overwrite the underlying one from DocumentSplitter. This is needed to properly save the settings of the component to YAML.

  • Fix OpenAIChatGenerator and OpenAIGenerator crashing when using a streaming_callback and generation_kwargs contain {"stream_options": {"include_usage": True}}.

  • Fix tracing Pipeline with cycles to correctly track components execution

  • When meta is passed into AnswerBuilder.run(), it is now merged into GeneratedAnswer meta

  • Fix DocumentSplitter to handle custom splitting_function without requiring split_length. Previously the splitting_function provided would not override other settings.

v2.8.0-rc2

03 Dec 15:01
v2.8.0-rc2 Pre-release

Release Notes

⬆️ Upgrade Notes

  • Remove is_greedy deprecated argument from @component decorator. Change the Variadic input of your Component to GreedyVariadic instead.

🚀 New Features

  • We've added a new DALLEImageGenerator component, bringing image generation with OpenAI's DALL-E to Haystack.
    • Easy to Use: Just a few lines of code to get started:
      from haystack.components.generators import DALLEImageGenerator

      image_generator = DALLEImageGenerator()
      response = image_generator.run("Show me a picture of a black cat.")
      print(response)
  • Add warning logs to the PDFMinerToDocument and PyPDFToDocument to indicate when a processed PDF file has no content. This can happen if the PDF file is a scanned image. Also added an explicit check and warning message to the DocumentSplitter that warns the user that empty Documents are skipped. This behavior was already occurring, but now it's clearer through logs that this is happening.
  • We have added a new MetaFieldGroupingRanker component that reorders documents by grouping them based on metadata keys. This can be useful for pre-processing Documents before feeding them to an LLM.
  • Added a new store_full_path parameter to the __init__ methods of the following converters:
    JSONConverter, CSVToDocument, DOCXToDocument, HTMLToDocument, MarkdownToDocument, PDFMinerToDocument, PPTXToDocument, TikaDocumentConverter, PyPDFToDocument, AzureOCRDocumentConverter and TextFileToDocument. The default value is True, which stores the full file path in the metadata of the output documents. When set to False, only the file name is stored.
  • When making function calls via OpenAPI, allow both switching SSL verification off and specifying a certificate authority to use for it.
  • Add TTFT (Time-to-First-Token) support for OpenAI generators. This captures the time taken to generate the first token from the model and can be used to analyze the latency of the application.
  • Added a new option to the required_variables parameter to the PromptBuilder and ChatPromptBuilder. By passing required_variables="*" you can automatically set all variables in the prompt to be required.

⚡️ Enhancement Notes

  • Across the Haystack codebase, we have replaced the use of the ChatMessage data class constructor with specific class methods (ChatMessage.from_user, ChatMessage.from_assistant, etc.).
  • Added the Maximal Marginal Relevance (MMR) strategy to the SentenceTransformersDiversityRanker. MMR scores are calculated for each document based on their relevance to the query and diversity from already selected documents.
  • Introduces optional parameters in the ConditionalRouter component, enabling default/fallback routing behavior when certain inputs are not provided at runtime. This enhancement allows for more flexible pipeline configurations with graceful handling of missing parameters.
  • Added split by line to DocumentSplitter, which will split the document at newline characters (\n).
  • Change OpenAIDocumentEmbedder to keep running if a batch fails embedding. Now, if OpenAI returns an error, we log that error and keep processing the following batches.
  • Added new initialization parameters to the PyPDFToDocument component to customize the text extraction process from PDF files.
  • Replace usage of ChatMessage.content with ChatMessage.text across the codebase. This is done in preparation for the removal of content in Haystack 2.9.0.

⚠️ Deprecation Notes

  • The default value of the store_full_path parameter in converters will change to False in Haystack 2.9.0 to enhance privacy.
  • In Haystack 2.9.0, the ChatMessage data class will be refactored to make it more flexible and future-proof. As part of this change, the content attribute will be removed. A new text property has been introduced to provide access to the textual value of the ChatMessage. To ensure a smooth transition, start using the text property now in place of content.
  • The converter parameter in the PyPDFToDocument component is deprecated and will be removed in Haystack 2.9.0. For in-depth customization of the conversion process, consider implementing a custom component. Additional high-level customization options will be added in the future.

🐛 Bug Fixes

  • Fix DocumentCleaner not preserving all Document fields when run

  • Fix DocumentJoiner failing when run with an empty list of Documents

  • For the NLTKDocumentSplitter, we updated how chunks are made when splitting by word while respecting sentence boundaries. To avoid fully subsuming the previous chunk into the next one, we ignore the first sentence from that chunk when calculating sentence overlap; that is, we want to avoid cases such as Doc1 = [s1, s2], Doc2 = [s1, s2, s3].

  • Finished adding function support for the NLTKDocumentSplitter by updating the _split_into_units function and adding the splitting_function init parameter.

  • Add a specific to_dict method to the NLTKDocumentSplitter to overwrite the underlying one from DocumentSplitter. This is needed to properly save the settings of the component to YAML.

  • Fix OpenAIChatGenerator and OpenAIGenerator crashing when using a streaming_callback and generation_kwargs contain {"stream_options": {"include_usage": True}}.

  • Fix tracing Pipeline with cycles to correctly track components execution

  • When meta is passed into AnswerBuilder.run(), it is now merged into GeneratedAnswer meta

  • Fix DocumentSplitter to handle custom splitting_function without requiring split_length. Previously the splitting_function provided would not override other settings.

v2.8.0-rc1

26 Nov 12:00
v2.8.0-rc1 Pre-release

Release Notes

⬆️ Upgrade Notes

  • Remove is_greedy deprecated argument from @component decorator. Change the Variadic input of your Component to GreedyVariadic instead.

🚀 New Features

  • We've added a new DALLEImageGenerator component, bringing image generation with OpenAI's DALL-E to Haystack.
    • Easy to Use: Just a few lines of code to get started:
      from haystack.components.generators import DALLEImageGenerator

      image_generator = DALLEImageGenerator()
      response = image_generator.run("Show me a picture of a black cat.")
      print(response)
  • Add warning logs to the PDFMinerToDocument and PyPDFToDocument to indicate when a processed PDF file has no content. This can happen if the PDF file is a scanned image. Also added an explicit check and warning message to the DocumentSplitter that warns the user that empty Documents are skipped. This behavior was already occurring, but now it's clearer through logs that this is happening.
  • We have added a new MetaFieldGroupingRanker component that reorders documents by grouping them based on metadata keys. This can be useful for pre-processing Documents before feeding them to an LLM.
  • Added a new store_full_path parameter to the __init__ methods of the following converters:
    JSONConverter, CSVToDocument, DOCXToDocument, HTMLToDocument, MarkdownToDocument, PDFMinerToDocument, PPTXToDocument, TikaDocumentConverter and TextFileToDocument. The default value is True, which stores the full file path in the metadata of the output documents. When set to False, only the file name is stored.
  • When making function calls via OpenAPI, allow both switching SSL verification off and specifying a certificate authority to use for it.
  • Add TTFT (Time-to-First-Token) support for OpenAI generators. This captures the time taken to generate the first token from the model and can be used to analyze the latency of the application.
  • Added a new option to the required_variables parameter to the PromptBuilder and ChatPromptBuilder. By passing required_variables="*" you can automatically set all variables in the prompt to be required.

⚡️ Enhancement Notes

  • Added the Maximal Marginal Relevance (MMR) strategy to the SentenceTransformersDiversityRanker. MMR scores are calculated for each document based on their relevance to the query and diversity from already selected documents.
  • Introduces optional parameters in the ConditionalRouter component, enabling default/fallback routing behavior when certain inputs are not provided at runtime. This enhancement allows for more flexible pipeline configurations with graceful handling of missing parameters.
  • Added split by line to DocumentSplitter, which will split the document at newline characters (\n).
  • Change OpenAIDocumentEmbedder to keep running if a batch fails embedding. Now, if OpenAI returns an error, we log that error and keep processing the following batches.

⚠️ Deprecation Notes

  • The default value of the store_full_path parameter will change to False in Haystack 2.9.0 to enhance privacy.

🐛 Bug Fixes

  • Fix DocumentCleaner not preserving all Document fields when run

  • Fix DocumentJoiner failing when run with an empty list of Documents

  • For the NLTKDocumentSplitter, we updated how chunks are made when splitting by word while respecting sentence boundaries. To avoid fully subsuming the previous chunk into the next one, we ignore the first sentence from that chunk when calculating sentence overlap; that is, we want to avoid cases such as Doc1 = [s1, s2], Doc2 = [s1, s2, s3].

  • Finished adding function support for the NLTKDocumentSplitter by updating the _split_into_units function and adding the splitting_function init parameter.

  • Add a specific to_dict method to the NLTKDocumentSplitter to overwrite the underlying one from DocumentSplitter. This is needed to properly save the settings of the component to YAML.

  • Fix OpenAIChatGenerator and OpenAIGenerator crashing when using a streaming_callback and generation_kwargs contain {"stream_options": {"include_usage": True}}.

  • Fix tracing Pipeline with cycles to correctly track components execution

  • When meta is passed into AnswerBuilder.run(), it is now merged into GeneratedAnswer meta

  • Fix DocumentSplitter to handle custom splitting_function without requiring split_length. Previously the splitting_function provided would not override other settings.

v2.7.0

11 Nov 10:41

Release Notes

✨ Highlights

🚅 Rework Pipeline.run() logic to better handle cycles

Pipeline.run() internal logic has been heavily reworked to be more robust and reliable than before. This new implementation makes it easier to run Pipelines that have cycles in their graph. It also fixes some corner cases in Pipelines that don't have any cycle.

📝 Introduce LoggingTracer

With the new LoggingTracer, users can inspect the logs in real-time to see everything that is happening in their Pipelines. This feature aims to improve the user experience during experimentation and prototyping.

import logging
from haystack import tracing
from haystack.tracing.logging_tracer import LoggingTracer

logging.basicConfig(format="%(levelname)s - %(name)s -  %(message)s", level=logging.WARNING)
logging.getLogger("haystack").setLevel(logging.DEBUG)
tracing.tracer.is_content_tracing_enabled = True # to enable tracing/logging content (inputs/outputs)
tracing.enable_tracing(LoggingTracer())

⬆️ Upgrade Notes

  • Removed Pipeline init argument debug_path. We do not support this anymore.

  • Removed Pipeline init argument max_loops_allowed. Use max_runs_per_component instead.

  • Removed PipelineMaxLoops exception. Use PipelineMaxComponentRuns instead.

  • The deprecated default converter class haystack.components.converters.pypdf.DefaultConverter used by PyPDFToDocument has been removed.

    Pipeline YAMLs from haystack<2.7.0 that use the default converter must be updated in the following manner:

    # Old
    components:
        Comp1:
            init_parameters:
                converter:
                    type: haystack.components.converters.pypdf.DefaultConverter
            type: haystack.components.converters.pypdf.PyPDFToDocument
    
    # New
    components:
        Comp1:
            init_parameters:
                converter: null
            type: haystack.components.converters.pypdf.PyPDFToDocument

    Pipeline YAMLs from haystack<2.7.0 that use custom converter classes can be upgraded by simply loading them with haystack==2.6.x and saving them to YAML again.

  • Pipeline.connect() will now raise a PipelineConnectError if sender and receiver are the same Component. We do not support this use case anymore.

🚀 New Features

  • Added component StringJoiner to join strings from different components to a list of strings.

  • Improved serialization/deserialization errors to provide extra context about the delinquent components when possible.

  • Enhanced DOCX converter to support table extraction in addition to paragraph content. The converter supports both CSV and Markdown table formats, providing flexible options for representing tabular data extracted from DOCX documents.

  • Added a new parameter additional_mimetypes to the FileTypeRouter component. This allows users to specify additional MIME type mappings, ensuring correct file classification across different runtime environments and Python versions.

  • Introduce a LoggingTracer that sends all traces to the logs.

    It can be enabled as follows:

    import logging
    from haystack import tracing
    from haystack.tracing.logging_tracer import LoggingTracer
    
    logging.basicConfig(format="%(levelname)s - %(name)s -  %(message)s", level=logging.WARNING)
    logging.getLogger("haystack").setLevel(logging.DEBUG)
    tracing.tracer.is_content_tracing_enabled = True # to enable tracing/logging content (inputs/outputs)
    tracing.enable_tracing(LoggingTracer())
  • Fundamentally rework the internal logic of Pipeline.run(). The rework makes it more reliable and covers more use cases. We fixed some issues that made Pipelines with cycles unpredictable and with unclear Components execution order.

  • Each tracing span of a component run is now attached with the pipeline run span object. This allows users to trace the execution of multiple pipeline runs concurrently.
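
The MIME-type problem that additional_mimetypes addresses can be reproduced with Python's standard mimetypes module, whose lookup table depends on the interpreter version and the host system. The sketch below is not FileTypeRouter code; the helper name register_extra_mimetypes is hypothetical, and it only shows how registering an extra mapping makes classification deterministic across environments.

```python
import mimetypes
from typing import Dict

DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def register_extra_mimetypes(mapping: Dict[str, str]) -> None:
    """Register MIME type -> extension pairs in the process-wide mimetypes table."""
    for mime_type, extension in mapping.items():
        mimetypes.add_type(mime_type, extension)


# Without this registration, minimal environments may return None for .docx files.
register_extra_mimetypes({DOCX_MIME: ".docx"})
guessed_type, _encoding = mimetypes.guess_type("report.docx")
print(guessed_type)
```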

⚡️ Enhancement Notes

  • Add streaming_callback run parameter to HuggingFaceAPIGenerator and HuggingFaceLocalGenerator to allow users to pass a callback function that will be called after each chunk of the response is generated.
  • The SentenceWindowRetriever now supports the window_size parameter at run time, overwriting the value set in the constructor.
  • Add output type validation in ConditionalRouter. Setting validate_output_type to True enables a check that the actual output of a route matches the declared type. If it doesn't match, a ValueError is raised.
  • Reduced numpy usage to speed up imports.
  • Improved file type detection in FileTypeRouter, particularly for Microsoft Office file formats like .docx and .pptx. This enhancement ensures more consistent behavior across different environments, including AWS Lambda functions and systems without pre-installed office suites.
  • The FileTypeRouter now supports passing metadata (meta) in the run method. When metadata is provided, the sources are internally converted to ByteStream objects and the metadata is added. This new parameter simplifies working with preprocessing/indexing pipelines.
  • SentenceTransformersDocumentEmbedder now supports config_kwargs for additional parameters when loading the model configuration
  • SentenceTransformersTextEmbedder now supports config_kwargs for additional parameters when loading the model configuration
  • Previously, numpy was pinned to <2.0 to avoid compatibility issues in several core integrations. This pin has been removed, and haystack can work with both numpy 1.x and 2.x. If necessary, we will pin numpy version in specific core integrations that require it.

⚠️ Deprecation Notes

  • The DefaultConverter class used by the PyPDFToDocument component has been deprecated. Its functionality will be merged into the component in 2.7.0.

🐛 Bug Fixes

  • Serialized data of components are now explicitly enforced to be one of the following basic Python datatypes: str, int, float, bool, list, dict, set, tuple or None.
  • Addressed an issue where certain file types (e.g., .docx, .pptx) were incorrectly classified as 'unclassified' in environments with limited MIME type definitions, such as AWS Lambda functions.
  • Fixes logs containing JSON data getting lost due to string interpolation.
  • Use forward references for Hugging Face Hub types in the HuggingFaceAPIGenerator component to prevent import errors.
  • Fix the serialization of PyPDFToDocument component to prevent the default converter from being serialized unnecessarily.
  • Revert change to PyPDFConverter that broke the deserialization of pre 2.6.0 YAMLs.
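
To make the serialization constraint above concrete, here is a minimal sketch of a check that accepts only the listed basic Python datatypes. The function name is_basic_serialized is hypothetical, and the recursive descent into containers is an assumption of this example; the notes do not say how deeply Haystack validates nested values.

```python
from typing import Any

# Scalar types from the list above; containers are handled separately below.
_SCALARS = (str, int, float, bool, type(None))


def is_basic_serialized(value: Any) -> bool:
    """Return True if value is built only from str, int, float, bool, list,
    dict, set, tuple or None (checked recursively in this sketch)."""
    if isinstance(value, dict):
        return all(is_basic_serialized(k) and is_basic_serialized(v) for k, v in value.items())
    if isinstance(value, (list, tuple, set)):
        return all(is_basic_serialized(item) for item in value)
    return isinstance(value, _SCALARS)


print(is_basic_serialized({"model": "some-model", "top_k": 3, "tags": ["a", None]}))
print(is_basic_serialized({"client": object()}))
```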

v1.26.4

11 Nov 15:15

Release Notes

v1.26.4

⚡️ Enhancement Notes

  • Upgrade the transformers dependency requirement to transformers>=4.46,<5.0
  • Updated tokenizer.json URL for Anthropic models as the old URL was no longer available.

v2.7.0-rc1

30 Oct 15:05
v2.7.0-rc1 Pre-release

Release Notes

✨ Highlights

🚅 Rework Pipeline.run() logic to better handle cycles

Pipeline.run() internal logic has been heavily reworked to be more robust and reliable than before. This new implementation makes it easier to run Pipelines that have cycles in their graph. It also fixes some corner cases in Pipelines that don't have any cycle.

📝 Introduce LoggingTracer

With the new LoggingTracer, users can inspect the logs in real time to see everything that is happening in their Pipelines. This feature aims to improve the user experience during experimentation and prototyping.

⬆️ Upgrade Notes

  • Removed Pipeline init argument debug_path. We do not support this anymore.

  • Removed Pipeline init argument max_loops_allowed. Use max_runs_per_component instead.

  • Removed PipelineMaxLoops exception. Use PipelineMaxComponentRuns instead.

  • The deprecated default converter class haystack.components.converters.pypdf.DefaultConverter used by PyPDFToDocument has been removed.

    Pipeline YAMLs from haystack<2.7.0 that use the default converter must be updated in the following manner:

    # Old
    components:
        Comp1:
            init_parameters:
                converter:
                    type: haystack.components.converters.pypdf.DefaultConverter
            type: haystack.components.converters.pypdf.PyPDFToDocument
    
    # New
    components:
        Comp1:
            init_parameters:
                converter: null
            type: haystack.components.converters.pypdf.PyPDFToDocument

    Pipeline YAMLs from haystack<2.7.0 that use custom converter classes can be upgraded by simply loading them with haystack==2.6.x and saving them to YAML again.

  • Pipeline.connect() will now raise a PipelineConnectError if sender and receiver are the same Component. We do not support this use case anymore.

🚀 New Features

  • Added component StringJoiner to join strings from different components to a list of strings.

  • Improved serialization/deserialization errors to provide extra context about the delinquent components when possible.

  • Enhanced DOCX converter to support table extraction in addition to paragraph content. The converter supports both CSV and Markdown table formats, providing flexible options for representing tabular data extracted from DOCX documents.

  • Added a new parameter additional_mimetypes to the FileTypeRouter component. This allows users to specify additional MIME type mappings, ensuring correct file classification across different runtime environments and Python versions.

  • Introduce a LoggingTracer that sends all traces to the logs.

    It can be enabled as follows:

    import logging
    from haystack import tracing
    from haystack.tracing.logging_tracer import LoggingTracer
    
    logging.basicConfig(format="%(levelname)s - %(name)s -  %(message)s", level=logging.WARNING)
    logging.getLogger("haystack").setLevel(logging.DEBUG)
    tracing.tracer.is_content_tracing_enabled = True # to enable tracing/logging content (inputs/outputs)
    tracing.enable_tracing(LoggingTracer())
  • Fundamentally rework the internal logic of Pipeline.run(). The rework makes it more reliable and covers more use cases. We fixed some issues that made Pipelines with cycles unpredictable and with unclear Components execution order.

  • Each tracing span of a component run is now attached with the pipeline run span object. This allows users to trace the execution of multiple pipeline runs concurrently.

⚡️ Enhancement Notes

  • Add streaming_callback run parameter to HuggingFaceAPIGenerator and HuggingFaceLocalGenerator to allow users to pass a callback function that will be called after each chunk of the response is generated.
  • The SentenceWindowRetriever now supports the window_size parameter at run time, overwriting the value set in the constructor.
  • Add output type validation in ConditionalRouter. Setting validate_output_type to True enables a check that the actual output of a route matches the declared type. If it doesn't match, a ValueError is raised.
  • Reduced numpy usage to speed up imports.
  • Improved file type detection in FileTypeRouter, particularly for Microsoft Office file formats like .docx and .pptx. This enhancement ensures more consistent behavior across different environments, including AWS Lambda functions and systems without pre-installed office suites.
  • The FileTypeRouter now supports passing metadata (meta) in the run method. When metadata is provided, the sources are internally converted to ByteStream objects and the metadata is added. This new parameter simplifies working with preprocessing/indexing pipelines.
  • SentenceTransformersDocumentEmbedder now supports config_kwargs for additional parameters when loading the model configuration
  • SentenceTransformersTextEmbedder now supports config_kwargs for additional parameters when loading the model configuration
  • Previously, numpy was pinned to <2.0 to avoid compatibility issues in several core integrations. This pin has been removed, and haystack can work with both numpy 1.x and 2.x. If necessary, we will pin numpy version in specific core integrations that require it.

⚠️ Deprecation Notes

  • The DefaultConverter class used by the PyPDFToDocument component has been deprecated. Its functionality will be merged into the component in 2.7.0.

🐛 Bug Fixes

  • Serialized data of components are now explicitly enforced to be one of the following basic Python datatypes: str, int, float, bool, list, dict, set, tuple or None.
  • Addressed an issue where certain file types (e.g., .docx, .pptx) were incorrectly classified as 'unclassified' in environments with limited MIME type definitions, such as AWS Lambda functions.
  • Fixes logs containing JSON data getting lost due to string interpolation.
  • Use forward references for Hugging Face Hub types in the HuggingFaceAPIGenerator component to prevent import errors.
  • Fix the serialization of PyPDFToDocument component to prevent the default converter from being serialized unnecessarily.
  • Revert change to PyPDFConverter that broke the deserialization of pre 2.6.0 YAMLs.

v2.6.1

10 Oct 08:53

Release Notes

v2.6.1

Bug Fixes

  • Revert change to PyPDFConverter that broke the deserialization of pre 2.6.0 YAMLs.