v2.0.0-beta.3
Pre-release
Release Notes
⬆️ Upgrade Notes
- If you are using AzureOCRDocumentConverter or TikaDocumentConverter, you need to rename the paths argument of the run method to sources.
  An example:

  ```python
  from haystack.components.converters import TikaDocumentConverter

  converter = TikaDocumentConverter()
  converter.run(paths=["paths/to/file1.pdf", "path/to/file2.pdf"])
  ```

  The last line should be changed to:

  ```python
  converter.run(sources=["paths/to/file1.pdf", "path/to/file2.pdf"])
  ```
⚡️ Enhancement Notes
- Add Markdown MIME type support to the file type router, i.e. the FileTypeRouter class.
- Refactor the Answer dataclass and the classes that inherited from it. Answer is now a Protocol, and the classes that used to inherit from it now conform to that interface. We also added a new ExtractiveTableAnswer to be used for table question answering.
  All of these classes are now easily serializable using to_dict() and from_dict(), like Document and components.
- Make all Converters accept meta in the run method, so that users can provide their own metadata. meta is a list, and its length should match the number of sources.
- Make all the Converters accept the sources parameter in the run method. sources is a list that can contain str, Path or ByteStream objects.
- Rename the confidence_threshold parameter of the ExtractiveReader to score_threshold, because ExtractedAnswers have a score and that is what the threshold applies to. For consistency, the term confidence is no longer used; score is used instead.
- Include 'boilerpy3' in the 'haystack-ai' dependencies.
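
To illustrate the Answer refactor noted above, here is a minimal sketch of structural typing with typing.Protocol, using only the standard library. AnswerLike and ToyExtractedAnswer are hypothetical names invented for this example, and their fields are assumptions; they are not Haystack's actual Answer definitions.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Protocol, runtime_checkable


@runtime_checkable
class AnswerLike(Protocol):
    """Hypothetical stand-in for an answer interface (not Haystack's Answer)."""

    data: Any
    query: str

    def to_dict(self) -> Dict[str, Any]: ...


@dataclass
class ToyExtractedAnswer:
    """Does not inherit from AnswerLike; it only matches its shape."""

    data: str
    query: str
    score: float

    def to_dict(self) -> Dict[str, Any]:
        # Serialize every dataclass field into a plain dictionary.
        return asdict(self)

    @classmethod
    def from_dict(cls, payload: Dict[str, Any]) -> "ToyExtractedAnswer":
        # Rebuild the object from the dictionary produced by to_dict().
        return cls(**payload)


answer = ToyExtractedAnswer(data="Paris", query="Capital of France?", score=0.93)
restored = ToyExtractedAnswer.from_dict(answer.to_dict())

# The class satisfies the protocol without subclassing it.
print(isinstance(answer, AnswerLike))
# The to_dict()/from_dict() round trip preserves the object.
print(restored == answer)
```

The point of the sketch is that a class conforms to a Protocol by having the right attributes and methods, so no shared base class is needed.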
Known Issues
- Make connect idempotent, allowing the same components to be connected more than once. Especially useful in Jupyter notebooks. Fixes #6359.
- Fix "TypeError: descriptor '__dict__' for 'XXX' objects doesn't apply to a 'XXX' object" when running pipelines with debug=True by removing the graph image from the debug payload.
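
As a rough illustration of what an idempotent connect means, the toy class below stores connections in a set, so repeating a call leaves the graph unchanged. ToyPipeline and its sender and receiver parameters are hypothetical and written only for this sketch; this is not Haystack's Pipeline implementation.

```python
from typing import Set, Tuple


class ToyPipeline:
    """Toy connection graph (hypothetical, not Haystack's Pipeline)."""

    def __init__(self) -> None:
        self.edges: Set[Tuple[str, str]] = set()

    def connect(self, sender: str, receiver: str) -> None:
        # A set ignores duplicates, so a repeated call is a no-op
        # instead of an error or a second edge.
        self.edges.add((sender, receiver))


pipe = ToyPipeline()
pipe.connect("converter.documents", "writer.documents")
# Simulates re-running the same notebook cell.
pipe.connect("converter.documents", "writer.documents")

print(len(pipe.edges))
```

This is why idempotency helps in notebooks: a cell that wires components together can be executed again without first rebuilding the pipeline.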
🐛 Bug Fixes
- Make TransformersSimilarityRanker run with a list containing a single document as input.