Commit

Version 3.27.0rc0
mborsetti committed Oct 29, 2024
1 parent e95c7ba commit e2c3600
Showing 5 changed files with 35 additions and 8 deletions.
8 changes: 7 additions & 1 deletion .github/workflows/ci-cd.yaml
@@ -145,10 +145,16 @@ jobs:
      sudo apt-get -y install python3-dbus build-essential libpoppler-cpp-dev pkg-config python3-dev tesseract-ocr
      pip install --upgrade pdftotext pytesseract
- - name: Install all other dependencies
+ - name: Install all other dependencies (GIL)
+   if: "${{ !matrix.disable-gil }}"
    run: |
      pip install --upgrade coveralls -r requirements.txt -r tests/requirements_pytest.txt
+ - name: Install all other dependencies (free-threaded)
+   if: "${{ matrix.disable-gil }}"
+   run: |
+     pip install --upgrade coveralls -r requirements-free-threaded.txt -r tests/requirements_pytest.txt
  # - name: Install all other dependencies (py12)
  # if: matrix.python-version == '3.12'
  # run: |
6 changes: 4 additions & 2 deletions CHANGELOG.rst
@@ -39,11 +39,13 @@ Unreleased

  Added
  -----
- * New Sub-directive in ``pypdf`` Filter: Added ``extraction_mode`` sub-directive.
- * Python 3.13 Testing: **webchanges** now is tested on Python 3.13 before releasing. However, the `aioxmpp
+ * Python 3.13: **webchanges** now is tested on Python 3.13 before releasing. However, the `aioxmpp
    <https://pypi.org/project/aioxmpp/>`__ library required by the ``xmpp`` reporter will not install in Python 3.13 (at
    least on Windows), and the development of the `library <https://codeberg.org/jssfr/aioxmpp>`__ has been
    halted.
+ * Python 3.13t (free-threaded, GIL-free) remains unsupported due to lack of support by dependencies such
+   as ``lxml``.
+ * New Sub-directive in ``pypdf`` Filter: Added ``extraction_mode`` sub-directive.

  Internals
  ---------
10 changes: 10 additions & 0 deletions requirements-free-threaded.txt
@@ -0,0 +1,10 @@
+ colorama; sys_platform == "win32"
+ cssselect
+ h2
+ html2text
+ httpx
+ markdown2
+ msgpack
+ platformdirs
+ pyyaml
+ tzdata; sys_platform == "win32"
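
The workflow change above selects a requirements file from the `matrix.disable-gil` flag. Outside CI, a similar choice could be made at run time. The following is a minimal sketch and not part of this commit: the function names are hypothetical, and it assumes that `sysconfig.get_config_var('Py_GIL_DISABLED')` is truthy only on free-threaded CPython builds such as 3.13t.

```python
import sysconfig


def is_free_threaded_build() -> bool:
    """Report whether this interpreter was built without the GIL.

    Assumption: the 'Py_GIL_DISABLED' config variable is 1 on free-threaded
    builds and 0 or None on regular builds.
    """
    return bool(sysconfig.get_config_var('Py_GIL_DISABLED'))


def requirements_for(free_threaded: bool) -> str:
    """Map the build flavor to the requirements file named in this commit."""
    return 'requirements-free-threaded.txt' if free_threaded else 'requirements.txt'


if __name__ == '__main__':
    # Print the file that would be passed to `pip install -r`.
    print(requirements_for(is_free_threaded_build()))
```

Keeping the detection and the file-name mapping in separate functions makes the mapping testable on any interpreter.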
2 changes: 1 addition & 1 deletion requirements.txt
@@ -3,7 +3,7 @@ cssselect
  h2
  html2text
  httpx
- lxml>=5.3.0 # prior versions don't build in 3.13
+ lxml >= 5.3.0 # prior versions don't build in 3.13
  markdown2
  msgpack
  platformdirs
17 changes: 13 additions & 4 deletions webchanges/filters.py
@@ -8,6 +8,7 @@
  import csv
  import hashlib
  import html
+ import importlib.util
  import io
  import itertools
  import logging
@@ -25,8 +26,14 @@

  import html2text
  import yaml
- from lxml import etree # noqa: S410 insecure use of XML modules, prefer "defusedxml". TODO
- from lxml.cssselect import CSSSelector # noqa: S410 insecure use of XML ... "defusedxml". TODO
+
+ try:
+     from lxml import etree # noqa: S410 insecure use of XML modules, prefer "defusedxml". TODO
+     from lxml.cssselect import CSSSelector # noqa: S410 insecure use of XML ... "defusedxml". TODO
+ except ImportError as e:
+     from xml import etree
+
+     CSSSelector = str(e)

  from webchanges import __project_name__
  from webchanges.util import TrackSubClasses
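
The hunk above stores the `ImportError` message in place of the missing name, and later hunks test for that with `isinstance(bs4, str)`. A minimal sketch of this sentinel pattern follows. The helper names `optional_import` and `require` are hypothetical and are not the project's API, and the error wording is an assumption.

```python
from __future__ import annotations

import importlib
from types import ModuleType


def optional_import(name: str) -> ModuleType | str:
    """Return the imported module, or the ImportError message as a string."""
    try:
        return importlib.import_module(name)
    except ImportError as e:
        # The string acts as a sentinel meaning "not available, and here is why".
        return str(e)


def require(module: ModuleType | str, package: str) -> ModuleType:
    """Raise a descriptive error at the point of use if the import failed."""
    if isinstance(module, str):
        raise ImportError(f"Python package '{package}' could not be imported: {module}")
    return module
```

Deferring the failure this way lets the module load without the optional dependency and only raises when a feature that needs it is actually used.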
@@ -412,7 +419,8 @@ def filter(self, data: str | bytes, mime_type: str, subfilter: dict[str, Any]) -
  if isinstance(bs4, str):
      self.raise_import_error('BeautifulSoup', self.__kind__, bs4)

- soup = bs4.BeautifulSoup(data, features='lxml')
+ bs4_features = 'lxml' if importlib.util.find_spec('lxml') is not None else 'html'
+ soup = bs4.BeautifulSoup(data, features=bs4_features)

  if isinstance(jsbeautifier, str):
      logger.warning(
@@ -551,7 +559,8 @@ def filter(self, data: str | bytes, mime_type: str, subfilter: dict[str, Any]) -
  if isinstance(bs4, str):
      self.raise_import_error('BeautifulSoup', self.__kind__, bs4)

- bs4_parser: str = options.pop('parser', 'lxml')
+ default_bs4_parser = 'lxml' if importlib.util.find_spec('lxml') is not None else 'html'
+ bs4_parser: str = options.pop('parser', default_bs4_parser)
  try:
      soup = bs4.BeautifulSoup(data, bs4_parser)
  except bs4.FeatureNotFound:
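
Both `filter` hunks above pick the Beautiful Soup parser by probing for `lxml` with `importlib.util.find_spec`, which checks availability without importing the package. A minimal standalone sketch of that check follows. The function name `choose_feature` is hypothetical, and the defaults simply mirror the `'lxml'` and `'html'` strings used in the commit.

```python
import importlib.util


def choose_feature(module_name: str = 'lxml', preferred: str = 'lxml', fallback: str = 'html') -> str:
    """Return `preferred` if `module_name` can be found, otherwise `fallback`.

    find_spec() returns None for a top-level module that is not installed,
    so no import (and no ImportError) is triggered by the probe itself.
    """
    return preferred if importlib.util.find_spec(module_name) is not None else fallback


if __name__ == '__main__':
    # The result depends on whether lxml is installed in this environment.
    print(choose_feature())
```

The returned string could then be passed as the `features` argument of `bs4.BeautifulSoup`, as the commit does.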
