
Commit

Version 3.27.0b1
mborsetti committed Nov 23, 2024
1 parent 4d21530 commit f01b39e
Showing 22 changed files with 351 additions and 152 deletions.
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -175,6 +175,7 @@ repos:
- mypy
- types-backports
- types-docutils
- types-python-dateutil
- types-PyYAML
- types-redis
- types-requests
29 changes: 26 additions & 3 deletions CHANGELOG.rst
@@ -33,19 +33,42 @@ can check out the `wish list <https://github.com/mborsetti/webchanges/blob/main/
Internals, for changes that don't affect users. [triggers a minor patch]
Version 3.27.0rc0
Version 3.27.0b1
==================
Unreleased

⚠ Breaking Changes
------------------
* Error notifications for failed jobs will now only be sent when an error is first encountered. Additional
notifications for the same error will not be sent unless the error resolves or a different error occurs. To restore
the previous behavior of receiving repeated notifications for the same error, add or modify the ``repeated_error``
setting under the ``display`` key in your config file:

.. code-block:: yaml

   display:
     _note: this is a note
     new: false
     error: true
     repeated_error: true  # defaults to false
     unchanged: false
     empty-diff: false

This enhancement was requested by `toxin-x <https://github.com/toxin-x>`__ in issue `#86
<https://github.com/mborsetti/webchanges/issues/86>`__.

Added
-----
* Python 3.13: **webchanges** now is tested on Python 3.13 before releasing. However, the `aioxmpp
<https://pypi.org/project/aioxmpp/>`__ library required by the ``xmpp`` reporter will not install in Python 3.13 (at
least on Windows), and the development of the `library <https://codeberg.org/jssfr/aioxmpp>`__ has been
halted.
* Python 3.13t (free-threaded, GIL-free) remains unsupported due to lack of support by dependencies such
as ``lxml``.

- Python 3.13t (free-threaded, GIL-free) remains unsupported due to the lack of free-threaded wheels of dependencies
such as ``cryptography``, ``msgpack``, ``lxml``, and the optional ``jq``.
* New Sub-directive in ``pypdf`` Filter: Added ``extraction_mode`` sub-directive (see the sketch after this list).
* Now storing error information in snapshot database.
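
A minimal sketch of a job using the new ``extraction_mode`` sub-directive (assuming the standard ``filter`` list
syntax; the URL is a placeholder, and the ``layout`` value shown is one of the modes offered by the underlying pypdf
library, used here only as an illustration):

.. code-block:: yaml

   url: https://example.com/report.pdf
   filter:
     - pypdf:
         extraction_mode: layout  # assumed pypdf mode; 'plain' is typically the default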

Internals
---------
6 changes: 5 additions & 1 deletion RELEASE.rst
@@ -1,10 +1,14 @@
Added
-----
* Python 3.13 Testing: **webchanges** now is tested on Python 3.13 before releasing. However, the `aioxmpp
* Python 3.13: **webchanges** now is tested on Python 3.13 before releasing. However, the `aioxmpp
<https://pypi.org/project/aioxmpp/>`__ library required by the ``xmpp`` reporter will not install in Python 3.13 (at
least on Windows), and the development of the `library <https://codeberg.org/jssfr/aioxmpp>`__ has been
halted.

- Python 3.13t (free-threaded, GIL-free) remains unsupported due to the lack of free-threaded wheels of dependencies
such as ``cryptography``, ``msgpack``, ``lxml``, and the optional ``jq``.
* New Sub-directive in ``pypdf`` Filter: Added ``extraction_mode`` sub-directive.

Internals
---------
* Added ``ai_google`` directive to the ``image`` differ to test Generative AI summarization of differences between two
Expand Down
9 changes: 8 additions & 1 deletion docs/differs.rst
@@ -214,6 +214,11 @@ directive to specify another `model <https://ai.google.dev/models/gemini>`__, su
Pro <https://ai.google.dev/gemini-api/docs/models/gemini#gemini-1.5-pro-expandable>`__ with a context window of 2
million tokens (``gemini-1.5-pro-latest``) or the older Gemini 1.0 Pro (``gemini-1.0-pro-latest``).

The full list of production models available is `here <https://ai.google.dev/gemini-api/docs/models/gemini>`__, and
additional experimental models (if any) are listed `here
<https://ai.google.dev/gemini-api/docs/models/experimental-models>`__. You can manually evaluate responses side-by-side
across the various models `here <https://aistudio.google.com/app/prompts/new_comparison>`__.
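
For example, a job could select one of these models through the differ's ``model`` directive (a minimal sketch; the
URL is a placeholder and all other sub-directives are left at their defaults):

.. code-block:: yaml

   url: https://example.com/page-to-monitor
   differ:
     name: ai_google
     model: gemini-1.5-pro-latest  # any model name from the lists above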

.. note:: These models work with `38 languages
<https://ai.google.dev/gemini-api/docs/models/gemini#available-languages>`__ and are available in over `200 regions
<https://ai.google.dev/gemini-api/docs/available-regions>`__.
@@ -307,7 +312,9 @@ This differ is currently in BETA and these directives MAY change in the future.
* ``top_p`` (float between 0.0 and 1.0): The model's TopP parameter, or the cumulative probability cutoff for token
selection; lower p means sampling from a smaller, more top-weighted nucleus and reduces diversity (see note below)
(default: model-dependent, but typically 0.95 or 1.0, see Google documentation)
* ``unified`` (dict): directives passed to :ref:`unified differ <unified_diff>`, which prepares the unified diff
* ``tools`` (list): Data passed on to the API's 'tool' field, for example to ground the response (see `here
<https://ai.google.dev/api/caching#Tool>`__ and the sketch after this list).
* ``unified`` (dict): Directives passed to :ref:`unified differ <unified_diff>`, which prepares the unified diff
attached to this report.
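
As a sketch only, a grounding tool could be passed through the ``tools`` directive as follows (the URL is a
placeholder, and the ``google_search_retrieval`` tool object is an assumption; check the API's Tool schema linked
above for the exact field names):

.. code-block:: yaml

   url: https://example.com/page-to-monitor
   differ:
     name: ai_google
     tools:
       - google_search_retrieval: {}  # assumed tool object; see the Gemini API 'Tool' documentation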

Directives for the underlying :ref:`unified differ <unified_diff>` can be passed in as key ``unified``, as follows:
6 changes: 5 additions & 1 deletion docs/hooks.rst
@@ -40,6 +40,7 @@ Example ``hooks.py`` file:
"""Example hooks file for webchanges (for Python >= 3.12)."""
import re
import threading
from pathlib import Path
from typing import Any, Literal
@@ -49,6 +50,8 @@ Example ``hooks.py`` file:
from webchanges.jobs import UrlJob, UrlJobBase
from webchanges.reporters import HtmlReporter, TextReporter
hooks_custom_login_lock = threading.Lock()
class CustomLoginJob(UrlJob):
"""Custom login for my webpage.
@@ -62,7 +65,8 @@ Example ``hooks.py`` file:
def retrieve(self, job_state: JobState, headless: bool = True) -> tuple[bytes | str, str, str]:
""":returns: The data retrieved, the ETag, and the mime_type (e.g. HTTP Content-Type)."""
... # custom code here to actually do the login.
with hooks_custom_login_lock: # this site doesn't like parallel logins
... # custom code here to actually do the login.
additional_headers = {'x-special': 'test'}
self.headers.update(additional_headers) # self.headers always an httpx.Headers object
return super().retrieve(job_state) # uses the existing code to then browse and capture data
15 changes: 8 additions & 7 deletions tests/test_command.py
@@ -56,7 +56,7 @@

# Set up dummy editor
editor = os.getenv('EDITOR')
if os.name == 'nt':
if sys.platform == 'win32':
os.environ['EDITOR'] = 'rundll32'
else:
os.environ['EDITOR'] = 'true'
@@ -523,7 +523,7 @@ def test_test_differ_and_joblist(capsys: pytest.CaptureFixture[str]) -> None:
jobs_storage = YamlJobsStorage([jobs_file])
command_config = new_command_config(jobs_file=jobs_file)
urlwatcher = Urlwatch(command_config, config_storage, snapshot_storage, jobs_storage) # main.py
if os.name == 'nt':
if sys.platform == 'win32':
urlwatcher.jobs[0].command = 'echo %time% %random%'
guid = urlwatcher.jobs[0].get_guid()

@@ -757,7 +757,7 @@ def test_delete_snapshot(capsys: pytest.CaptureFixture[str]) -> None:
jobs_storage = YamlJobsStorage([jobs_file])
command_config = new_command_config(jobs_file=jobs_file)
urlwatcher = Urlwatch(command_config, config_storage, snapshot_storage, jobs_storage) # main.py
if os.name == 'nt':
if sys.platform == 'win32':
urlwatcher.jobs[0].command = 'echo %time% %random%'

setattr(command_config, 'delete_snapshot', True)
@@ -816,7 +816,7 @@ def test_gc_database(capsys: pytest.CaptureFixture[str]) -> None:
jobs_storage = YamlJobsStorage([jobs_file])
command_config = new_command_config(jobs_file=jobs_file)
urlwatcher = Urlwatch(command_config, config_storage, snapshot_storage, jobs_storage) # main.py
if os.name == 'nt':
if sys.platform == 'win32':
urlwatcher.jobs[0].command = 'echo %time% %random%'
guid = urlwatcher.jobs[0].get_guid()

@@ -839,7 +839,7 @@ def test_gc_database(capsys: pytest.CaptureFixture[str]) -> None:
setattr(command_config, 'gc_database', False)
assert pytest_wrapped_e.value.code == 0
message = capsys.readouterr().out
if os.name == 'nt':
if sys.platform == 'win32':
assert message == f'Deleting job {guid} (no longer being tracked)\n'
else:
# TODO: for some reason, Linux message is ''. Need to figure out why.
@@ -1170,6 +1170,7 @@ def test_job_states_verb_notimestamp_unchanged() -> None:
tries=1,
etag=snapshot.etag,
mime_type=snapshot.mime_type,
error_data=snapshot.error_data,
),
)
ssdb_storage._copy_temp_to_permanent(delete=True)
@@ -1217,7 +1218,7 @@ def test_job_states_verb_notimestamp_changed() -> None:

# modify database to no timestamp
urlwatcher.ssdb_storage.delete(guid)
new_snapshot = Snapshot(snapshot.data, 0, snapshot.tries, snapshot.etag, snapshot.mime_type)
new_snapshot = Snapshot(snapshot.data, 0, snapshot.tries, snapshot.etag, snapshot.mime_type, snapshot.error_data)
urlwatcher.ssdb_storage.save(guid=guid, snapshot=new_snapshot)
ssdb_storage._copy_temp_to_permanent(delete=True)
# run again
@@ -1227,7 +1228,7 @@

# modify database to no timestamp and 1 try
urlwatcher.ssdb_storage.delete(guid)
new_snapshot = Snapshot(snapshot.data, 0, 1, snapshot.etag, snapshot.mime_type)
new_snapshot = Snapshot(snapshot.data, 0, 1, snapshot.etag, snapshot.mime_type, {})
urlwatcher.ssdb_storage.save(guid=guid, snapshot=new_snapshot)
ssdb_storage._copy_temp_to_permanent(delete=True)
# run again
10 changes: 5 additions & 5 deletions tests/test_differs.py
@@ -42,7 +42,7 @@
# py_nt_only = cast(
# Callable[[Callable], Callable],
# pytest.mark.skipif(
# os.name == 'nt',
# sys.platform == 'win32',
# reason='Not working on Linux',
# ),
# )
@@ -452,7 +452,7 @@ def test_command_change(job_state: JobState) -> None:
"""
job_state.old_data = 'a\n'
job_state.new_data = 'b\n'
if os.name == 'nt':
if sys.platform == 'win32':
command = 'cmd /C exit 1 & rem'
else:
command = 'bash -c " echo \'This is a custom diff\'; exit 1" #'
@@ -486,7 +486,7 @@ def test_command_error(job_state: JobState) -> None:
"""
job_state.old_data = 'a\n'
job_state.new_data = 'b\n'
if os.name == 'nt':
if sys.platform == 'win32':
command = 'cmd /C exit 2 & rem'
else:
command = 'bash -c " echo \'This is a custom diff\'; exit 2" #'
@@ -505,7 +505,7 @@ def test_command_bad_command(job_state: JobState) -> None:
job_state.job.differ = {'name': 'command', 'command': 'dfgfdgsdfg'}
job_state.get_diff()
assert isinstance(job_state.exception, FileNotFoundError)
if os.name == 'nt':
if sys.platform == 'win32':
assert str(job_state.exception) == '[WinError 2] The system cannot find the file specified'


@@ -516,7 +516,7 @@ def test_command_command_error(job_state: JobState) -> None:
job_state.job.differ = {'name': 'command', 'command': 'dir /x'}
job_state.get_diff()
assert isinstance(job_state.exception, (RuntimeError, FileNotFoundError))
if os.name == 'nt':
if sys.platform == 'win32':
assert str(job_state.exception) == (
"Job 0: External differ '{'command': 'dir /x'}' returned 'dir: cannot access "
"'/x': No such file or directory' ()"
4 changes: 2 additions & 2 deletions tests/test_handler.py
@@ -228,7 +228,7 @@ def test_number_of_tries_in_cache_is_increased_sqlite3() -> None:
snapshot = ssdb_storage.load(guid)

assert snapshot.tries == 2
assert urlwatcher.report.job_states[-1].verb == 'error'
assert urlwatcher.report.job_states[-1].verb == 'repeated_error'
finally:
ssdb_storage.delete_all()

@@ -246,7 +246,7 @@ def test_report_error_when_out_of_tries_sqlite3() -> None:
ssdb_storage._copy_temp_to_permanent(delete=True)

report = urlwatcher.report
assert report.job_states[-1].verb == 'error'
assert report.job_states[-1].verb == 'repeated_error'
finally:
ssdb_storage.delete_all()

2 changes: 1 addition & 1 deletion tests/test_jobs.py
@@ -376,7 +376,7 @@ def test_stress_use_browser() -> None:
jobs_storage = YamlJobsStorage([jobs_file])

if not os.getenv('GITHUB_ACTIONS'):
from src.webchanges.cli import setup_logger
from webchanges.cli import setup_logger

setup_logger()

2 changes: 1 addition & 1 deletion tests/test_reporters.py
@@ -209,7 +209,7 @@ def test_reporters(reporter: str, capsys: pytest.CaptureFixture) -> None:
with pytest.raises(ValueError) as pytest_wrapped_e:
test_report.finish_one(reporter, check_enabled=False)
assert str(pytest_wrapped_e.value) == 'Reporter "run_command" needs a command'
if os.name == 'nt':
if sys.platform == 'win32':
test_report.config['report']['run_command']['command'] = 'cmd /C echo TEST'
else:
test_report.config['report']['run_command']['command'] = 'echo TEST'
20 changes: 14 additions & 6 deletions tests/test_storage.py
@@ -93,7 +93,7 @@ def prepare_storage_test(
jobs_storage = YamlJobsStorage([jobs_file])
urlwatcher = Urlwatch(command_config, config_storage, ssdb_storage, jobs_storage)

if os.name == 'nt':
if sys.platform == 'win32':
urlwatcher.jobs[0].command = 'echo %time% %random%'

return urlwatcher, ssdb_storage, command_config
@@ -596,16 +596,16 @@ def test_restore_and_backup(database_engine: SsdbStorage) -> None:

mime_type = 'text/plain' if isinstance(database_engine, (SsdbSQLite3Storage, SsdbRedisStorage)) else ''

ssdb_storage.restore([('myguid', 'mydata', 1618105974, 0, '', mime_type)])
ssdb_storage.restore((('myguid', 'mydata', 1618105974, 0, '', mime_type, {}),))
if hasattr(ssdb_storage, '_copy_temp_to_permanent'):
ssdb_storage._copy_temp_to_permanent(delete=True) # type: ignore[attr-defined]

entry = ssdb_storage.load('myguid')
assert entry == Snapshot('mydata', 1618105974, 0, '', mime_type)
assert entry == Snapshot('mydata', 1618105974, 0, '', mime_type, {})

entries = ssdb_storage.backup()
backup_entry = entries.__next__()
assert backup_entry == ('myguid', 'mydata', 1618105974, 0, '', mime_type)
assert backup_entry == ('myguid', 'mydata', 1618105974, 0, '', mime_type, {})


@pytest.mark.parametrize( # type: ignore[misc]
@@ -685,7 +685,7 @@ def test_migrate_urlwatch_legacy_db(tmp_path: Path, capsys: pytest.CaptureFixtur
try:
entries = ssdb_storage.backup()
entry = entries.__next__()
assert entry == ('547d652722e59e8894741a6382d973a89c8a7557', ' 9:52:54.74\n', 1618105974, 0, None, '')
assert entry == ('547d652722e59e8894741a6382d973a89c8a7557', ' 9:52:54.74\n', 1618105974.0, 0, None, '', {})
finally:
ssdb_storage.close()
temp_ssdb_file.unlink()
@@ -741,7 +741,15 @@ class DummySsdbStorage(SsdbStorage, ABC):
assert dummy_ssdb.get_history_data('guid') is None
assert (
dummy_ssdb.save(
guid='guid', snapshot=Snapshot(data='data', timestamp=0, tries=0, etag='etag', mime_type='text/plain')
guid='guid',
snapshot=Snapshot(
data='data',
timestamp=0,
tries=0,
etag='etag',
mime_type='text/plain',
error_data={},
),
)
is None
)
2 changes: 1 addition & 1 deletion webchanges/__init__.py
@@ -22,7 +22,7 @@
# * MINOR version when you add functionality in a backwards compatible manner, and
# * MICRO or PATCH version when you make backwards compatible bug fixes. We no longer use '0'
# If unsure on increments, use pkg_resources.parse_version to parse
__version__ = '3.27.0rc0'
__version__ = '3.27.0b1'
__description__ = (
'Check web (or command output) for changes since last run and notify.\n'
'\n'
4 changes: 2 additions & 2 deletions webchanges/_vendored/headers.py
@@ -208,9 +208,9 @@ def get_list(self, key: str, split_commas: bool = False) -> list[str]:
if not split_commas:
return values

split_values = []
split_values: list[str] = []
for value in values:
split_values.extend([item.strip() for item in value.split(',')])
split_values.extend((item.strip() for item in value.split(',')))
return split_values

def update(self, headers: HeaderTypes | None = None) -> None: # type: ignore[override]
10 changes: 7 additions & 3 deletions webchanges/command.py
@@ -425,15 +425,15 @@ def _find_job_with_defaults(self, query: str | int) -> JobBase:
def test_job(self, job_id: bool | str | int) -> None:
"""
Tests the running of a single job outputting the filtered text to stdout. If job_id is True, don't run any
jobs as it's a test of loading config, jobs and hook files for syntax.
jobs but load config, jobs and hook files to trigger any syntax errors.
:param job_id: The job_id or True.
:return: None.
:raises Exception: The Exception when raised by a job, loading of hooks files, etc.
"""
if job_id is True:
if job_id is True: # Load to trigger any eventual syntax errors
message = [f'No syntax errors in config file {self.urlwatch_config.config_file}']
conj = ',\n' if 'hooks' in sys.modules else '\nand '
if len(self.urlwatch_config.jobs_files) == 1:
@@ -590,7 +590,11 @@ def dump_history(self, job_id: str) -> int:
sep_len = max(50, len(header))
print(header)
print('-' * sep_len)
print(snapshot[0])
if snapshot.error_data:
print(f"{snapshot.error_data['type']}: {snapshot.error_data['message']}")
print()
print('Last good data:')
print(snapshot.data)
print('=' * sep_len, '\n')

print(