
Commit

Version 3.27.0b1
mborsetti committed Nov 23, 2024
1 parent 4d21530 commit f01b39e
Showing 22 changed files with 351 additions and 152 deletions.
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -175,6 +175,7 @@ repos:
- mypy
- types-backports
- types-docutils
- types-python-dateutil
- types-PyYAML
- types-redis
- types-requests
29 changes: 26 additions & 3 deletions CHANGELOG.rst
@@ -33,19 +33,42 @@ can check out the `wish list <https://github.com/mborsetti/webchanges/blob/main/
Internals, for changes that don't affect users. [triggers a minor patch]
Version 3.27.0rc0
Version 3.27.0b1
==================
Unreleased

⚠ Breaking Changes
------------------
* Error notifications for failed jobs will now only be sent when an error is first encountered. Additional
notifications for the same error will not be sent unless the error resolves or a different error occurs. To restore
the previous behavior of receiving repeated notifications for the same error, add or modify the ``repeated_error``
setting under the ``display`` key in your config file:

.. code-block:: yaml

   display:
     _note: this is a note
     new: false
     error: true
     repeated_error: true  # defaults to false
     unchanged: false
     empty-diff: false

This enhancement was requested by `toxin-x <https://github.com/toxin-x>`__ in issue `#86
<https://github.com/mborsetti/webchanges/issues/86>`__.

Added
-----
* Python 3.13: **webchanges** now is tested on Python 3.13 before releasing. However, the `aioxmpp
<https://pypi.org/project/aioxmpp/>`__ library required by the ``xmpp`` reporter will not install in Python 3.13 (at
least on Windows), and the development of the `library <https://codeberg.org/jssfr/aioxmpp>`__ has been
halted.
* Python 3.13t (free-threaded, GIL-free) remains unsupported due to lack of support by dependencies such
as ``lxml``.

- Python 3.13t (free-threaded, GIL-free) remains unsupported due to the lack of free-threaded wheels of dependencies
such as ``cryptography``, ``msgpack``, ``lxml``, and the optional ``jq``.
* New Sub-directive in ``pypdf`` Filter: Added ``extraction_mode`` sub-directive (see the sketch after this list).
* Now storing error information in snapshot database.
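
A minimal sketch of a job using the new ``extraction_mode`` sub-directive (assuming the standard ``filter`` list
syntax; the URL is a placeholder, and the ``layout`` value shown is one of the modes offered by the underlying pypdf
library, used here only as an illustration):

.. code-block:: yaml

   url: https://example.com/report.pdf
   filter:
     - pypdf:
         extraction_mode: layout  # assumed pypdf mode; 'plain' is typically the default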

Internals
---------
6 changes: 5 additions & 1 deletion RELEASE.rst
@@ -1,10 +1,14 @@
Added
-----
* Python 3.13 Testing: **webchanges** now is tested on Python 3.13 before releasing. However, the `aioxmpp
* Python 3.13: **webchanges** now is tested on Python 3.13 before releasing. However, the `aioxmpp
<https://pypi.org/project/aioxmpp/>`__ library required by the ``xmpp`` reporter will not install in Python 3.13 (at
least on Windows), and the development of the `library <https://codeberg.org/jssfr/aioxmpp>`__ has been
halted.

- Python 3.13t (free-threaded, GIL-free) remains unsupported due to the lack of free-threaded wheels of dependencies
such as ``cryptography``, ``msgpack``, ``lxml``, and the optional ``jq``.
* New Sub-directive in ``pypdf`` Filter: Added ``extraction_mode`` sub-directive.

Internals
---------
* Added ``ai_google`` directive to the ``image`` differ to test Generative AI summarization of differences between two
Expand Down
9 changes: 8 additions & 1 deletion docs/differs.rst
@@ -214,6 +214,11 @@ directive to specify another `model <https://ai.google.dev/models/gemini>`__, su
Pro <https://ai.google.dev/gemini-api/docs/models/gemini#gemini-1.5-pro-expandable>`__ with a context window of 2
million tokens (``gemini-1.5-pro-latest``) or the older Gemini 1.0 Pro (``gemini-1.0-pro-latest``).

The full list of production models available is `here <https://ai.google.dev/gemini-api/docs/models/gemini>`__, and
additional experimental models (if any) are listed `here
<https://ai.google.dev/gemini-api/docs/models/experimental-models>`__. You can manually evaluate responses side-by-side
across the various models `here <https://aistudio.google.com/app/prompts/new_comparison>`__.
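
For example, a job could select one of these models through the differ's ``model`` directive (a minimal sketch; the
URL is a placeholder and all other sub-directives are left at their defaults):

.. code-block:: yaml

   url: https://example.com/page-to-monitor
   differ:
     name: ai_google
     model: gemini-1.5-pro-latest  # any model name from the lists above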

.. note:: These models work with `38 languages
<https://ai.google.dev/gemini-api/docs/models/gemini#available-languages>`__ and are available in over `200 regions
<https://ai.google.dev/gemini-api/docs/available-regions>`__.
@@ -307,7 +312,9 @@ This differ is currently in BETA and these directives MAY change in the future.
* ``top_p`` (float between 0.0 and 1.0): The model's TopP parameter, or the cumulative probability cutoff for token
selection; lower p means sampling from a smaller, more top-weighted nucleus and reduces diversity (see note below)
(default: model-dependent, but typically 0.95 or 1.0, see Google documentation)
* ``unified`` (dict): directives passed to :ref:`unified differ <unified_diff>`, which prepares the unified diff
* ``tools`` (list): Data passed on to the API's 'tool' field, for example to ground the response (see `here
<https://ai.google.dev/api/caching#Tool>`__ and the sketch after this list).
* ``unified`` (dict): Directives passed to :ref:`unified differ <unified_diff>`, which prepares the unified diff
attached to this report.
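
As a sketch only, a grounding tool could be passed through the ``tools`` directive as follows (the URL is a
placeholder, and the ``google_search_retrieval`` tool object is an assumption; check the API's Tool schema linked
above for the exact field names):

.. code-block:: yaml

   url: https://example.com/page-to-monitor
   differ:
     name: ai_google
     tools:
       - google_search_retrieval: {}  # assumed tool object; see the Gemini API 'Tool' documentation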

Directives for the underlying :ref:`unified differ <unified_diff>` can be passed in as key ``unified``, as follows:
6 changes: 5 additions & 1 deletion docs/hooks.rst
@@ -40,6 +40,7 @@ Example ``hooks.py`` file:
"""Example hooks file for webchanges (for Python >= 3.12)."""
import re
import threading
from pathlib import Path
from typing import Any, Literal
@@ -49,6 +50,8 @@ Example ``hooks.py`` file:
from webchanges.jobs import UrlJob, UrlJobBase
from webchanges.reporters import HtmlReporter, TextReporter
hooks_custom_login_lock = threading.Lock()
class CustomLoginJob(UrlJob):
"""Custom login for my webpage.
@@ -62,7 +65,8 @@ Example ``hooks.py`` file:
def retrieve(self, job_state: JobState, headless: bool = True) -> tuple[bytes | str, str, str]:
""":returns: The data retrieved, the ETag, and the mime_type (e.g. HTTP Content-Type)."""
... # custom code here to actually do the login.
with hooks_custom_login_lock: # this site doesn't like parallel logins
... # custom code here to actually do the login.
additional_headers = {'x-special': 'test'}
self.headers.update(additional_headers) # self.headers always an httpx.Headers object
return super().retrieve(job_state) # uses the existing code to then browse and capture data
15 changes: 8 additions & 7 deletions tests/test_command.py
@@ -56,7 +56,7 @@

# Set up dummy editor
editor = os.getenv('EDITOR')
if os.name == 'nt':
if sys.platform == 'win32':
os.environ['EDITOR'] = 'rundll32'
else:
os.environ['EDITOR'] = 'true'
@@ -523,7 +523,7 @@ def test_test_differ_and_joblist(capsys: pytest.CaptureFixture[str]) -> None:
jobs_storage = YamlJobsStorage([jobs_file])
command_config = new_command_config(jobs_file=jobs_file)
urlwatcher = Urlwatch(command_config, config_storage, snapshot_storage, jobs_storage) # main.py
if os.name == 'nt':
if sys.platform == 'win32':
urlwatcher.jobs[0].command = 'echo %time% %random%'
guid = urlwatcher.jobs[0].get_guid()

@@ -757,7 +757,7 @@ def test_delete_snapshot(capsys: pytest.CaptureFixture[str]) -> None:
jobs_storage = YamlJobsStorage([jobs_file])
command_config = new_command_config(jobs_file=jobs_file)
urlwatcher = Urlwatch(command_config, config_storage, snapshot_storage, jobs_storage) # main.py
if os.name == 'nt':
if sys.platform == 'win32':
urlwatcher.jobs[0].command = 'echo %time% %random%'

setattr(command_config, 'delete_snapshot', True)
@@ -816,7 +816,7 @@ def test_gc_database(capsys: pytest.CaptureFixture[str]) -> None:
jobs_storage = YamlJobsStorage([jobs_file])
command_config = new_command_config(jobs_file=jobs_file)
urlwatcher = Urlwatch(command_config, config_storage, snapshot_storage, jobs_storage) # main.py
if os.name == 'nt':
if sys.platform == 'win32':
urlwatcher.jobs[0].command = 'echo %time% %random%'
guid = urlwatcher.jobs[0].get_guid()

@@ -839,7 +839,7 @@ def test_gc_database(capsys: pytest.CaptureFixture[str]) -> None:
setattr(command_config, 'gc_database', False)
assert pytest_wrapped_e.value.code == 0
message = capsys.readouterr().out
if os.name == 'nt':
if sys.platform == 'win32':
assert message == f'Deleting job {guid} (no longer being tracked)\n'
else:
# TODO: for some reason, Linux message is ''. Need to figure out why.
@@ -1170,6 +1170,7 @@ def test_job_states_verb_notimestamp_unchanged() -> None:
tries=1,
etag=snapshot.etag,
mime_type=snapshot.mime_type,
error_data=snapshot.error_data,
),
)
ssdb_storage._copy_temp_to_permanent(delete=True)
@@ -1217,7 +1218,7 @@ def test_job_states_verb_notimestamp_changed() -> None:

# modify database to no timestamp
urlwatcher.ssdb_storage.delete(guid)
new_snapshot = Snapshot(snapshot.data, 0, snapshot.tries, snapshot.etag, snapshot.mime_type)
new_snapshot = Snapshot(snapshot.data, 0, snapshot.tries, snapshot.etag, snapshot.mime_type, snapshot.error_data)
urlwatcher.ssdb_storage.save(guid=guid, snapshot=new_snapshot)
ssdb_storage._copy_temp_to_permanent(delete=True)
# run again
@@ -1227,7 +1228,7 @@

# modify database to no timestamp and 1 try
urlwatcher.ssdb_storage.delete(guid)
new_snapshot = Snapshot(snapshot.data, 0, 1, snapshot.etag, snapshot.mime_type)
new_snapshot = Snapshot(snapshot.data, 0, 1, snapshot.etag, snapshot.mime_type, {})
urlwatcher.ssdb_storage.save(guid=guid, snapshot=new_snapshot)
ssdb_storage._copy_temp_to_permanent(delete=True)
# run again
10 changes: 5 additions & 5 deletions tests/test_differs.py
@@ -42,7 +42,7 @@
# py_nt_only = cast(
# Callable[[Callable], Callable],
# pytest.mark.skipif(
# os.name == 'nt',
# sys.platform == 'win32',
# reason='Not working on Linux',
# ),
# )
@@ -452,7 +452,7 @@ def test_command_change(job_state: JobState) -> None:
"""
job_state.old_data = 'a\n'
job_state.new_data = 'b\n'
if os.name == 'nt':
if sys.platform == 'win32':
command = 'cmd /C exit 1 & rem'
else:
command = 'bash -c " echo \'This is a custom diff\'; exit 1" #'
@@ -486,7 +486,7 @@ def test_command_error(job_state: JobState) -> None:
"""
job_state.old_data = 'a\n'
job_state.new_data = 'b\n'
if os.name == 'nt':
if sys.platform == 'win32':
command = 'cmd /C exit 2 & rem'
else:
command = 'bash -c " echo \'This is a custom diff\'; exit 2" #'
@@ -505,7 +505,7 @@ def test_command_bad_command(job_state: JobState) -> None:
job_state.job.differ = {'name': 'command', 'command': 'dfgfdgsdfg'}
job_state.get_diff()
assert isinstance(job_state.exception, FileNotFoundError)
if os.name == 'nt':
if sys.platform == 'win32':
assert str(job_state.exception) == '[WinError 2] The system cannot find the file specified'


@@ -516,7 +516,7 @@ def test_command_command_error(job_state: JobState) -> None:
job_state.job.differ = {'name': 'command', 'command': 'dir /x'}
job_state.get_diff()
assert isinstance(job_state.exception, (RuntimeError, FileNotFoundError))
if os.name == 'nt':
if sys.platform == 'win32':
assert str(job_state.exception) == (
"Job 0: External differ '{'command': 'dir /x'}' returned 'dir: cannot access "
"'/x': No such file or directory' ()"
4 changes: 2 additions & 2 deletions tests/test_handler.py
@@ -228,7 +228,7 @@ def test_number_of_tries_in_cache_is_increased_sqlite3() -> None:
snapshot = ssdb_storage.load(guid)

assert snapshot.tries == 2
assert urlwatcher.report.job_states[-1].verb == 'error'
assert urlwatcher.report.job_states[-1].verb == 'repeated_error'
finally:
ssdb_storage.delete_all()

@@ -246,7 +246,7 @@ def test_report_error_when_out_of_tries_sqlite3() -> None:
ssdb_storage._copy_temp_to_permanent(delete=True)

report = urlwatcher.report
assert report.job_states[-1].verb == 'error'
assert report.job_states[-1].verb == 'repeated_error'
finally:
ssdb_storage.delete_all()

2 changes: 1 addition & 1 deletion tests/test_jobs.py
@@ -376,7 +376,7 @@ def test_stress_use_browser() -> None:
jobs_storage = YamlJobsStorage([jobs_file])

if not os.getenv('GITHUB_ACTIONS'):
from src.webchanges.cli import setup_logger
from webchanges.cli import setup_logger

setup_logger()

2 changes: 1 addition & 1 deletion tests/test_reporters.py
@@ -209,7 +209,7 @@ def test_reporters(reporter: str, capsys: pytest.CaptureFixture) -> None:
with pytest.raises(ValueError) as pytest_wrapped_e:
test_report.finish_one(reporter, check_enabled=False)
assert str(pytest_wrapped_e.value) == 'Reporter "run_command" needs a command'
if os.name == 'nt':
if sys.platform == 'win32':
test_report.config['report']['run_command']['command'] = 'cmd /C echo TEST'
else:
test_report.config['report']['run_command']['command'] = 'echo TEST'
20 changes: 14 additions & 6 deletions tests/test_storage.py
@@ -93,7 +93,7 @@ def prepare_storage_test(
jobs_storage = YamlJobsStorage([jobs_file])
urlwatcher = Urlwatch(command_config, config_storage, ssdb_storage, jobs_storage)

if os.name == 'nt':
if sys.platform == 'win32':
urlwatcher.jobs[0].command = 'echo %time% %random%'

return urlwatcher, ssdb_storage, command_config
@@ -596,16 +596,16 @@ def test_restore_and_backup(database_engine: SsdbStorage) -> None:

mime_type = 'text/plain' if isinstance(database_engine, (SsdbSQLite3Storage, SsdbRedisStorage)) else ''

ssdb_storage.restore([('myguid', 'mydata', 1618105974, 0, '', mime_type)])
ssdb_storage.restore((('myguid', 'mydata', 1618105974, 0, '', mime_type, {}),))
if hasattr(ssdb_storage, '_copy_temp_to_permanent'):
ssdb_storage._copy_temp_to_permanent(delete=True) # type: ignore[attr-defined]

entry = ssdb_storage.load('myguid')
assert entry == Snapshot('mydata', 1618105974, 0, '', mime_type)
assert entry == Snapshot('mydata', 1618105974, 0, '', mime_type, {})

entries = ssdb_storage.backup()
backup_entry = entries.__next__()
assert backup_entry == ('myguid', 'mydata', 1618105974, 0, '', mime_type)
assert backup_entry == ('myguid', 'mydata', 1618105974, 0, '', mime_type, {})


@pytest.mark.parametrize( # type: ignore[misc]
@@ -685,7 +685,7 @@ def test_migrate_urlwatch_legacy_db(tmp_path: Path, capsys: pytest.CaptureFixtur
try:
entries = ssdb_storage.backup()
entry = entries.__next__()
assert entry == ('547d652722e59e8894741a6382d973a89c8a7557', ' 9:52:54.74\n', 1618105974, 0, None, '')
assert entry == ('547d652722e59e8894741a6382d973a89c8a7557', ' 9:52:54.74\n', 1618105974.0, 0, None, '', {})
finally:
ssdb_storage.close()
temp_ssdb_file.unlink()
@@ -741,7 +741,15 @@ class DummySsdbStorage(SsdbStorage, ABC):
assert dummy_ssdb.get_history_data('guid') is None
assert (
dummy_ssdb.save(
guid='guid', snapshot=Snapshot(data='data', timestamp=0, tries=0, etag='etag', mime_type='text/plain')
guid='guid',
snapshot=Snapshot(
data='data',
timestamp=0,
tries=0,
etag='etag',
mime_type='text/plain',
error_data={},
),
)
is None
)
2 changes: 1 addition & 1 deletion webchanges/__init__.py
@@ -22,7 +22,7 @@
# * MINOR version when you add functionality in a backwards compatible manner, and
# * MICRO or PATCH version when you make backwards compatible bug fixes. We no longer use '0'
# If unsure on increments, use pkg_resources.parse_version to parse
__version__ = '3.27.0rc0'
__version__ = '3.27.0b1'
__description__ = (
'Check web (or command output) for changes since last run and notify.\n'
'\n'
4 changes: 2 additions & 2 deletions webchanges/_vendored/headers.py
@@ -208,9 +208,9 @@ def get_list(self, key: str, split_commas: bool = False) -> list[str]:
if not split_commas:
return values

split_values = []
split_values: list[str] = []
for value in values:
split_values.extend([item.strip() for item in value.split(',')])
split_values.extend((item.strip() for item in value.split(',')))
return split_values

def update(self, headers: HeaderTypes | None = None) -> None: # type: ignore[override]
10 changes: 7 additions & 3 deletions webchanges/command.py
@@ -425,15 +425,15 @@ def _find_job_with_defaults(self, query: str | int) -> JobBase:
def test_job(self, job_id: bool | str | int) -> None:
"""
Tests the running of a single job outputting the filtered text to stdout. If job_id is True, don't run any
jobs as it's a test of loading config, jobs and hook files for syntax.
jobs but load config, jobs and hook files to trigger any syntax errors.
:param job_id: The job_id or True.
:return: None.
:raises Exception: The Exception when raised by a job, loading of hooks files, etc.
"""
if job_id is True:
if job_id is True: # Load to trigger any eventual syntax errors
message = [f'No syntax errors in config file {self.urlwatch_config.config_file}']
conj = ',\n' if 'hooks' in sys.modules else '\nand '
if len(self.urlwatch_config.jobs_files) == 1:
@@ -590,7 +590,11 @@ def dump_history(self, job_id: str) -> int:
sep_len = max(50, len(header))
print(header)
print('-' * sep_len)
print(snapshot[0])
if snapshot.error_data:
print(f"{snapshot.error_data['type']}: {snapshot.error_data['message']}")
print()
print('Last good data:')
print(snapshot.data)
print('=' * sep_len, '\n')

print(