Fix pooch retrieval of file registry (#260)
* Fix pooch retrieval of file registry

We need to specify a filename (`fname`) in `pooch.retrieve` for the file to be correctly overwritten every time.

If the filename is None, it is set to <hash-of-the-url> + <last-part-of-url>. So if the URL stays the same, the cached file won't be updated, even if its contents have changed.

Signed-off-by: sfmig <[email protected]>

* Fix precommits and add wheel as dependency to check-manifest (py3.12)

* Force file registry to download every time

---------

Signed-off-by: sfmig <[email protected]>
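
The naming behaviour described in the commit message can be illustrated with a small standard-library sketch. It only approximates the "<hash-of-the-url> + <last-part-of-url>" scheme: the use of MD5, the hyphen separator, the helper name `default_cache_name` and the example URL are all assumptions made for illustration, not pooch's actual code.

```python
import hashlib
from urllib.parse import urlsplit


def default_cache_name(url: str) -> str:
    """Approximate the cache file name used when no fname is given.

    Hypothetical helper: the hash algorithm and separator are assumptions.
    """
    url_hash = hashlib.md5(url.encode()).hexdigest()
    last_part = urlsplit(url).path.rstrip("/").split("/")[-1]
    return f"{url_hash}-{last_part}"


# Placeholder URL, not the real GIN repository
EXAMPLE_URL = "https://example.org/repo/raw/master/files-registry.txt"

# The name depends only on the URL, so it is identical before and after
# the remote file's contents change.
name_before = default_cache_name(EXAMPLE_URL)
name_after = default_cache_name(EXAMPLE_URL)
```

Because the derived name is a function of the URL alone, a changed remote file maps to the same local path, which is why the commit sets an explicit file name and removes the cached copy first.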
sfmig authored Dec 17, 2024
1 parent 3532188 commit 42a7eb6
Showing 2 changed files with 14 additions and 4 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -50,7 +50,7 @@ repos:
hooks:
- id: check-manifest
args: [--no-build-isolation]
- additional_dependencies: [setuptools-scm]
+ additional_dependencies: [setuptools-scm, wheel]
# - repo: https://github.com/codespell-project/codespell
# # Configuration for codespell is in pyproject.toml
# rev: v2.3.0
16 changes: 13 additions & 3 deletions tests/fixtures/integration.py
@@ -21,18 +21,28 @@ def pooch_registry() -> dict:
URL and hash of the GIN repository with the test data
"""
# Cache the test data in the user's home directory
test_data_dir = Path.home() / ".crabs-exploration-test-data"

# Remove the file registry if it exists
# otherwise the registry is not downloaded every time
file_registry_path = test_data_dir / "files-registry.txt"
if file_registry_path.is_file():
Path(file_registry_path).unlink()

# Initialise pooch registry
registry = pooch.create(
- Path.home() / ".crabs-exploration-test-data",
+ test_data_dir,
base_url=f"{GIN_TEST_DATA_REPO}/raw/master/test_data",
)

# Download only the registry file from GIN
# if known_hash = None, the file is always downloaded.
# (this file should always be downloaded fresh from GIN)
file_registry = pooch.retrieve(
url=f"{GIN_TEST_DATA_REPO}/raw/master/files-registry.txt",
known_hash=None,
- path=Path.home() / ".crabs-exploration-test-data",
+ fname=file_registry_path.name,
+ path=file_registry_path.parent,
)

# Load registry file onto pooch registry
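
The "remove the file registry if it exists" step in this hunk can be sketched in isolation. This is a minimal variant, not the code in the commit: the helper name `remove_cached_registry` is hypothetical, and it uses `Path.unlink(missing_ok=True)` (available from Python 3.8) in place of the `is_file()` check, assuming the cached path is never a directory.

```python
import tempfile
from pathlib import Path


def remove_cached_registry(
    cache_dir: Path, fname: str = "files-registry.txt"
) -> Path:
    """Delete the cached registry file, if any, and return its path."""
    registry_path = cache_dir / fname
    # missing_ok=True makes this a no-op when the file is absent
    registry_path.unlink(missing_ok=True)
    return registry_path


# Demonstrate on a throwaway directory rather than the real cache
with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    (cache_dir / "files-registry.txt").write_text("stale contents")
    removed_path = remove_cached_registry(cache_dir)
    still_exists = removed_path.exists()
    # Calling again with nothing cached does not raise
    remove_cached_registry(cache_dir)
```

With the stale copy gone, a subsequent `pooch.retrieve` call that passes `fname` and `path`, as in the hunk above, has no cached file at that location to reuse.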
