Add a way to download full dataset archive (including PDFs) from the web. #446

Merged on Oct 18, 2024 (2 commits)

Conversation

J08nY
Member

@J08nY J08nY commented Oct 17, 2024

As the title says. This is useful if someone wants the full dataset.
I think a utility method that downloads the dataset plus the auxiliary datasets, without downloading all the PDFs, would also be quite helpful and would actually be what 90% of people want.

Wdyt @adamjanovsky ?

@J08nY J08nY added the enhancement (New feature or request), fips (Related to FIPS 140 certification), and cc (Related to CC certification) labels Oct 17, 2024
@J08nY J08nY requested a review from adamjanovsky October 17, 2024 12:27
@adamjanovsky
Collaborator

That would be a nice addition. Did you consider not adding a new method? Instead, we could add an optional download_artifacts: bool = False argument to the from_web_latest() method. What do you think?

@J08nY
Member Author

J08nY commented Oct 17, 2024

Yes, I did consider it, but right now that method downloads a "pathless" dataset into memory only, while this one needs a path (a directory) in which to save it. So it would need two arguments, which is a bigger change. We can go that route; we just need to figure out sensible behavior for the combinations of arguments.
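
To make the "combinations of arguments" question concrete, here is a minimal sketch of one possible rule set, not the project's actual implementation. Only from_web_latest and download_artifacts come from the discussion above; DownloadPlan, plan_from_web_latest, and the path parameter are hypothetical names introduced for illustration. The sketch assumes that requesting artifacts without a directory should be rejected, since PDFs cannot reasonably live in a "pathless" in-memory dataset.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Union


@dataclass(frozen=True)
class DownloadPlan:
    """Hypothetical summary of what a combined method would do."""

    target_dir: Optional[Path]  # None means: keep the dataset in memory only
    fetch_artifacts: bool       # True means: also fetch the full archive with PDFs


def plan_from_web_latest(
    path: Optional[Union[str, Path]] = None,
    download_artifacts: bool = False,
) -> DownloadPlan:
    """Resolve the two arguments into a plan (illustrative only, no network access)."""
    # Assumption: artifacts need somewhere to be stored, so this combination is an error.
    if download_artifacts and path is None:
        raise ValueError("download_artifacts=True requires a path (directory)")
    target = Path(path) if path is not None else None
    return DownloadPlan(target_dir=target, fetch_artifacts=download_artifacts)
```

Under these assumptions, three of the four combinations are valid: no arguments (today's in-memory behavior), a path only (dataset saved without PDFs), and a path with download_artifacts=True (full archive). Whether the fourth should raise or fall back to a temporary directory is exactly the kind of decision the comment above leaves open.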


codecov bot commented Oct 17, 2024

Codecov Report

Attention: Patch coverage is 33.33333% with 48 lines in your changes missing coverage. Please review.

Project coverage is 67.46%. Comparing base (c8c91a0) to head (e7ba5ef).
Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/sec_certs/dataset/dataset.py | 13.52% | 32 Missing ⚠️ |
| src/sec_certs/dataset/cc.py | 46.43% | 15 Missing ⚠️ |
| src/sec_certs/dataset/fips.py | 80.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #446      +/-   ##
==========================================
- Coverage   67.77%   67.46%   -0.31%     
==========================================
  Files          62       62              
  Lines        7567     7608      +41     
==========================================
+ Hits         5128     5132       +4     
- Misses       2439     2476      +37     


@J08nY J08nY merged commit 2a3d45c into main Oct 18, 2024
6 checks passed
@J08nY J08nY deleted the feat/full-dset-archive-download branch October 18, 2024 18:53