Fix/schemes again #456

Merged
merged 9 commits into from
Nov 8, 2024

Conversation

J08nY
Member

@J08nY J08nY commented Nov 4, 2024

This significantly improves the CC scheme extraction by:

  • Fixing the extraction of several schemes that were mixing
    certified and archived entries by accident.
  • Improving the extraction of cert_ids from scheme sites.
  • Improving the matching heuristic to consider more attributes
    that are usually present in the site data.

Also adds an evaluation notebook to see how this performs.
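
To illustrate the kind of heuristic described above (a category gate, cert_id comparison, then further attributes), here is a minimal sketch. It is not the code in this PR: the `Entry` record, the function names, the scoring rule, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Iterable, Optional


@dataclass
class Entry:
    """Hypothetical, simplified record; not the sec_certs data model."""
    category: str
    name: str
    vendor: str
    cert_id: Optional[str] = None


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(site: Entry, cert: Entry) -> float:
    """Score how well a scheme-site entry matches a dataset certificate."""
    # Hard gate: entries in different categories never match.
    if site.category != cert.category:
        return 0.0
    # Assumption: an identical cert_id is decisive when both sides have one.
    if site.cert_id and cert.cert_id and site.cert_id == cert.cert_id:
        return 1.0
    # Otherwise average the fuzzy similarity of the remaining attributes.
    return (similarity(site.name, cert.name) + similarity(site.vendor, cert.vendor)) / 2


def best_match(site: Entry, candidates: Iterable[Entry], threshold: float = 0.8) -> Optional[Entry]:
    """Return the highest-scoring candidate, or None if none reaches the threshold."""
    scored = [(match_score(site, c), c) for c in candidates]
    score, candidate = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    return candidate if score >= threshold else None
```

Gating on category first keeps similarly named products in different categories from being paired, and the threshold leaves weak candidates unmatched instead of forcing a pairing.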

Fixes #454
Fixes #455

@J08nY J08nY linked an issue Nov 4, 2024 that may be closed by this pull request

codecov bot commented Nov 4, 2024

Codecov Report

Attention: Patch coverage is 77.77778% with 102 lines in your changes missing coverage. Please review.

Project coverage is 67.89%. Comparing base (f2c654f) to head (c5ceab2).
Report is 20 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/sec_certs/sample/cc_scheme.py | 77.78% | 76 Missing ⚠️ |
| src/sec_certs/model/cc_matching.py | 73.44% | 17 Missing ⚠️ |
| src/sec_certs/sample/cc_certificate_id.py | 42.86% | 8 Missing ⚠️ |
| src/sec_certs/dataset/cc.py | 50.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #456      +/-   ##
==========================================
+ Coverage   67.52%   67.89%   +0.38%     
==========================================
  Files          62       62              
  Lines        7609     7900     +291     
==========================================
+ Hits         5137     5363     +226     
- Misses       2472     2537      +65     


J08nY added 5 commits November 4, 2024 21:45
Only match if category matches.
Disregard unwanted warnings.
Add progress bars everywhere.
This significantly improves the CC scheme extraction by:

 - Fixing the extraction of several schemes that were mixing
   certified and archived entries by accident.
 - Improving the extraction of cert_ids from scheme sites.
 - Improving the matching heuristic to consider more attributes
   that are usually present in the site data.

Also adds an evaluation notebook to see how this performs.
@J08nY J08nY merged commit 52e5851 into main Nov 8, 2024
6 checks passed
@J08nY J08nY deleted the fix/schemes-again branch November 8, 2024 08:59
Development

Successfully merging this pull request may close these issues.

Add TrustCB scraping
Add Poland scheme