Blocking #62

Closed
hardbyte opened this issue Oct 10, 2017 · 1 comment

hardbyte commented Oct 10, 2017

Consider allowing users to upload "blocks" along with the CLKs.

http://www.record-linkage.de/-Research--fuzzy_blocking.htm

https://github.com/dinusha9/DLAW02

hardbyte commented Feb 19, 2020

This oft-requested feature is now under development. A new library, blocklib, has been written, and the anonlink client can use it to generate CLK-based encodings with blocking information.

High-level components of the epic:

  • The REST endpoints will be updated to accept the new blocking format for JSON and binary uploads. See #503 (Extend upload endpoint to accept both types - clks and clknblocks).
  • The internal binary format will be extended to include entity ids. See #505 (Feature extend binary format).
  • A quarantine task will have to process the newly uploaded files and create a single new file using the new internal binary format.
  • The database schema needs to change; it currently assumes a 1-1 mapping from dataprovider to encoding file.
  • A backend task (perhaps the same quarantine task) will split the uploaded data into a file per block and store the blocking information in PostgreSQL.
  • The comparison tasks need to deal with entity ids and blocking info.
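
To make the splitting and comparison steps above more concrete, here is a minimal Python sketch of the general idea. It is not the entity service's implementation: the `(entity_id, encoding, block_ids)` record layout and the function names `group_by_block` and `candidate_pairs` are hypothetical, chosen only for illustration. It groups each party's records by block, then only pairs up entities that share a block.

```python
from collections import defaultdict
from itertools import product


def group_by_block(records):
    """Map each block id to the (entity_id, encoding) pairs assigned to it.

    `records` is an iterable of (entity_id, encoding, block_ids) tuples.
    This layout is an assumption made for the sketch, not the real format.
    """
    blocks = defaultdict(list)
    for entity_id, encoding, block_ids in records:
        # An entity may belong to several blocks, so it is copied into each.
        for block_id in block_ids:
            blocks[block_id].append((entity_id, encoding))
    return dict(blocks)


def candidate_pairs(blocks_a, blocks_b):
    """Return the (id_a, id_b) pairs that share at least one block."""
    pairs = set()
    # Only blocks present on both sides can produce comparisons.
    for block_id in blocks_a.keys() & blocks_b.keys():
        for (id_a, _), (id_b, _) in product(blocks_a[block_id], blocks_b[block_id]):
            pairs.add((id_a, id_b))
    return pairs


# Toy data: the byte strings stand in for CLK encodings.
party_a = [(0, b"\x01", ["smith"]), (1, b"\x02", ["jones", "smith"])]
party_b = [(0, b"\x03", ["jones"]), (1, b"\x04", ["brown"])]

blocks_a = group_by_block(party_a)
blocks_b = group_by_block(party_b)
pairs = candidate_pairs(blocks_a, blocks_b)
```

Carrying the entity id alongside each encoding is what lets a per-block comparison report matches in terms of the original record positions, which is presumably why the binary format needs to include it.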
