two Celery tasks per Datafile upload #1

Open
avrljk opened this issue Feb 12, 2018 · 1 comment
avrljk commented Feb 12, 2018

Why are we seeing two Celery tasks call process_meta() on the same new Datafile upload?

The BioFormats filter code that the nifcert filter was based on already had locking code to manage this, which suggests the behaviour is not specific to nifcert.

The nifcert filter retains the locking code from the BioFormats filter, so the duplicate task has no effect other than an ERROR message in the log (duplicate key). I'd guess locking is implemented by inserting a row into a table with a unique key requirement, so this is perhaps expected behaviour.
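
For illustration only, here is a minimal sketch of how that kind of lock could be built on Django's database cache backend. It is a guess consistent with the duplicate-key ERROR in the Postgres log, not the actual filter code; the cache alias, lock-key format and timeout are assumptions:

from django.core.cache import caches

LOCK_EXPIRE = 60  # seconds; illustrative value, not taken from the filter

def acquire_datafile_lock(datafile_id):
    # Assumed: a DatabaseCache alias backed by the "celery_lock_cache" table.
    lock_cache = caches["celery_lock"]
    lock_id = "tardis_portal_nifcert_lock_datafile_%d" % datafile_id
    # cache.add() only succeeds for the first caller. With the database
    # backend, a losing INSERT can show up in the Postgres log as a
    # duplicate-key ERROR while add() simply returns False.
    return lock_cache.add(lock_id, "true", LOCK_EXPIRE)

def process_datafile_once(datafile_id):
    if not acquire_datafile_lock(datafile_id):
        return None  # a second task for the same DataFile ends up here
    # ... extract and save metadata ...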

I've confirmed with debugging output that both tasks see Datafile.verified == True, so the second task is not a response to the Datafile being updated after the uploaded file's checksum is verified.

Below is sample log output produced by clicking the "Add Files..." button on the Dataset view page and completing the subsequent steps to select and confirm a file.

celery_1    | [2018-02-12 16:52:54,643: INFO/MainProcess] Received task: tardis_portal.dfo.verify[30d435fe-a1a9-4bca-9e51-ae8c64a10940] eta:[2018-02-12 16:52:59.641338+08:00]
celery_1    | [2018-02-12 16:52:54,651: INFO/MainProcess] Received task: tardis_portal.dfo.verify[2e7b8073-7a36-4b3b-9911-65503e7ec4a1] eta:[2018-02-12 16:52:59.650151+08:00]
django_1    | [12/Feb/2018 16:52:54] INFO    172.18.0.1      root POST /api/v1/dataset_file/ 201
django_1    | 172.18.0.1 - - [12/Feb/2018:16:52:54 +0800] "POST /api/v1/dataset_file/ HTTP/1.1" 201 - "http://localhost:8000/dataset/1" "Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0"
django_1    | [12/Feb/2018 16:52:54] INFO    172.18.0.1      root GET /ajax/datafile_list/1/ 200
django_1    | 172.18.0.1 - - [12/Feb/2018:16:52:54 +0800] "GET /ajax/datafile_list/1/ HTTP/1.1" 200 6976 "http://localhost:8000/dataset/1" "Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0"
celery_1    | [2018-02-12 16:52:59,673: INFO/MainProcess] Task tardis_portal.dfo.verify[30d435fe-a1a9-4bca-9e51-ae8c64a10940] succeeded in 0.0232214999996s: None
celery_1    | [2018-02-12 16:52:59,674: INFO/MainProcess] Received task: nifcert.process_meta[b8407f06-f5ad-4d0a-9846-450346dc4f56]
celery_1    | [2018-02-12 16:52:59,675: INFO/MainProcess] Task tardis_portal.dfo.verify[2e7b8073-7a36-4b3b-9911-65503e7ec4a1] succeeded in 0.0236361580028s: None
celery_1    | [2018-02-12 16:52:59,676: INFO/MainProcess] Received task: nifcert.process_meta[9dfa2e9b-c243-470d-a66e-97a28c84a317]
db_1        | ERROR:  duplicate key value violates unique constraint "celery_lock_cache_pkey"
db_1        | DETAIL:  Key (cache_key)=(:1:tardis_portal_nifcert_lock_datafile_257) already exists.
celery_1    | [2018-02-12 16:52:59,687: INFO/MainProcess] Task nifcert.process_meta[b8407f06-f5ad-4d0a-9846-450346dc4f56] succeeded in 0.0101797460011s: None
db_1        | STATEMENT:  INSERT INTO "celery_lock_cache" (cache_key, value, expires) VALUES (':1:tardis_portal_nifcert_lock_datafile_257', 'gAJVBHRydWVxAS4=', '2018-02-12T08:53:59'::timestamp)
celery_1    | [2018-02-12 16:52:59,780: INFO/MainProcess] Task nifcert.process_meta[9dfa2e9b-c243-470d-a66e-97a28c84a317] succeeded in 0.103251791999s: None

avrljk commented May 29, 2018

Commit 351a420 protects the NIFCert app from the duplicate call.

The verified flag (in the debugging output mentioned in the original issue) was probably set in both NIFCert task instances because our configuration seems to take a few seconds to start tasks, which gives the MyTardis async checksum verification task time to update the database before the NIFCert tasks begin.

Note: the BioFormats code relies on a stale DataFile instance passed in as a parameter (which the Django/Celery docs say is a bad idea).
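
As an aside, the pattern the Celery docs recommend is to pass the DataFile primary key and re-fetch the row inside the task. The sketch below is illustrative only; the task name, signature and import path are assumptions, not the actual nifcert or BioFormats code:

from celery import shared_task
from tardis.tardis_portal.models import DataFile  # assumed import path

@shared_task
def process_meta(datafile_id):
    # Re-load inside the worker so we see current database state; a DataFile
    # instance serialised into the task message can be stale (e.g. its
    # verified flag may predate the async checksum verification).
    datafile = DataFile.objects.get(id=datafile_id)
    if not datafile.verified:
        return None
    # ... run the metadata extraction against the fresh instance ...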

It's unclear whether the locking technique described in the Celery 3.1 docs (and used by BioFormats) is robust when the cache is exhausted on a heavily loaded system. BioFormats sometimes skips thumbnail generation for files; this may be the reason.
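
For reference, that recipe is roughly the following; this is a sketch under assumptions, with an illustrative lock id, timeout and task name rather than the BioFormats values. The caveat is in the comments: if the backend evicts the lock key a duplicate task can get through, and if add() fails for reasons other than an existing lock the only task gives up, which would look like silently skipped thumbnails:

from contextlib import contextmanager
from django.core.cache import cache

LOCK_EXPIRE = 60 * 5  # lock also expires if the worker dies mid-task

@contextmanager
def datafile_lock(datafile_id):
    lock_id = "lock_datafile_%d" % datafile_id
    # add() is atomic: only one caller gets True while the key exists,
    # unless the cache drops the key (eviction, restart, expiry).
    acquired = cache.add(lock_id, "true", LOCK_EXPIRE)
    try:
        yield acquired
    finally:
        if acquired:
            cache.delete(lock_id)

def generate_thumbnail(datafile_id):
    with datafile_lock(datafile_id) as acquired:
        if not acquired:
            # If add() returned False because the cache refused the write
            # rather than because another worker holds the lock, the work
            # is skipped entirely.
            return None
        # ... generate the thumbnail ...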
