Pulsar queued_python potentially isn't executing jobs on my machine. #375

Open
hexylena opened this issue Oct 10, 2024 · 1 comment
hexylena commented Oct 10, 2024

Reported in the admins Matrix room, with this config:

app.yml
---
dependency_resolution:
  resolvers:
  - auto_init: true
    auto_install: true
    type: conda
job_metrics_config_file: job_metrics_conf.yml
min_polling_interval: 0.5
persistence_directory: /mnt/pulsar/files/persisted_data
private_token: asdf
staging_directory: /mnt/pulsar/files/staging
tool_dependency_dir: /mnt/pulsar/deps
managers:
  _default_:
    type: queued_python
    num_concurrent_jobs: 1

but my jobs don't execute:

Oct 09 16:51:03 worker1 uwsgi[66861]: 2024-10-09 16:51:03,445 DEBUG [pulsar.managers.base][uWSGIWorker1Core0] job_id: 16 - checking tool file cutWrapper.pl
Oct 09 16:51:03 worker1 uwsgi[66861]: 2024-10-09 16:51:03,446 DEBUG [galaxy.tool_util.deps][uWSGIWorker1Core0] Using dependency perl version 5.26 of type conda 
Oct 09 16:51:03 worker1 uwsgi[66861]: [pid: 66861|app: 0|req: 610/610] 145.38.195.22 () {32 vars in 5437 bytes} [Wed Oct  9 16:51:03 2024] POST /managers/_default_/jobs/16/submit?command_line=%2Fbin%2Fbash+%2Fm..........

Nate suggested py-spy:

root@worker1:/mnt/pulsar# py-spy dump --pid 66861
Process 66861: /mnt/pulsar/venv/bin/uwsgi --ini-paste /mnt/pulsar/config/server.ini
Python v3.10.12 (/mnt/pulsar/venv/bin/uwsgi)

Thread 0x7F93FB7D1040 (active): "uWSGIWorker1Core0"
Thread 0x7F93F35FF640 (idle): "Thread-1 (run_next)"
    wait (threading.py:320)
    get (queue.py:171)
    run_next (pulsar/managers/queued.py:83)
    run (threading.py:953)
    _bootstrap_inner (threading.py:1016)
    _bootstrap (threading.py:973)

and I can verify that things get added to the queue, but nothing seems to be read from the queue.
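For context, the `run_next` frame in the dump is what an idle consumer thread looks like: it is blocked in `Queue.get()` waiting for work. Below is a minimal sketch of that producer/consumer pattern. It is not Pulsar's actual code; `TinyQueuedManager`, `submit`, `finished`, and `shutdown` are made-up names for illustration, and only `run_next` is borrowed from the stack trace.

```python
import queue
import threading


class TinyQueuedManager:
    """Hypothetical stand-in for a queued manager; NOT Pulsar's implementation."""

    def __init__(self):
        self.work_queue = queue.Queue()
        self.finished = []
        # One background consumer thread, analogous to num_concurrent_jobs: 1.
        self.worker = threading.Thread(target=self.run_next, daemon=True)
        self.worker.start()

    def submit(self, job_id):
        # Producer side: the request handler only enqueues the job.
        self.work_queue.put(job_id)

    def run_next(self):
        while True:
            # The consumer blocks here while the queue is empty; this is the
            # get() -> wait() pair of frames visible in the py-spy dump.
            job_id = self.work_queue.get()
            if job_id is None:  # sentinel used by this sketch to stop the worker
                break
            self.finished.append(job_id)  # a real manager would run the job here

    def shutdown(self):
        self.work_queue.put(None)
        self.worker.join(timeout=5)


manager = TinyQueuedManager()
for job_id in (15, 16):
    manager.submit(job_id)
manager.shutdown()
print(manager.finished, manager.work_queue.qsize(), manager.worker.is_alive())
```

In a healthy process the queue drains and `finished` fills up. What I am seeing looks like the opposite: the equivalent of `submit` appears to happen, but the consumer never seems to wake up.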

pulsar-check --private_token=asdf --debug

INFO:pulsar.client.manager:Setting Pulsar client class to standard, non-caching variant.
DEBUG:pulsar.client.client:Uploading path [/tmp/pulsar-check-client.dn8jifhl/t/script.py] (action_type: [transfer])
DEBUG:pulsar.client.client:Uploading path [/tmp/pulsar-check-client.dn8jifhl/dataset_0.dat] (action_type: [transfer])
DEBUG:pulsar.client.client:Uploading path [/tmp/pulsar-check-client.dn8jifhl/dataset_0_files/input_subdir/extra] (action_type: [transfer])
DEBUG:pulsar.client.client:Uploading path [/tmp/pulsar-check-client.dn8jifhl/metadata/12312231231231.dat] (action_type: [transfer])
DEBUG:pulsar.client.client:Uploading path [/tmp/pulsar-check-client.dn8jifhl/w/config.txt] (action_type: [transfer])
DEBUG:pulsar.client.client:Uploading path [/tmp/pulsar-check-client.dn8jifhl/m/metadata_test123] (action_type: [transfer])
DEBUG:pulsar.client.client:Uploading path [/tmp/pulsar-check-client.dn8jifhl/idx/seq/human_full_seqs] (action_type: [transfer])
DEBUG:pulsar.client.client:Uploading path [/tmp/pulsar-check-client.dn8jifhl/idx/bwa/human.fa.fai] (action_type: [transfer])
DEBUG:pulsar.client.client:Uploading path [/tmp/pulsar-check-client.dn8jifhl/idx/bwa/human.fa] (action_type: [transfer])
DEBUG:pulsar.client.client:Uploading path [/tmp/pulsar-check-client.dn8jifhl/w/config.txt] (action_type: [message])

Swapping to queued_condor, with no other changes, enabled jobs to execute.

Running the latest pulsar:

(venv) root@worker1:/home/hrasche2# pip freeze | grep pulsar
pulsar-app==0.15.6

In this case I'd rather not install HTCondor if it isn't necessary.

hexylena commented Nov 12, 2024

@natefoo @jmchilton if y'all have ideas on this, I'd appreciate it. Anything I can test? It seems like a very default configuration. Do I need to be running more uwsgi processes? I would like to avoid the overhead of a real DRM for this use case. I looked into the GitHub tests a bit, but those seem to set up proper Slurm rather than using queued_python. Is it possible that this deployment option isn't working?
