intel-2023b: fi_info not working as expected #21715

Open
sassy-crick opened this issue Oct 22, 2024 · 1 comment

@sassy-crick
Collaborator

While running the test jobs for MOLCAS-84, I came across this issue:

Abort(606203407) on node 10 (rank 10 in comm 0): Fatal error in PMPI_Put: Other MPI error, error stack:
PMPI_Put(160)........: MPI_Put(origin_addr=0x7ffde9f037a0, origin_count=5, MPI_LONG, target_rank=6, target_disp=50, target_count=5, MPI_LONG, win=0xe0000002) failed
MPID_Put(896)........: 
MPIDI_put_safe(565)..: 
MPIDI_put_unsafe(71).: 
MPIDI_OFI_do_put(436): OFI rdma write immediate failed (ofi_rma.h:436:MPIDI_OFI_do_put:Invalid argument)

Given that more than one job failed, I did a bit of digging and noticed that this command is not working as expected:

$ fi_info | grep provider
fi_getinfo: -61
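
For anyone comparing toolchains, here is a minimal sketch of how the code printed above could be decoded. It assumes that libfabric reports failures as negated errno values, in which case -61 would be ENODATA on Linux (no data available, i.e. no matching provider was found). The function names `parse_fi_getinfo_error` and `describe_code` are made up for illustration and are not part of libfabric or EasyBuild.

```python
import errno
import os
import re


def parse_fi_getinfo_error(output):
    """Return the integer code from a 'fi_getinfo: <code>' line, or None if absent."""
    match = re.search(r"^fi_getinfo:\s*(-?\d+)\s*$", output, re.MULTILINE)
    return int(match.group(1)) if match else None


def describe_code(code):
    """Map a negated errno value to its symbolic name and message.

    Assumption: the code is a negated errno value. Names and messages
    come from the local platform, so they are only meaningful on the
    same kind of system that produced the error.
    """
    number = abs(code)
    name = errno.errorcode.get(number, "unknown")
    return f"{name}: {os.strerror(number)}"


if __name__ == "__main__":
    sample = "fi_getinfo: -61\n"  # the output quoted in this issue
    code = parse_fi_getinfo_error(sample)
    if code is not None:
        print(code, describe_code(code))
```

Running the same check under each toolchain (intel-2023a, intel-2023b, intel-2024a) would show whether the failure is limited to one of them.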

Further digging revealed that it works up to intel-2023a and also works with intel-2024a, so to me it is clearly intel-2023b that has a problem.
My hunch is that the problem lies outside of what EasyBuild does. I will try to do some more digging.

@sassy-crick
Collaborator Author

Update on this: it all works as expected if l_mpi_oneapi_p_2021.13.1.769_offline.sh is used. This means we might update the impi EasyConfig accordingly.
