mstconfig cannot access devices with a large PCI domain #925

Open
acgoldma opened this issue Mar 29, 2024 · 4 comments · May be fixed by #926

@acgoldma

# lspci | grep Mel
10000:01:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
# mstconfig q
-E- Failed to open the device
# mstconfig -d 10000:01:00.0 q
-E- Failed to open the device

A quick check of the code shows that the domain is stored in only a 16-bit value, but the domain can be larger than that. I would suggest expanding the fields to at least 32 bits.

@acgoldma linked a pull request on Mar 29, 2024 that will close this issue

@ogalbxela
Contributor

Doesn't 10000 fit in 16 bits?
Making the change you suggested in #926 might hide the actual problem. We'll look into it further internally.

In the meantime, could you please let us know your Linux distribution and the version of the lspci utility?

@acgoldma
Author

acgoldma commented Apr 5, 2024

lspci prints the domain in hexadecimal, so this is 0x10000, which does not fit in 16 bits.
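
The arithmetic can be checked with a short sketch. The helper name `parse_pci_address` is hypothetical and is not part of mstflint or pciutils. The 16-bit mask only imitates what storing the domain in a 16-bit field would do, going by the description earlier in this thread; it is not taken from the actual code.

```python
import re

# Hypothetical helper, not part of mstflint or pciutils.
# Parses "domain:bus:device.function", where every field is hexadecimal.
_BDF = re.compile(r"^([0-9a-fA-F]+):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$")


def parse_pci_address(address):
    match = _BDF.match(address)
    if match is None:
        raise ValueError(f"not a PCI address: {address!r}")
    domain, bus, device, function = (int(part, 16) for part in match.groups())
    return domain, bus, device, function


domain, bus, device, function = parse_pci_address("10000:01:00.0")

UINT16_MAX = 0xFFFF
fits_in_16_bits = domain <= UINT16_MAX
truncated = domain & UINT16_MAX  # what a 16-bit field would keep

print(f"domain = {domain:#x} ({domain})")
print(f"fits in 16 bits: {fits_in_16_bits}")
print(f"after 16-bit truncation: {truncated:#06x}")
```

Masking 0x10000 to 16 bits leaves 0, so a tool that keeps only 16 bits of the domain may end up addressing domain 0000 instead of the real device.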

This system is stock RHEL 8.8.

# uname -r
4.18.0-477.10.1.el8_8.x86_64
# lspci --version
lspci version 3.7.0
# dmidecode -s system-product-name
S2600WFT
# lscpu | grep "^Model name"
Model name:          Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz

One thing to note: I was able to reset the BIOS settings, which moved the device back to domain 0000. Comparing the BIOS dumps from before and after the reset, the setting responsible might be:

[BIOS::Advanced::PCI Configuration::Volume Management Device]
Riser1, Slot1 Volume Management Device(CPU1, IOU1)=Enabled ;Options: Disabled=00: Enabled=01
...

While this no longer blocks us, the issue remains, since this is a possible configuration.
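
For anyone who wants to check whether a system has devices in such a domain, here is a minimal sketch. It assumes the standard Linux sysfs layout, where the entries under `/sys/bus/pci/devices` are named `domain:bus:device.function` in hexadecimal. Both function names are made up for illustration.

```python
import os

SYSFS_PCI_DEVICES = "/sys/bus/pci/devices"  # standard Linux sysfs location


def domains_wider_than_16_bits(addresses):
    """Return the addresses whose domain field exceeds 0xFFFF."""
    wide = []
    for address in addresses:
        domain_text = address.split(":", 1)[0]
        if int(domain_text, 16) > 0xFFFF:
            wide.append(address)
    return wide


def list_pci_addresses(root=SYSFS_PCI_DEVICES):
    # Returns an empty list when the directory is missing (e.g. not on Linux).
    try:
        return sorted(os.listdir(root))
    except OSError:
        return []


if __name__ == "__main__":
    for address in domains_wider_than_16_bits(list_pci_addresses()):
        print(address)
```

On a machine configured like the one above, this should print `10000:01:00.0` along with any other devices in the same domain.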

Also, with MOFED installed, a few more tools, such as FW update, had similar issues.

@ogalbxela
Contributor

ogalbxela commented Apr 6, 2024

pciutils/pciutils@ab61451

May, 2016 :-)

We'll fix it in the MFT codebase and then propagate it here. Thank you!

@ogalbxela
Contributor

ogalbxela commented Aug 6, 2024

I have informed the manager of the relevant team about the issue raised here. It is now pending his decision.

If the decision is made to address this concern, then, per our process, it will first be incorporated into the internal MFT tools codebase (including requirements, design, implementation, and a full testing cycle). Then it will be propagated to MSTFLINT, which is the open-source part of the MFT tools.
