-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
QEMUv7: xtest pkcs11_1007 sporadic failure #6952
Comments
I was able to reproduce the issue. Traces indeed seen show a "host memory" allocation failure, while we would expect that this test exhausts the "device memory" (aka OP-TEE core or TA memory) .
They relate to pkcs11_1000.c source file tests: line 1769: for (n = 0; n < ARRAY_SIZE(sessions); n++) {
...
line 1783 }
...
line 1791: /* Closing session with out bound and invalid IDs (or negative ID) */
line 1792: rv = C_CloseSession(sessions[n - 1] + 1024);
line 1793: ADBG_EXPECT_CK_RESULT(c, CKR_SESSION_HANDLE_INVALID, rv);
line 1794: rv = C_CloseSession(CK_INVALID_HANDLE);
line 1795: ADBG_EXPECT_CK_RESULT(c, CKR_SESSION_HANDLE_INVALID, rv);
line 1796: rv = C_CloseSession(~0);
line 1797: ADBG_EXPECT_CK_RESULT(c, CKR_SESSION_HANDLE_INVALID, rv); I saw also some unexpected error traces from OP-TEE core that are emitted in the loop (that attempts to open more than a hundred sessions) before line 1792 above is called, so I think the issue is deeper than a libckteec (PKCS#11 crypoki client lib) issue:
When the issue occurs, that loop stops on a "host memory" error, whereas when pkcs11_1007 test succeeds (at it most often does), the loop stops on a "device memory" error, as expected. I'll continue to investigate. I think they are some Linux OS resources that are not properly released in xtest. |
This error suggests that the normal world uses a non-secure address that OP-TEE isn't aware of. |
I looks like client is invoking OP-TEE with an shm buffer that failed to be registered, but I can't find where this can happen. Regarding pkcs11_1007 sporadic failures, i think it comes from that fact we try to invoke OP-TEE for closing invalid session handlers (lines 1792 to 1797 shown in the comment above) while OP-TEE core heap memory is already exhausted. I'll propose a change to release some resources before we use make these attempts. |
Close opened session after secure service memory is exhausted before testing closure on invalid session. This change attempts to fix a non systematic issue found as described in [1]. A test case is slightly changed: where a likely invalid session ID (valid ID + 1024) was tried to be closed, this change now tries to close a session that has been already closed. Link: OP-TEE/optee_os#6952 [1] Signed-off-by: Etienne Carriere <[email protected]>
Close opened session after secure service memory is exhausted before testing closure on invalid session. This change attempts to fix a non systematic issue found as described in [1]. A test case is slightly changed: where a likely invalid session ID (valid ID + 1024) was tried to be closed, this change now tries to close a session that has been already closed. Link: OP-TEE/optee_os#6952 [1] Signed-off-by: Etienne Carriere <[email protected]> Acked-by: Jens Wiklander <[email protected]>
Close opened sessions after secure service memory is exhausted before testing closure on invalid session. This change attempts to fix a non systematic issue found as described in [1]. A test case is slightly changed: where a likely invalid session ID (valid ID + 1024) was tried to be closed, this change now tries to close a session that has been already closed. Link: OP-TEE/optee_os#6952 [1] Signed-off-by: Etienne Carriere <[email protected]> Acked-by: Jens Wiklander <[email protected]>
Close opened sessions after secure service memory is exhausted before testing closure on invalid session. This change attempts to fix a non systematic issue found as described in [1]. A test case is slightly changed: where a likely invalid session ID (valid ID + 1024) was tried to be closed, this change now tries to close a session that has been already closed. Link: OP-TEE/optee_os#6952 [1] Signed-off-by: Etienne Carriere <[email protected]> Acked-by: Jens Wiklander <[email protected]>
This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment, otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time. |
FYI, P-R OP-TEE/optee_test#758 attempts to address the issue but it's not obvious to me it will really fix it. I'll let this Issue be closed is no such failure is reported by new P-Rs' CI tests. |
Log: Reproduced today on https://github.com/OP-TEE/optee_os/pull/7014. However, I resubmitted my patches so I lost the CI logs :/ |
Still visible here: https://github.com/OP-TEE/optee_os/actions/runs/11071874169/job/30764736296 |
Thanks for reporting the issue. So it was not addressed. In the mean time, to not bother contributors with this sporadic issue, @jforissier would you be okay to specifically disable xtest pkcs11_1007. It does not cover any pkcs11 crypto or login tests, only session open/close specific cases. |
We could make the CI show the full console output of the container (
Fine with me. Adding |
Exclude xtest case pkcs11_1007 from qemuv7 CI tests since we're facing sporadic failures on this test on PKCS#11 sessions opening and release. Occurrences of this issue have been found only on this Armv7-A platform. Once the issue is solved, we be able to restore this test. Link: OP-TEE#6952 Signed-off-by: Etienne Carriere <[email protected]>
Created #7061. |
I don't recall seeing this failure on other platforms than qemuv7 indeed. Let's add QEMUv7 to the title. |
Exclude xtest case pkcs11_1007 from qemuv7 CI tests since we're facing sporadic failures on this test on PKCS#11 sessions opening and release. Occurrences of this issue have been found only on this Armv7-A platform. Once the issue is solved, we be able to restore this test. Link: OP-TEE#6952 Signed-off-by: Etienne Carriere <[email protected]> Acked-by: Jens Wiklander <[email protected]> Acked-by: Jerome Forissier <[email protected]>
Exclude xtest case pkcs11_1007 from qemuv7 CI tests since we're facing sporadic failures on this test on PKCS#11 sessions opening and release. Occurrences of this issue have been found only on this Armv7-A platform. Once the issue is solved, we be able to restore this test. Link: #6952 Signed-off-by: Etienne Carriere <[email protected]> Acked-by: Jens Wiklander <[email protected]> Acked-by: Jerome Forissier <[email protected]>
This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment, otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time. |
The issue seems to have disappeared. However, I suggest we keep this entry opened, at least until next release tag (jan'25). |
If I'm not mistaken, 1007 has been disabled in CI, so what makes you think that the issue has disappeared? |
;) no test, no error! |
This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment, otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time. |
Exclude xtest case pkcs11_1007 from qemuv7 CI tests since we're facing sporadic failures on this test on PKCS#11 sessions opening and release. Occurrences of this issue have been found only on this Armv7-A platform. Once the issue is solved, we be able to restore this test. Link: OP-TEE#6952 Signed-off-by: Etienne Carriere <[email protected]> Acked-by: Jens Wiklander <[email protected]> Acked-by: Jerome Forissier <[email protected]>
xtest pkcs11_1007
(requires to build withCFG_PKCS11_TA=y
) sporadically fails with unexpected error codeCKR_HOST_MEMORY
on various platforms, including qemu_virt, qeu_armv8a, stm32mp1 (for I one i've tests).References found in Github optee_os forum:
#5395 (comment) (July 2022, maybe unrelated)
#5359 (comment) (May 2023)
#6125 (comment) (June 2023)
#6922 (comment) (June 2024)
#6881 (comment) (July 2024)
I open this Issue to track it. I've started to look closer at this and will post my current findings (that are still weak).
The text was updated successfully, but these errors were encountered: