-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uncontrolled state spotted on joints controlled via MC4Plus #673
Comments
Also there is a super annoying whistle coming from the hand. If we squeeze the fingers (moving the |
Hi @S-Dafarra , still on going this problem?!?! |
Yes, actually it happens on both arms. For example, yesterday it happened on the right arm. Interestingly we also pressed the fault, but the forearm remained in that strange state. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
This issue is still occurring. |
It just happened today, again on the left arm. As far as I know, this is still happening on both arms. After a quick F2F talk with @julijenv we suspect this issue may be related to a board resetting, maybe after a voltage drop, which causes the board to remain in an uncontrolled state with a constant PWM output. |
Hi @valegagge , |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
The issue is still occurring. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
It still happens. |
After a quick f2f with @julijenv we understood that we need to check whether the board is still pingable when this problem is occurring. This may give us some more hints of what is going on. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
I am not going to give up on this |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
This issue has been automatically closed due to inactivity. Feel free to open it again if needed. |
It can be. The value in RT5 is obtained by de-referencing the address |
The PR is ready. We need only to test the binary on our setup before we can release it for the robot. |
Hi @S-Dafarra, a new version (v 3.40) for the binary of the Pls use it and collect logs. Thanks, |
Hi @marcoaccame, yesterday I flashed the new firmware version. |
Here a log of the error that happened on one of the neck boards: |
Quite simply Brief explanation FATALERR_RT2_set(FT_0, 2);
FATALERR_RT2_set(FT_2, rtosobj);
// any caller
oosiit_mbx_retrieve(rtosobj, &p, s_osal_timeout2tick(tout)); Then execution goes through here, where extern oosiit_result_t oosiit_mbx_retrieve(oosiit_objptr_t mailbox,
void** message, uint32_t timeout)
{
if(oosiit_res_NOK == s_oosiit_mbx_valid(mailbox))
{
return(oosiit_res_NOK);
}
if(NULL == message)
{
return(oosiit_res_NOK);
}
if(0 != __get_IPSR())
{ // inside isr
FATALERR_RT2_set(FT_0, 31); // unlikely path
return(isr_oosiit_mbx_retrieve(mailbox, message));
}
else if(1 == s_oosiit_started)
{ // call svc
FATALERR_RT2_set(FT_0, 30); // detected path
return(__svc_oosiit_mbx_retrieve(mailbox, message, timeout)); In here we are inside ethe SVC handler (we use a smaller stack also shared w/ the other IRQ handlers) and in here we call extern oosiit_result_t svc_oosiit_mbx_retrieve(oosiit_objptr_t mailbox,
void** message, uint32_t timeout)
{
OS_RESULT res;
rt_iit_dbg_svc_enter();
res = iitchanged_rt_mbx_wait(mailbox, message, timeout);
rt_iit_dbg_svc_exit();
return((oosiit_result_t)res);
} And finally inside the crashing function // in here we just use TIME_t for type of timeout and use iitchanged_rt_block()
OS_RESULT iitchanged_rt_mbx_wait (OS_ID mailbox, void **message, TIME_t timeout) {
/* Receive a message; possibly wait for it */
P_MCB p_MCB = mailbox;
P_TCB p_TCB;
FATALERR_RT2_set(FT_0, 3);
FATALERR_RT2_set(FT_3, p_MCB);
FATALERR_RT2_set(FT_4, message); Why does it happen? What to do . |
Hi @S-Dafarra, a new version (v 3.41) for the binary of the As usual it is named Pls use it and collect logs. Thanks, |
With @ale-git we flashed the FW yesterday. Today, one of the wrist boards had a fatal error even before the end of the calibration. Here the log: |
Just to make sure that we're aligned. |
Yes we flashed both the EMS and the MC4Plus |
Thanks! cc @marcoaccame |
I have analyzed the new log and ...
I omit the full analysis and I will focus only the new findings and on the actions. New findings So, I have looked for any possible cause of corruption in stack / argument values in the SVC calls. In some posts I have found that this may happen if the priorities of the SVC, SysTick and PostPend handler used by the RTOS are not correctly set. In such a case, if for instance the SVC is preempted then it could use a different stack and something could fail. That is mentioned for instance in here, but also more importantly in an ARM's application note. In https://developer.arm.com/documentation/ka003146/latest they say that that to modify the NVIC priority grouping during runtime may cause crashes.
I remember that the wanted original design of the firmware is to perform the assignment of NVIC priority grouping only once at startup and before the start of the RTOS. extern hal_result_t hal_sys_init(const hal_sys_cfg_t* cfg)
{
... omissis
// set priority levels
// configure once and only once the nvic to hold 4 bits for interrupt priorities and 0 for subpriorities
// in stm32 lib ... NVIC_PriorityGroup_4 is 0x300, thus cmsis priority group number 3, thus
// bits[7:4] for pre-emption priority and bits[3:0} for subpriority. but stm32 only has the 4 msb bits.
// see page 114 of joseph yiu's book.
NVIC_PriorityGroupConfig(NVIC_PriorityGroup_4);
return(hal_res_OK);
} Code Listing. Initialization code of HAL called at startup which contains initialization of NVIC priorities (see https://github.com/robotology/icub-firmware/blob/master/emBODY/eBcode/arch-arm/libs/highlevel/abslayer/hal2/src/core/hal_sys.c#L151-L157). But a double check on that has shown that there is a further change of NVIC priority grouping which is done in runtime after the RTOS initialization by the low level driver of the PWM. This is done only in the mc4plus and not in the ems and that may explain also why we never saw the fatal error in the ems. extern hal_result_t hal_motor_init(hal_motor_t id, const hal_motor_cfg_t *cfg)
{
if(hal_true == s_hal_motor_none_supported_is())
{
return(hal_res_NOK_generic);
}
... omissis
/* Configure one bit for preemption priority */
NVIC_PriorityGroupConfig(NVIC_PriorityGroup_2);
... omissis
return(hal_res_OK);
} Code Listing. Initialization code of the PWM driver which only the mc4plus calls at runtime at startup of the MC service (see https://github.com/robotology/icub-firmware/blob/master/emBODY/eBcode/arch-arm/libs/highlevel/abslayer/hal2/src/extra/devices/hal_dc_motorctl.c#L361-L362). Further analysis of the code of the
In particular, I have found extensive documentation that if the Actions The runtime change of the NVIC priorities is surely wrong and never intended, so it must be removed. The application note on ARM's site suggests that it avoids possible crashes. I just hope that it will solve our long running problem. Also, the priorities of the IRQ handlers must be compatible w/ the use of an RTOS. So, I will remove the runtime call of Progress
|
@S-Dafarra: merged. new binaries available. |
Hi @S-Dafarra Do you have any news to share on this? cc @marcoaccame |
Hi @pattacini! We flashed the firmware of robotology/icub-firmware-build#42 about a week ago and we did not have any other issue since then. Fingers crossed 🤞 |
Hi @S-Dafarra So far so good on this? |
Yes, still nothing to report! |
We need to clean up the debug messages we inserted to inspect the problem. |
Cleanup done in robotology/icub-firmware-build#48. |
@DanielePucci ... is it ok, right? Or do we have a fatal error? |
we don't have fatal errors at the moment, we were discussing this issue to understand if it was useful to update the firmware on iCubgenova04 MC4Plus |
Description of the failure
The left arm goes in a state which is not down, i.e. the motors are not completely switched off, but yet the joints starting from the wrist prono-supination cannot be controlled and are kind of "compliant".
Detailed conditions and logs of the failure
logArmDown_part1.txt
logArmDown_part2.txt
Image of the arm:
I can move easily the prono-supination, while the wrist is stuck to the upper limit.
cc @julijenv
The text was updated successfully, but these errors were encountered: