HardFault_Handler() collects full MPU context #201

Merged
merged 2 commits into from
Sep 21, 2021


marcoaccame
Contributor

@marcoaccame marcoaccame commented Sep 16, 2021

With this PR, HardFault_Handler() collects the full MPU context and saves it in non-volatile RAM, so that at restart the board sends this information in diagnostic messages.

This behaviour applies to ems, mc4plus and mc2plus.

There is an associated PR for the binaries in robotology/icub-firmware-build#35

Behaviour of the improved diagnostics

The HardFault_Handler() now sends improved information which contains:

  • execution registers at the moment of the exception such as PC (program counter) and others which help understand which instruction caused the fault
  • system registers such as CFSR and others which tell more about the cause of the fault.
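The execution registers listed in the diagnostics (r0-r3, r12, lr, pc, psr) are the eight words that a Cortex-M core pushes on the active stack at exception entry. As a minimal sketch of how a handler could copy them, assuming the basic frame without the floating-point extension: the names `stacked_frame_t` and `fault_capture_frame` are hypothetical and are not taken from this PR.

```c
#include <stdint.h>

/* Basic exception frame pushed by the core at exception entry,
   lowest stack address first (no floating-point extension assumed). */
typedef struct {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;   /* link register at the moment of the fault */
    uint32_t pc;   /* address of the instruction that was executing */
    uint32_t psr;  /* program status register */
} stacked_frame_t;

/* Hypothetical helper: copy the eight stacked words found at `sp`.
   On target, `sp` would be MSP or PSP, selected by bit 2 of EXC_RETURN. */
static void fault_capture_frame(const uint32_t *sp, stacked_frame_t *out)
{
    out->r0  = sp[0];
    out->r1  = sp[1];
    out->r2  = sp[2];
    out->r3  = sp[3];
    out->r12 = sp[4];
    out->lr  = sp[5];
    out->pc  = sp[6];
    out->psr = sp[7];
}
```

Copying the words one by one, rather than casting the stack pointer to the struct, keeps the sketch independent of struct packing and makes the frame order explicit.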

Here is an example of what the board sends when we induce a hard fault (HF) with sample code that accesses non-existent RAM.

extern void eo_fatalerror_AtStartup(EOtheFatalError *p)
{
...
    /* deliberately read from an address that maps to no RAM, to force a HardFault */
    volatile unsigned int* pp;
    volatile unsigned int n;
    pp = (volatile unsigned int*)0xCCCCCCCC;
    n = *pp;

Listing. Offending code

[INFO] (EOtheServices tsk2 @S5:m632:u625)-> {0x3b, p16 0x0000, p64 0x0000000000000000, dev 0, adr 0}: SYS: the board is bootstrapping.
[ERROR] (EOtheFatalError tsk2 @S5:m634:u218)-> {0x4000000 p16 0x0000, p64 0x0203000600001604, dev 0, adr 0}: DEBUG: tag00. INFO = RESTARTED after FATAL error
[ERROR] (EOtheFatalError tsk2 @S5:m634:u344)-> {0x4000000 p16 0x0000, p64 0x0203000600001604, dev 0, adr 0}: DEBUG: tag00. INFO = @ 5636 ms
[ERROR] (EOtheFatalError tsk2 @S5:m634:u469)-> {0x4000000 p16 0x0000, p64 0x0203000600001604, dev 0, adr 0}: DEBUG: tag00. INFO = handler hw_HardFault, code 0x0
[ERROR] (EOtheFatalError tsk2 @S5:m634:u596)-> {0x4000000 p16 0x0000, p64 0x0203000600001604, dev 0, adr 0}: DEBUG: tag00. INFO = type see TBL
[ERROR] (EOtheFatalError tsk2 @S5:m634:u723)-> {0x4000000 p16 0x0000, p64 0x0203000600001604, dev 0, adr 0}: DEBUG: tag00. INFO = IRQHan HardFault Thread tINIT
[ERROR] (EOtheFatalError tsk2 @S5:m634:u852)-> {0x4000000 p16 0x0000, p64 0x0203000600001604, dev 0, adr 0}: DEBUG: tag00. INFO = ipsr 3, tid 2
[ERROR] (EOtheFatalError tsk2 @S5:m634:u973)-> {0x4000000 p16 0x0000, p64 0x0203000600001604, dev 0, adr 0}: DEBUG: tag00. INFO = CFSR 0x8200
[ERROR] (EOtheFatalError tsk2 @S5:m635:u89)-> {0x4000000 p16 0x0000, p64 0x0203000600001604, dev 0, adr 0}: DEBUG: tag00. INFO = MORE INFO
[ERROR] (EOtheFatalError tsk2 @S5:m635:u216)-> {0x4000000 p16 0x0000, p64 0x0203000600001604, dev 0, adr 0}: DEBUG: tag00. INFO = ICSR = 0x00000803 SHCSR = 0x00000000
[ERROR] (EOtheFatalError tsk2 @S5:m635:u354)-> {0x4000000 p16 0x0000, p64 0x0203000600001604, dev 0, adr 0}: DEBUG: tag00. INFO = CFSR = 0x00008200 HFSR = 0x40000000
[ERROR] (EOtheFatalError tsk2 @S5:m635:u492)-> {0x4000000 p16 0x0000, p64 0x0203000600001604, dev 0, adr 0}: DEBUG: tag00. INFO = DFSR = 0x00000000 MMFAR = 0xcccccccc
[ERROR] (EOtheFatalError tsk2 @S5:m635:u629)-> {0x4000000 p16 0x0000, p64 0x0203000600001604, dev 0, adr 0}: DEBUG: tag00. INFO = BFAR = 0xcccccccc AFSR = 0x00000000
[ERROR] (EOtheFatalError tsk2 @S5:m635:u765)-> {0x4000000 p16 0x0000, p64 0x0203000600001604, dev 0, adr 0}: DEBUG: tag00. INFO = r0 = 0xcccccccc r1 = 0x00000000
[ERROR] (EOtheFatalError tsk2 @S5:m635:u899)-> {0x4000000 p16 0x0000, p64 0x0203000600001604, dev 0, adr 0}: DEBUG: tag00. INFO = r2 = 0x00000000 r3 = 0x00000000
[ERROR] (EOtheFatalError tsk2 @S5:m636:u36)-> {0x4000000 p16 0x0000, p64 0x0203000600001604, dev 0, adr 0}: DEBUG: tag00. INFO = r12 = 0x00000000 lr = 0x08048ab9
[ERROR] (EOtheFatalError tsk2 @S5:m636:u171)-> {0x4000000 p16 0x0000, p64 0x0203000600001604, dev 0, adr 0}: DEBUG: tag00. INFO = pc = 0x0802f87a psr = 0x41000000

Listing. Diagnostics messages

Interpreting the diagnostic messages is not trivial. We need to refer to the ARM Cortex M4 documentation (see here), and we also need the .map file of the running application to map hex addresses, such as the program counter, to the assembly and C code. We will very likely also need to re-run the very same project in the debugger. In fact, post-mortem diagnosis without a debugger is very complicated.
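Mapping a reported address to a function comes down to finding the symbol whose address range contains it. The sketch below assumes the relevant .map entries have already been transcribed by hand into a small table; `map_symbol_t` and `map_find` are hypothetical names, and any addresses placed in such a table must come from the actual .map file of the build. Bit 0 is masked because Thumb code addresses, such as the reported lr = 0x08048ab9, carry the Thumb bit.

```c
#include <stddef.h>
#include <stdint.h>

/* One function entry transcribed from a .map file (hypothetical layout). */
typedef struct {
    uint32_t start;     /* function address as listed (bit 0 may be set) */
    uint32_t size;      /* function size in bytes */
    const char *name;   /* symbol name */
} map_symbol_t;

/* Return the entry whose [start, start + size) range contains `addr`,
   or NULL if no entry matches. Bit 0 (the Thumb bit) is ignored. */
static const map_symbol_t *map_find(const map_symbol_t *tab, size_t n,
                                    uint32_t addr)
{
    addr &= ~(uint32_t)1;
    for (size_t i = 0; i < n; i++) {
        uint32_t start = tab[i].start & ~(uint32_t)1;
        if (addr >= start && addr - start < tab[i].size) {
            return &tab[i];
        }
    }
    return NULL;
}
```

A linear scan is enough for a handful of hand-copied entries; a full symbol table would more likely be sorted and searched with a binary search.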

As an example for the above situation:

  • with the .map file we can trace pc = 0x0802f87a to the offending function; in this case it is inside eo_fatalerror_AtStartup(), as expected

  • however, only with the debugger running can we find the offending instruction inside the C function

  • by looking at the system registers we can see, for instance, that address 0xcccccccc is the one causing the problem.
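The system registers can be decoded bit by bit using the ARMv7-M definitions. In the example above, CFSR = 0x00008200 has PRECISERR (bit 9) and BFARVALID (bit 15) set, so BFAR holds the faulting address, and HFSR = 0x40000000 has FORCED (bit 30) set, meaning a configurable fault was escalated to a HardFault. The following is only a sketch of such a decoder; `cfsr_decode` and `hfsr_is_forced` are hypothetical names and are not part of this PR.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { uint32_t mask; const char *name; } fault_flag_t;

/* CFSR status bits as defined by the ARMv7-M architecture. */
static const fault_flag_t cfsr_flags[] = {
    {1u << 0,  "IACCVIOL"},   {1u << 1,  "DACCVIOL"},
    {1u << 3,  "MUNSTKERR"},  {1u << 4,  "MSTKERR"},
    {1u << 5,  "MLSPERR"},    {1u << 7,  "MMARVALID"},
    {1u << 8,  "IBUSERR"},    {1u << 9,  "PRECISERR"},
    {1u << 10, "IMPRECISERR"},{1u << 11, "UNSTKERR"},
    {1u << 12, "STKERR"},     {1u << 13, "LSPERR"},
    {1u << 15, "BFARVALID"},  {1u << 16, "UNDEFINSTR"},
    {1u << 17, "INVSTATE"},   {1u << 18, "INVPC"},
    {1u << 19, "NOCP"},       {1u << 24, "UNALIGNED"},
    {1u << 25, "DIVBYZERO"},
};

/* Write the space-separated names of the set CFSR flags into `buf`.
   Returns the number of names fully written; stops early if `buf` is full. */
static int cfsr_decode(uint32_t cfsr, char *buf, size_t len)
{
    int count = 0;
    size_t used = 0;
    if (len > 0) buf[0] = '\0';
    for (size_t i = 0; i < sizeof cfsr_flags / sizeof cfsr_flags[0]; i++) {
        if (cfsr & cfsr_flags[i].mask) {
            int w = snprintf(buf + used, len - used, "%s%s",
                             count ? " " : "", cfsr_flags[i].name);
            if (w < 0 || (size_t)w >= len - used) break;  /* out of space */
            used += (size_t)w;
            count++;
        }
    }
    return count;
}

/* HFSR.FORCED (bit 30): a configurable fault escalated to HardFault. */
static int hfsr_is_forced(uint32_t hfsr)
{
    return (hfsr & (1u << 30)) != 0;
}
```

For the values in the listing, `cfsr_decode(0x00008200, ...)` yields the two bus-fault flags named above, which is consistent with the reported BFAR = 0xcccccccc.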

Tests
The code was tested on an ems on a dedicated testbench, where we caused several hard faults, such as divide by zero and access to non-existent addresses, but also others such as missing memory for the RTOS.

The binaries of the mc4plus have also been validated with tests on icub3 (see here), hence we can merge the PR.

Note
This PR addresses the problem in #174

…ontext on fatal error

advanced application versions for ems, mc4plus, mc2plus
@traversaro
Member

fyi @S-Dafarra @isorrentino @paolo-viceconte

@marcoaccame
Contributor Author

The binaries of the mc4plus have been validated also with tests on icub3 (see here), hence we can merge the PR

@marcoaccame marcoaccame merged commit 302b88f into robotology:devel Sep 21, 2021
marcoaccame added a commit to robotology/icub-firmware-build that referenced this pull request Sep 21, 2021
also added the related .map files, which give the mapping of assembly instructions
and functions into the code space. these files can be used to locate the hex
addresses contained in the diagnostics

- built:
  - ems v3.43, build 15 sept 2021 @ 14:37
  - mc4plus v3.37, build 15 sept 2021 @ 14:44
  - mc2plus v3.27, build 15 sept 2021 @ 14:46
- main changes from previous release are:
  - the fatal error handler which forces the restart sends diagnostics w/ complete mpu stack
    and registers info to yarprobotinterface (see robotology/icub-firmware#201).
    these diagnostics are to be decoded w/ relevant .map files.
@marcoaccame marcoaccame deleted the feat/mpucontext-on-fatalerror branch January 11, 2022 09:44