
Debugging ARM Cortex-M0+ Hard Faults


If you're hitting hard faults with your ARM Cortex-M0+, it could be a problem with unaligned flash blocks. Eliminate your variables and narrow your scope when debugging.

To me, one of the most frustrating things about working with ARM Cortex-M cores is the hard fault exception. I have lost several hours this week debugging and tracking an instance of a hard fault on an ARM Cortex-M0+ device.

For background, I’m porting a project (NXP Kinetis KW40Z160) to Eclipse and the GNU tool chain for ARM. The application seems to run fine after downloading it with the debugger, but it crashes with a hard fault if I either do a ‘restart’ with the debugger or reset the microcontroller with a SYSRESETREQ (see How to Reset an ARM Cortex-M with Software).

Interestingly, it crashed during startup in the ANSI library, in the _init() function:

The next assembly step will cause a hard fault

Pushing the registers in _init() will cause the hard fault exception:

Triggered hard fault

Interestingly, that same code with the same registers/stack/etc. works fine the first time, but it breaks down later, after the application is running.

The next step was to check the usual suspects:

  • Output of the hard fault handler? It didn't show anything usable.
  • Stack pointer not aligned? No, that’s fine, and it works with the same values the first time.
  • ARM/Thumb mode? Nope, all in Thumb mode.
  • Stack readable/writable? Yes, and no special protection or other kinds of things were used.
  • Interrupt priorities? Nope, it happens with interrupts disabled.
  • Is it a problem with the debugger? Nope, tried both Segger and P&E, same for both, and it happens with and without a debugger attached or used.
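Several of the checks above start from the register values the core stacks on exception entry. As a rough sketch (not the handler used in this project), the stacked frame can be decoded like this. On Cortex-M, exception entry pushes R0-R3, R12, LR, PC and xPSR, with R0 at the lowest address. The names `fault_frame_t`, `decode_fault_frame` and `frame_is_thumb` are made up for illustration, and the code assumes the handler has already picked the right stack pointer (MSP or PSP, selected by bit 2 of the EXC_RETURN value in LR).

```c
#include <stdint.h>

/* Registers pushed by the core on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;    /* link register at the time of the fault */
    uint32_t pc;    /* return address: at or near the faulting instruction */
    uint32_t xpsr;  /* program status register at the time of the fault */
} fault_frame_t;

/* T (Thumb state) bit in xPSR; must be set on Cortex-M. */
#define XPSR_T_BIT (1u << 24)

/* Hypothetical helper: copy the eight stacked words into a named struct.
 * stacked_sp must point at the stacked R0 (the stack pointer value on
 * handler entry). */
fault_frame_t decode_fault_frame(const uint32_t *stacked_sp)
{
    fault_frame_t f;
    f.r0   = stacked_sp[0];
    f.r1   = stacked_sp[1];
    f.r2   = stacked_sp[2];
    f.r3   = stacked_sp[3];
    f.r12  = stacked_sp[4];
    f.lr   = stacked_sp[5];
    f.pc   = stacked_sp[6];
    f.xpsr = stacked_sp[7];
    return f;
}

/* Returns 1 if the interrupted code was running in Thumb state. */
int frame_is_thumb(const fault_frame_t *f)
{
    return (f->xpsr & XPSR_T_BIT) != 0;
}
```

With the frame decoded, `pc` points to where the fault was taken and `frame_is_thumb()` answers the ARM/Thumb question directly. In a case like this one, where the values look the same on the good and the bad run, the frame alone will not reveal the cause.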

Well, that’s the point where I ran out of ideas. What remained was the desperate attempt to ask colleagues, which produced a list of points similar to the one above. So no progress.

Internet Help?

Next: search the Internet. Lots of other people have issues with hard faults, too. At least this revealed an interesting article about imprecise hard faults. Not only was it very interesting reading, but it helped me start digging deeper. Unfortunately, it only applies to Cortex-M3/M4 and not to the M0+. Anyway, I have added an option to the hard fault handler so I can deal with this better in other projects:

Hard Fault Handler with the option to disable the write buffer
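The handler itself is only shown as a screenshot, so the following is a minimal sketch of what such an option could look like, not the original code. On Cortex-M3/M4, the Auxiliary Control Register (ACTLR) sits at address 0xE000E008, and setting its DISDEFWBUF bit (bit 1) disables the write buffer so that imprecise bus faults become precise. The function name `disable_write_buffer` is hypothetical, and the register is passed in as a pointer so the logic can be tried on a host as well.

```c
#include <stdint.h>

/* Cortex-M3/M4 only: Auxiliary Control Register and its DISDEFWBUF bit.
 * This does not apply to the Cortex-M0+ discussed in this article. */
#define ACTLR_ADDRESS     0xE000E008u
#define ACTLR_DISDEFWBUF  (1u << 1)

/* Hypothetical helper: set DISDEFWBUF, leaving the other bits untouched.
 * Expect lower performance while the write buffer is disabled. */
void disable_write_buffer(volatile uint32_t *actlr)
{
    *actlr |= ACTLR_DISDEFWBUF;
}
```

On a target, this would be called early in startup as `disable_write_buffer((volatile uint32_t *)ACTLR_ADDRESS);` and only in debug builds.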

I still had no clue. What I ended up trying was a ‘binary’ search: turning parts of the application on and off to find out which part was causing the problem. Yes, it's time-consuming, but that was my only option.

And indeed, after hours of searching, I narrowed it down to the flash programming: The application is storing data internally by reprogramming a 1 KByte flash page, and for this, it erases a 1K block in flash. In the linker file, it was allocated like this:

  .nvm :
  {
    . = ALIGN(8);
    NV_STORAGE_START_ADDRESS = .;
    . += 1024;
    NV_STORAGE_END_ADDRESS = .;
  } > m_text


At first, erasing it seemed to work fine, but then the microcontroller would crash after a restart.

Looking closer at the linker file, I finally spotted the problem: the section was only 8-byte aligned, but it should have been aligned to the 1 KB flash block size:

  .nvm :
  {
    . = ALIGN(1024); /* must be 1k aligned! */
    NV_STORAGE_START_ADDRESS = .;
    . += 1024;
    NV_STORAGE_END_ADDRESS = .;
  } > m_text


With this change, the problem went away.
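To see why 8-byte alignment is not enough, here is a small sketch of the address arithmetic. It assumes that the flash controller always erases whole 1 KB sectors, so erasing a 1 KB region that starts in the middle of a sector also wipes bytes that belong to something else. This is one plausible reading of the failure, not something confirmed on the device. The function names and the example address 0x1F08 are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_SECTOR_SIZE 1024u  /* erase block size used in the article */

/* The masking below only works for power-of-two sector sizes. */
_Static_assert((FLASH_SECTOR_SIZE & (FLASH_SECTOR_SIZE - 1u)) == 0u,
               "sector size must be a power of two");

/* Start address of the sector that contains addr. */
uint32_t sector_base(uint32_t addr)
{
    return addr & ~(FLASH_SECTOR_SIZE - 1u);
}

/* Returns 1 if addr sits exactly on a sector boundary. */
int is_sector_aligned(uint32_t addr)
{
    return (addr & (FLASH_SECTOR_SIZE - 1u)) == 0u;
}

/* Bytes outside [start, start + len) that are lost when every sector
 * touching that range is erased. Zero only if the range is made of
 * whole, aligned sectors. */
uint32_t collateral_bytes(uint32_t start, uint32_t len)
{
    assert(len > 0u);
    uint32_t first_sector = sector_base(start);
    uint32_t end_of_last  = sector_base(start + len - 1u) + FLASH_SECTOR_SIZE;
    return (end_of_last - first_sector) - len;
}
```

For a hypothetical start address of 0x1F08 (8-byte aligned, as the original linker file allowed), the 1 KB region straddles two sectors, and erasing both would take 1024 bytes of neighboring flash with it. For a start address of 0x2000, nothing outside the region is touched. A check like `is_sector_aligned(start)` at startup would flag this kind of linker mistake early.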

Summary

Dealing with hard faults on ARM is not easy. This particular one was caused by erasing a flash memory block that was not aligned. It seems to me that somehow the internals of the processor got screwed up when that happened. The challenge was that it then crashed in a way not very closely related to the cause of the problem. These kinds of problems are not easy to find and solve.

What usually works best, and what worked in this case, is trying to reduce the problem: Have a ‘working’ version of the code and a ‘failing’ version. Try to eliminate all variables (board, debugger, power supply, host machine) and then narrow the problem down as much as possible. And yes, some luck helps accelerate that process. In any case, I hope this post can help others.

Happy debugging!

Published at DZone with permission of Erich Styger, DZone MVB. See the original article here.
