How to Optimize Code and RAM Size
A tutorial for optimizing code and RAM size.
It is great if vendors provide a starting point for my own projects. A working ‘blinky’ is always a great starter. But convenience always has a price, and with a ‘blinky’ the price is that the code size for just ‘toggling a GPIO pin’ looks exaggerated. For a device with a tiny amount of RAM and FLASH this can be concerning: will my application ever fit into that device if a ‘blinky’ takes that much? Don’t worry: a blinky (or any other project) can be easily trimmed down.
Blinky on NXP LPC845-BRK Board
I use a ‘blinky’ project here just as an example: the trimming tips can apply to any other kind of project too.
For this tutorial I’m using the NXP LPC845 on the BRK (breakout) board:
NXP LPC845-BRK Board
Blinky
I’m using the Eclipse-based NXP MCUXpresso IDE:
SDK board selection
I have created the ‘blinky’ project with the vendor default settings:
blinky
A ‘blinky’ is supposed to blink an LED, just a good starter for any project. Building that rather minimal project gives this code size:
Memory region         Used Size  Region Size  %age Used
   PROGRAM_FLASH:       10536 B        64 KB     16.08%
            SRAM:        2424 B        16 KB     14.79%
The console shows that information as well, divided into text, data and bss:
   text    data     bss     dec     hex filename
  10532       4    2420   12956    329c lpc845breakout_led_blinky.axf
More than 10 KB for a blinky looks exaggerated. But we are going to trim this down in the next steps.
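The two outputs are consistent with each other. As I understand the GNU size output, FLASH holds text plus the initial values of data (which the startup code copies to RAM), while RAM holds data plus bss: here 10532 + 4 = 10536 bytes and 4 + 2420 = 2424 bytes, matching the linker summary. A small sketch of that arithmetic; the struct and helper names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the three numbers printed by the size tool. */
typedef struct {
    uint32_t text; /* code and constant data, stored in FLASH */
    uint32_t data; /* initialized variables: initial values in FLASH, live copy in RAM */
    uint32_t bss;  /* zero-initialized variables, RAM only */
} size_info_t;

/* FLASH usage: code plus the initial values of the data section. */
static uint32_t flash_bytes(size_info_t s) { return s.text + s.data; }

/* RAM usage: the data section plus the bss section. */
static uint32_t ram_bytes(size_info_t s) { return s.data + s.bss; }

/* The 'dec' column: the sum of all three sections. */
static uint32_t dec_total(size_info_t s) { return s.text + s.data + s.bss; }

/* The 'hex' column: the same sum, formatted in lowercase hexadecimal. */
static void hex_total(size_info_t s, char *buf, size_t len)
{
    snprintf(buf, len, "%lx", (unsigned long)dec_total(s));
}
```

The dec column is simply the sum of all three sections, and hex is the same number in hexadecimal (12956 = 0x329c).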
Size Information
For the meaning of the size information, read “text, data and bss: Code and Data Size Explained”. The normal way to see what is using space on my device is to check the linker map file (*.map):
Linker Map File
But that map file is rather hard to read and more for the experts: it lists the sections with their address and size:
Linker Map File Content
With the MCUXpresso IDE V11, there is a nice ‘Image Info’ view which is basically a better viewer for the map file information:
Image Info View
I can filter and sort the data, which gives me an idea of how much space is used for code and data:
Image Info Memory Content
Of course, it requires some knowledge about what the application is supposed to do. I always go through that list of items in the view to see if there is anything there I would not expect: maybe the application is using something which can be removed.
Source Code
For a simple blinky, that is rather large. The first thing is to check what the program is doing. The main.c has this:
/*
 * Copyright 2017 NXP
 * All rights reserved.
 *
 * SPDX-License-Identifier: BSD-3-Clause
 */

#include "board.h"
#include "fsl_gpio.h"
#include "pin_mux.h"

/*******************************************************************************
 * Definitions
 ******************************************************************************/
#define BOARD_LED_PORT 1U
#define BOARD_LED_PIN 2U

/*******************************************************************************
 * Prototypes
 ******************************************************************************/

/*******************************************************************************
 * Variables
 ******************************************************************************/
/* volatile: decremented in SysTick_Handler(), polled in SysTick_DelayTicks() */
volatile uint32_t g_systickCounter;

/*******************************************************************************
 * Code
 ******************************************************************************/
void SysTick_Handler(void)
{
    if (g_systickCounter != 0U)
    {
        g_systickCounter--;
    }
}

void SysTick_DelayTicks(uint32_t n)
{
    g_systickCounter = n;
    /* busy-wait until the SysTick interrupt has counted down to zero */
    while (g_systickCounter != 0U)
    {
    }
}

/*!
 * @brief Main function
 */
int main(void)
{
    /* Define the init structure for the output LED pin */
    gpio_pin_config_t led_config = {
        kGPIO_DigitalOutput,
        0,
    };

    /* Board pin init */
    BOARD_InitPins();
    BOARD_InitBootClocks();
    BOARD_InitDebugConsole();

    /* Init output LED GPIO. */
    GPIO_PortInit(GPIO, BOARD_LED_PORT);
    GPIO_PinInit(GPIO, BOARD_LED_PORT, BOARD_LED_PIN, &led_config);

    /* Set systick reload value to generate 1ms interrupt */
    if (SysTick_Config(SystemCoreClock / 1000U))
    {
        /* SysTick_Config() returns non-zero if the reload value does not fit: trap here */
        while (1)
        {
        }
    }

    while (1)
    {
        /* Delay 1000 ms */
        SysTick_DelayTicks(1000U);
        GPIO_PortToggle(GPIO, BOARD_LED_PORT, 1u << BOARD_LED_PIN);
    }
}
Basically, the code initializes the pins and clocks, sets up the SysTick timer and then does the ‘blinky’ in a loop, using the SysTick counter to delay the blink period.
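SysTick_Config() is part of CMSIS: it loads the SysTick reload register with ticks - 1 and returns non-zero if that value does not fit into the 24-bit counter, which is why main() traps in an endless loop in that case. A portable sketch of that range check; the helper name and the 30 MHz example clock are assumptions, not necessarily what BOARD_InitBootClocks() configures:

```c
#include <stdint.h>

/* The SysTick reload register (LOAD) is 24 bits wide on ARM Cortex-M. */
#define SYSTICK_RELOAD_MAX 0x00FFFFFFu

/*
 * Hypothetical helper modeled on the range check in CMSIS SysTick_Config():
 * for an interrupt every 'ticks' core clock cycles the reload value is
 * ticks - 1, because the counter runs from the reload value down to 0.
 * Returns 0 and stores the reload value on success, or 1 if it does not fit
 * (ticks == 0 wraps around to 0xFFFFFFFF and is rejected as well).
 */
static uint32_t systick_reload(uint32_t ticks, uint32_t *reload)
{
    if ((ticks - 1u) > SYSTICK_RELOAD_MAX) {
        return 1u; /* same convention as SysTick_Config(): non-zero means error */
    }
    *reload = ticks - 1u;
    return 0u;
}
```

With an assumed 30 MHz core clock, SystemCoreClock / 1000U gives 30000 ticks per millisecond and a reload value of 29999; the 24-bit limit only matters for periods longer than 2^24 core clock cycles.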
Debug Console
But what I can see is that it initializes a debug console (and the UART hardware for it):
BOARD_InitDebugConsole();
Getting rid of that gets us down to:
Memory region         Used Size  Region Size  %age Used
   PROGRAM_FLASH:        5616 B        64 KB      8.57%
            SRAM:        2400 B        16 KB     14.65%
Look for functions which get called but are not actually needed. In many cases demo applications set up some communication channels, but then never use them. The linker does a good job removing unused objects (functions/variables), but only if they are not referenced.
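Instead of deleting such a call, one option is to put it behind a compile-time switch: when the switch is off, nothing references the console code any more, so the linker can drop it. With the GNU tools this removal works per section, which is why projects are usually compiled with -ffunction-sections and -fdata-sections and linked with --gc-sections. A minimal sketch; APP_USE_DEBUG_CONSOLE and the stub functions are hypothetical stand-ins for the SDK calls:

```c
#include <stdint.h>

/* Hypothetical switch: build with -DAPP_USE_DEBUG_CONSOLE=1 to keep the console. */
#ifndef APP_USE_DEBUG_CONSOLE
#define APP_USE_DEBUG_CONSOLE 0
#endif

static uint32_t g_initCalls; /* bit 0: pins initialized, bit 1: console initialized */

/* Hypothetical stand-in for BOARD_InitPins(). */
static void board_init_pins(void) { g_initCalls |= 1u; }

#if APP_USE_DEBUG_CONSOLE
/* Hypothetical stand-in for BOARD_InitDebugConsole(); only compiled when enabled. */
static void board_init_console(void) { g_initCalls |= 2u; }
#endif

/* Board setup: the console init (and everything it pulls in) is only
   referenced when the switch is on. */
static void app_board_init(void)
{
    board_init_pins();
#if APP_USE_DEBUG_CONSOLE
    board_init_console();
#endif
}
```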
Semihosting and Printf()
The next thing to look at is whether there is any semihosting or printf(). The project uses ‘Redlib’, which is an optimized library compared to the ‘standard’ newlib or the smaller newlib-nano:
Redlib
Still, that library can add to the code size because it uses semihosting (sending messages through the debugger). In the Memory view I can see all the standard I/O functions needed for that, directly or indirectly:
Stdio Functions
Having all the hooks for that functionality only makes sense if it is used, and the ‘blinky’ does not use it. So to get rid of semihosting and all the unused standard I/O, I use the ‘none’ variant:
Library without standard I/O
This gets us down to this:
Memory region         Used Size  Region Size  %age Used
   PROGRAM_FLASH:        3372 B        64 KB      5.15%
            SRAM:        2208 B        16 KB     13.48%
Avoid using printf() and all its variants, including semihosting, or use a smaller variant or implementation. See the links at the end of this article for more background on this.
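If an application only needs to output a few numbers, a tiny formatter can replace printf() and its format-string parsing. A minimal sketch; uint_to_dec() is a hypothetical helper, and sending the resulting string to a UART is left out:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical printf() replacement for unsigned decimal numbers.
 * Writes the digits of 'value' plus a terminating '\0' into 'buf'
 * (at least 11 bytes for any 32-bit value) and returns the number of digits.
 */
static size_t uint_to_dec(uint32_t value, char *buf)
{
    char tmp[10]; /* 4294967295 has 10 digits */
    size_t n = 0;
    do {
        tmp[n++] = (char)('0' + (value % 10u));
        value /= 10u;
    } while (value != 0u);
    /* digits were produced least significant first: reverse them */
    for (size_t i = 0; i < n; i++) {
        buf[i] = tmp[n - 1u - i];
    }
    buf[n] = '\0';
    return n;
}
```

This only covers one format, which is the point: the code pulled in is exactly what the application uses.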
DEBUG and NDEBUG
The next thing is to check whether DEBUG is listed in the compiler defines. And indeed, this is the case:
DEBUG Define
With that define set, the SDK and example drivers contain a lot of extra code which checks for valid values with the ‘assert()’ macro:
Assert() Usage in SDK Code
Here again, the Image Info view is helpful: it shows me all the places where assert() is used:
Assert Usage
It is actually good practice to have asserts in the code to catch programming errors early. But all the assert() code really adds up. To turn off the extra code (and the safety belt!), I change the DEBUG define to NDEBUG:
NDEBUG
This gets us down to this:
Memory region         Used Size  Region Size  %age Used
   PROGRAM_FLASH:        3144 B        64 KB      4.80%
            SRAM:        2208 B        16 KB     13.48%
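Why the checks add up: a typical assert-style macro expands at every use into a compare, a call to a failure handler and a reference to the file name string, while defining NDEBUG turns the whole expression into ((void)0). A sketch of such a wrapper; APP_ASSERT and app_assert_failed() are hypothetical, and the handler only counts failures so the behavior is easy to check (on a target it would typically log and halt):

```c
#include <stdint.h>

static uint32_t g_assertFailures; /* counts failures; a target handler would log and halt */

/* Hypothetical failure handler, receiving file name and line of the failed check. */
static void app_assert_failed(const char *file, int line)
{
    (void)file;
    (void)line;
    g_assertFailures++;
}

#ifdef NDEBUG
/* Release build: the check, the call and its arguments vanish completely. */
#define APP_ASSERT(expr) ((void)0)
#else
/* Debug build: every use adds a compare, a call and a reference to the __FILE__ string. */
#define APP_ASSERT(expr) ((expr) ? (void)0 : app_assert_failed(__FILE__, __LINE__))
#endif

/* Example driver-style function with a parameter check. */
static uint32_t pin_mask(uint32_t pin)
{
    APP_ASSERT(pin < 32u);
    return (pin < 32u) ? (1u << pin) : 0u;
}
```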
Interrupts and Vectors
Again, the Image Info view is a good starting point. I’m checking the used interrupts. The blinky uses the SysTick interrupt, which is expected. But why are UART interrupts used as well?
Interrupts Used
Most interrupt handlers are implemented as ‘weak’: they have a default (empty) implementation which can be overridden by the application. But the UART ones do not make sense, as the blinky does not use any UART communication.
It turns out that the NXP SDK has the UART transactional API turned on by default:
UART Transactional API setting
The transactional API allows sending and receiving UART data in chunks (transactions). But we don’t need that in our blinky, so let’s turn it off:
Turning Off UART Transactional API
Which gives:
Memory region         Used Size  Region Size  %age Used
   PROGRAM_FLASH:        2964 B        64 KB      4.52%
            SRAM:        2184 B        16 KB     13.33%
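Why does the linker keep interrupt code that the blinky never calls? The vector table references every interrupt handler, so any handler it points to, and everything that handler calls, stays in the image. Startup code typically provides weak default handlers which a driver or the application can replace with a normal (strong) definition of the same name. A simplified sketch for GCC or Clang; the shortened two-entry table, the counters and the exact handler names are illustrative assumptions, not the LPC845 startup code:

```c
#include <stdint.h>

static uint32_t g_defaultHits; /* how often the weak default UART handler ran */
static uint32_t g_tickHits;    /* how often the SysTick handler ran */

/* Weak default: used only if no other file provides a strong USART0_IRQHandler. */
__attribute__((weak)) void USART0_IRQHandler(void)
{
    g_defaultHits++;
}

/* Strong definition provided by the application (like in the blinky's main.c). */
void SysTick_Handler(void)
{
    g_tickHits++;
}

typedef void (*vector_t)(void);

/* Hypothetical, shortened vector table: each entry is a reference that keeps
   the handler (and everything it calls) from being removed by the linker. */
const vector_t g_vectors[] = {
    SysTick_Handler,
    USART0_IRQHandler,
};
```

If a driver file defines a strong USART0_IRQHandler, the table points to that one instead, and the driver code it calls is linked in even if the application never uses the UART.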
Now there would be the option to remove CMSIS support, which adds about 300 bytes to the above code. But I consider CMSIS (setting interrupt priority, common clock settings) very useful, so I don’t touch it here. The largest function in the application is the one used by the SysTick code to set the timer interrupt to the lowest priority; removing it would save another 220 bytes:
CMSIS as largest single function code size contributor
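One reason that function is comparatively large on a Cortex-M0+ is that, as I understand ARMv6-M, the priority registers only support 32-bit word access. The CMSIS code therefore computes which byte of a register holds the priority, shifts the value into the implemented upper bits and does a read-modify-write. The sketch below models only that bit arithmetic on a plain variable instead of the real SCB register; the function names are hypothetical, and the 2 priority bits are an assumption (check __NVIC_PRIO_BITS for your device):

```c
#include <stdint.h>

#define PRIO_BITS 2u /* assumption: implemented priority bits (__NVIC_PRIO_BITS) */

/* Bit position of an exception's 8-bit priority field inside its 32-bit
   register, following the same formula as the CMSIS _BIT_SHIFT() macro. */
static uint32_t prio_bit_shift(int32_t irqn)
{
    return ((uint32_t)irqn & 0x03u) * 8u;
}

/* Read-modify-write of one 8-bit priority field in a 32-bit register value.
   Only the upper PRIO_BITS of the field exist in hardware, so the priority
   is shifted into the most significant bits of the field. */
static uint32_t set_priority_field(uint32_t reg, int32_t irqn, uint32_t priority)
{
    uint32_t shift = prio_bit_shift(irqn);
    reg &= ~(0xFFu << shift);                                  /* clear the old field */
    reg |= ((priority << (8u - PRIO_BITS)) & 0xFFu) << shift;  /* insert the new one */
    return reg;
}
```

For SysTick (SysTick_IRQn is -1 in CMSIS), the lowest priority (1 << 2) - 1 = 3 ends up as 0xC0 in bits 31..24 of the register.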
Optimizations
So far I have stripped off unwanted or unused functionality. Next, I could turn on compiler optimizations. By default, the project is set up with -O0:
Compiler Optimizations
-O0 means no optimization: the code is straightforward and easy to debug.
-O1 applies a first set of optimizations (for example, leaner function entry/exit code) and reduces code size without really impacting debugging. In this example it cuts the code size roughly in half!
Memory region         Used Size  Region Size  %age Used
   PROGRAM_FLASH:        1540 B        64 KB      2.35%
            SRAM:        2184 B        16 KB     13.33%
-O2 optimizes more and tries to keep things in registers as much as possible. Because the functions in the application are rather small, the improvement is not that big:
Memory region         Used Size  Region Size  %age Used
   PROGRAM_FLASH:        1516 B        64 KB      2.31%
            SRAM:        2184 B        16 KB     13.33%
-O3 optimizes the most, with extra inlining. As -O3 targets speed, it is no wonder that the code size increases again:
Memory region         Used Size  Region Size  %age Used
   PROGRAM_FLASH:        1792 B        64 KB      2.73%
            SRAM:        2184 B        16 KB     13.33%
The best option for code size optimization is -Os (optimize for size):
Memory region         Used Size  Region Size  %age Used
   PROGRAM_FLASH:        1456 B        64 KB      2.22%
            SRAM:        2184 B        16 KB     13.33%
That looks pretty reasonable now! Of course, there are ways to cut off even more for a ‘bare-bare-blinky’, but everything still in place (startup code, clock and GPIO initialization) makes sense for a real application, so I stop here.
RAM: Heap and Stack
What does not look right is the SRAM usage. The ‘heap’ is using a big chunk:
heap memory usage
That heap is used for dynamic memory allocation (malloc()). The general rule for embedded programming is to avoid it, but the heap is reserved by default. It can be turned off in the linker settings: the demo uses 1 KB each for heap and stack. As I’m not using malloc(), I can set the heap size to 0x0. The right stack size really depends on the application. On ARM Cortex-M, the MSP (main stack pointer) is used for startup/main and for the interrupts (see “ARM Cortex-M Interrupts and FreeRTOS”). 0x100 (256 bytes) should be plenty for my blinky.
Heap and Stack Size
This gets me down to this:
Memory region         Used Size  Region Size  %age Used
   PROGRAM_FLASH:        1456 B        64 KB      2.22%
            SRAM:         392 B        16 KB      2.39%
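With the heap set to zero, anything that still needs dynamic memory has to come from statically reserved buffers, which has the side benefit that the RAM shows up in the linker output. A minimal fixed-block pool as one possible replacement for malloc(); the names and the block size and count are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCK_SIZE  16u /* bytes per block (hypothetical) */
#define POOL_BLOCK_COUNT 4u  /* number of blocks (hypothetical) */

/* Storage is reserved statically (uint32_t for word alignment), so it is
   counted in .bss at link time instead of hiding in a heap. */
static uint32_t pool_storage[POOL_BLOCK_COUNT][POOL_BLOCK_SIZE / sizeof(uint32_t)];
static uint8_t pool_used[POOL_BLOCK_COUNT]; /* 1 if the block is handed out */

/* Returns a free block, or NULL if the pool is exhausted (no heap fallback). */
static void *pool_alloc(void)
{
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        if (!pool_used[i]) {
            pool_used[i] = 1u;
            return pool_storage[i];
        }
    }
    return NULL;
}

/* Returns a block to the pool; ignores pointers that did not come from it. */
static void pool_free(void *p)
{
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        if (p == (void *)pool_storage[i]) {
            pool_used[i] = 0u;
            return;
        }
    }
}
```

This sketch is not safe to call from both main and interrupts at the same time; that would need a critical section around the bookkeeping.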
To reduce the stack size further, I can look at the Call Graph information, which shows how much stack space is used:
Call Graph with Stack Size
There are a few items with unknown size information (marked with a ‘?’) because they are in the library. A way to verify the real stack usage is to fill the stack with a pattern (e.g. 0xffff’ffff), run the application for a while and then check how much of the pattern has been overwritten:
Used Stack
This shows that 72 bytes are actually used. With a bit of a margin, setting the stack size to 128 bytes, in this case, looks reasonable. This gives:
Memory region         Used Size  Region Size  %age Used
   PROGRAM_FLASH:        1456 B        64 KB      2.22%
            SRAM:         264 B        16 KB      1.61%
Be really careful with this! Stack overflows are probably the most common problem in embedded applications. If you can, give the stack as much RAM as you can spare. If you cut its size down, make sure you did enough analysis to justify your stack size.
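The pattern method for measuring stack usage can be sketched like this: fill the stack area with a known value, let the application run, then count from the far end (the stack grows downward on ARM Cortex-M) how many words still hold the pattern. This portable version works on an ordinary array standing in for the stack; on the target, the painting has to happen before the stack is in use (for example in the startup code), and the region boundaries would come from the linker, which is left out here:

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xFFFFFFFFu /* fill pattern */

/* Fill the whole stack region with the pattern. */
static void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_PAINT;
    }
}

/* The stack grows downward from base + words toward base, so untouched
   words remain at the low end. Returns the high-water mark in bytes. */
static size_t stack_used_bytes(const uint32_t *base, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && base[untouched] == STACK_PAINT) {
        untouched++;
    }
    return (words - untouched) * sizeof(uint32_t);
}
```

A stack word that happens to be written with the pattern value itself would be miscounted, and the result only covers the code paths that actually ran, so it is an estimate that still needs a margin.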
MTB
There is one thing left that uses RAM: the MTB buffer. The Micro Trace Buffer is used for tracing, which can be very useful (see “Debugging ARM Cortex-M0+ Hard Fault with MTB Trace”). The buffer can be disabled with a macro:
mtb.c
__MTB_DISABLE
Which gets me down to this:
Memory region         Used Size  Region Size  %age Used
   PROGRAM_FLASH:        1456 B        64 KB      2.22%
            SRAM:         136 B        16 KB      0.83%
I think here we can be happy.
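The MTB buffer is a typical case of RAM reserved at compile time that a macro can compile out. A sketch of that pattern with hypothetical names (APP_TRACE_DISABLE, APP_TRACE_BUFFER_SIZE), not the actual content of mtb.c; the 128-byte size matches the drop from 264 to 136 bytes above, and the alignment to the buffer's own power-of-two size reflects, as I understand it, a requirement of the MTB hardware:

```c
#include <stdint.h>

/* Hypothetical switch: build with -DAPP_TRACE_DISABLE to reclaim the RAM. */
#ifndef APP_TRACE_DISABLE

#ifndef APP_TRACE_BUFFER_SIZE
#define APP_TRACE_BUFFER_SIZE 128u /* bytes; must be a power of two */
#endif

_Static_assert((APP_TRACE_BUFFER_SIZE & (APP_TRACE_BUFFER_SIZE - 1u)) == 0u,
               "trace buffer size must be a power of two");

/* Buffer aligned to its own size (GCC/Clang attribute syntax). */
__attribute__((aligned(APP_TRACE_BUFFER_SIZE)))
uint8_t app_trace_buffer[APP_TRACE_BUFFER_SIZE];

#define APP_TRACE_RAM_BYTES APP_TRACE_BUFFER_SIZE

#else

#define APP_TRACE_RAM_BYTES 0u /* disabled: no RAM reserved for tracing */

#endif
```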
Summary
Vendor examples are great: they give me a good starting point. They are not optimized, and this is intentional. But they might come with features and functions I don’t need. Knowing different ways to cut off features or tune settings can be very useful to optimize RAM and FLASH usage. In this tutorial, I showed how to bring a ‘blinky’ down to about 1.5 KB of FLASH and 136 bytes of SRAM. Of course, this all depends on features and usage, but I think this is a pretty reasonable state to add extra functionality for my application.
I hope these tips might be useful for your projects.
Published at DZone with permission of Erich Styger, DZone MVB. See the original article here.