
Execute-Only Code With GNU and GCC



“There is no ‘S’ for Security in IoT” has some truth to it. With all the connected devices around us, code security should be a concern for every developer. “Preventing Reverse Engineering: Enabling Flash Security” shows how to prevent external read-out of critical code from a device. Some microcontrollers have yet another feature built in: execute-only sections, or execute-only memory. In such an area, only instruction fetches are allowed; there is no read access at all. Analogous to ‘read-only’, ‘execute-only’ means that code can be executed there, but no other access to that memory is allowed.

Locked Code

In this article, I describe the challenges for a toolchain like the GNU GCC and how to compile and link code for such an execute-only memory.

Execute Only Memory

With the complete flash read-out protection explained in “Preventing Reverse Engineering: Enabling Flash Security,” the door to the memory is completely closed. It is not possible to read from the device, e.g. for reverse engineering, or to change the firmware on it. The only way to regain access to the device is usually a full erase of the device memory, which ensures that the content cannot be read out with external tools.

Execute-only memory

In some cases, it would be beneficial to update or load code into the device, allowing users to load their own code, programs, or applets into your device. But you don’t want that code to get access to the secret code already in the device.

For example, a company sells electricity meters with a secret way to measure and store the billing information. The company sells the meters to electricity companies that add their own communication stacks and software. For this use case, the secret code has to be protected so that it cannot be read by any ‘untrusted’ code.

‘Execute-only’ protects areas of the firmware from read-out: instructions in these areas can be executed, but the code area itself cannot be read as data. This allows running untrusted code (e.g. loaded as an ‘applet’). The applet can still call functions in the protected area (for example, to get the billing information), but the untrusted code cannot spy on the protected firmware.

For example, secret encryption/decryption routines can be placed in a protected execute-only area while ‘untrusted’ code is still allowed to call them. Because the routines can only be executed, the ‘untrusted’ code cannot find out what is inside that protected area:

code calling protected firmware

To be clear: this is not perfect protection. Depending on the hardware implementation (see https://community.arm.com/processors/b/blog/posts/what-is-execute-only-memory-xom) and the effort invested, reverse engineering might still be possible.

The typical hardware implementation allows only instruction fetches, and no data fetches, in this area. If the architecture has separate instruction and data buses, the data bus is basically not connected to that memory. Interrupt execution, interrupt stack frames, and caches have to be designed carefully in the hardware to prevent read-out of the protected areas (see Meltdown and Spectre).
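The difference between instruction fetches and data accesses can be illustrated with a small software model. This is only a sketch under stated assumptions: the types and the access_allowed() function are hypothetical names, the example region values are made up, and real devices enforce these rules in the flash controller or bus fabric, not in C code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kinds of bus access, as seen by a memory region. */
typedef enum {
  ACCESS_INSTR_FETCH, /* the CPU fetches an instruction to execute it */
  ACCESS_DATA_READ,   /* e.g. a load instruction reading from the region */
  ACCESS_DATA_WRITE   /* e.g. a store instruction writing to the region */
} access_t;

/* Hypothetical attributes of one memory region. */
typedef struct {
  uint32_t start; /* first address of the region */
  uint32_t size;  /* size in bytes */
  bool read;      /* data reads allowed */
  bool write;     /* data writes allowed */
  bool exec;      /* instruction fetches allowed */
} region_t;

/* True if 'addr' lies inside the region (written to avoid overflow). */
bool in_region(const region_t *r, uint32_t addr) {
  return addr >= r->start && addr - r->start < r->size;
}

/* True if an access of the given kind to 'addr' is permitted. */
bool access_allowed(const region_t *r, uint32_t addr, access_t kind) {
  assert(r != NULL);
  if (!in_region(r, addr)) {
    return false; /* address not handled by this region */
  }
  switch (kind) {
  case ACCESS_INSTR_FETCH: return r->exec;
  case ACCESS_DATA_READ:   return r->read;
  case ACCESS_DATA_WRITE:  return r->write;
  }
  return false;
}
```

An execute-only region corresponds to read = false, write = false, exec = true. A call into the region passes the instruction-fetch check, while a load from the same address is rejected.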

Code With Embedded Data: Literal Pools and Jump Tables

Code in an execute-only area can only be executed; no data access to it is allowed. This can be a challenge with the ARM Cortex (Thumb-2) instruction set. The following example, which should be placed into an execute-only section, illustrates the problem:

int SecretFunction(int i) {
  return i+0x1234567;
}


Looking at the disassembly (see “Creating Disassembly Listings with GNU Tools and Eclipse"), it shows the following:

Disassembly of section .text.SecretFunction:

00000000 <SecretFunction>:
0: b480 push {r7}
2: b083 sub sp, #12
4: af00 add r7, sp, #0
6: 6078 str r0, [r7, #4]
8: 687a ldr r2, [r7, #4]
a: 4b04 ldr r3, [pc, #16] ; (1c <SecretFunction+0x1c>)
c: 4413 add r3, r2
e: 4618 mov r0, r3
10: 370c adds r7, #12
12: 46bd mov sp, r7
14: f85d 7b04 ldr.w r7, [sp], #4
18: 4770 bx lr
1a: bf00 nop
1c: 01234567 .word 0x01234567


The interesting instruction is ldr r3, [pc, #16], which loads the constant 0x1234567 into register R3. The constant is placed at the end of the function code and is loaded PC-relative. Such an area of constants stored inside the code is called a "literal pool".
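Spelling out the address arithmetic shows that the constant is fetched from the function's own code. This is a minimal sketch, assuming the ARMv7-M rule for PC-relative loads in Thumb state: the PC reads as the instruction address plus 4, rounded down to a multiple of 4. The helper name thumb_literal_address() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of a Thumb PC-relative access such as
 * "ldr rX, [pc, #imm]" or "add rX, pc, #imm" (adr):
 * Align(PC, 4) + imm, where PC = instruction address + 4. */
uint32_t thumb_literal_address(uint32_t insn_addr, uint32_t imm) {
  /* The 16-bit encodings in these listings only encode multiples of 4. */
  assert(imm % 4u == 0u);
  uint32_t pc = insn_addr + 4u; /* PC value as seen by the instruction */
  return (pc & ~3u) + imm;      /* word-align the PC, then add the offset */
}
```

For the ldr at 0xa, this gives 0xc + 16 = 0x1c, which is the address of the .word 0x01234567 entry. The same rule yields 0x4c for the add r2, pc, #4 in the switch example that follows.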

Jump tables are the other case where the compiler places data into the code and reads it from there, as the following example illustrates:

int SecretSwitch(int i) {
    switch(i) {
    case 0: return 0;
    case 1: return 1;
    case 2: return 2;
    case 3: return 3;
    case 4: return 4;
    case 5: return 5;
    case 6: return 6;
    default: return i;
    }
}


This produces the following:

00000038 <SecretSwitch>:
38: b480 push {r7}
3a: b083 sub sp, #12
3c: af00 add r7, sp, #0
3e: 6078 str r0, [r7, #4]
40: 687b ldr r3, [r7, #4]
42: 2b06 cmp r3, #6
44: d81e bhi.n 84 <SecretSwitch+0x4c>
46: a201 add r2, pc, #4 ; (adr r2, 4c <SecretSwitch+0x14>)
48: f852 f023 ldr.w pc, [r2, r3, lsl #2]
4c: 00000069 .word 0x00000069
4c: R_ARM_ABS32 .text_exec_only
50: 0000006d .word 0x0000006d
50: R_ARM_ABS32 .text_exec_only
54: 00000071 .word 0x00000071
54: R_ARM_ABS32 .text_exec_only
58: 00000075 .word 0x00000075
58: R_ARM_ABS32 .text_exec_only
5c: 00000079 .word 0x00000079
5c: R_ARM_ABS32 .text_exec_only
60: 0000007d .word 0x0000007d
60: R_ARM_ABS32 .text_exec_only
64: 00000081 .word 0x00000081
64: R_ARM_ABS32 .text_exec_only
68: 2300 movs r3, #0
6a: e00c b.n 86 <SecretSwitch+0x4e>
6c: 2301 movs r3, #1
6e: e00a b.n 86 <SecretSwitch+0x4e>
70: 2302 movs r3, #2
72: e008 b.n 86 <SecretSwitch+0x4e>
74: 2303 movs r3, #3
76: e006 b.n 86 <SecretSwitch+0x4e>
78: 2304 movs r3, #4
7a: e004 b.n 86 <SecretSwitch+0x4e>
7c: 2305 movs r3, #5
7e: e002 b.n 86 <SecretSwitch+0x4e>
80: 2306 movs r3, #6
82: e000 b.n 86 <SecretSwitch+0x4e>
84: 687b ldr r3, [r7, #4]
86: 4618 mov r0, r3
88: 370c adds r7, #12
8a: 46bd mov sp, r7
8c: f85d 7b04 ldr.w r7, [sp], #4
90: 4770 bx lr
92: bf00 nop


The .word entries at addresses 0x4c to 0x64 form a jump table: data (here, absolute target addresses) embedded in the code. To translate the switch statement, the compiler decided to generate this table, and the add r2, pc, #4 and ldr.w pc, [r2, r3, lsl #2] instructions load the target address from it with a PC-relative access. Again, the executed code of this function reads from its own code memory.
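The dispatch can be modeled in C to show exactly where the data read from code memory happens. This sketch assumes the table contents from the listing above. The name switch_target() is hypothetical; it returns the address of the first instruction the code jumps to for a given case.

```c
#include <assert.h>
#include <stdint.h>

/* Table at 0x4c..0x64 in the listing; entry n is read from
 * 0x4c + 4 * n ("lsl #2"). Bit 0 is set because the targets are Thumb code. */
static const uint32_t jump_table[7] = {
  0x69u, 0x6du, 0x71u, 0x75u, 0x79u, 0x7du, 0x81u
};

#define DEFAULT_CASE_ADDR 0x84u /* target of "bhi.n 84" */

/* Mirrors "cmp r3, #6; bhi.n 84" followed by "ldr.w pc, [r2, r3, lsl #2]". */
uint32_t switch_target(int i) {
  uint32_t index = (uint32_t)i; /* bhi is unsigned: negative i takes the default */
  if (index > 6u) {
    return DEFAULT_CASE_ADDR;
  }
  uint32_t entry = jump_table[index]; /* the data read from code memory */
  assert((entry & 1u) != 0u);         /* Thumb bit must be set */
  return entry & ~1u;                 /* instruction address without the Thumb bit */
}
```

Case 0 starts at 0x68, which matches movs r3, #0 in the listing. On an execute-only area, the jump_table[index] read is the access that the hardware would reject.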

Veneer Functions

Trampoline or veneer functions are another case where code might use data in the code memory. The limited encoding space of an ARM instruction does not allow a direct branch to an arbitrary location in the 32-bit address space.

For example, the bl (branch and link) instruction encodes the branch offset from the current PC location in a 24-bit immediate. In the Thumb-2 encoding used by Cortex-M, that offset is counted in halfwords, which limits bl to a range of about ±16 MB. The offset is resolved by the linker in the link phase.
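The reach check that decides whether the linker has to step in can be written down directly. This sketch assumes the Thumb-2 BL encoding used on Cortex-M: 24 encoded bits counted in halfwords, relative to PC + 4. The function name bl_can_reach() and the test addresses are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reach of a Thumb-2 "bl": a signed byte offset from -16 MB to
 * +16 MB - 2 (24 encoded bits in halfword units). */
#define BL_MIN_OFFSET (-(INT64_C(1) << 24))
#define BL_MAX_OFFSET ((INT64_C(1) << 24) - 2)

/* True if a "bl" at 'bl_addr' can branch to 'target' directly,
 * false if the linker would need to insert a veneer. */
bool bl_can_reach(uint32_t bl_addr, uint32_t target) {
  assert((bl_addr & 1u) == 0u && (target & 1u) == 0u); /* halfword-aligned code */
  /* In Thumb state, the PC reads as the address of the bl plus 4. */
  int64_t offset = (int64_t)target - ((int64_t)bl_addr + 4);
  return offset >= BL_MIN_OFFSET && offset <= BL_MAX_OFFSET;
}
```

With the memory map used later in this article, a bl in execute-only code near 0x80000 cannot reach FAR_FLASH at 0xa0100000. In that case, the linker has to insert a veneer.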

Consider the following case where our secret code calls a function that is far away:

int FarFunction(int i); /* defined in a memory area far away from this code */

int SecretFarJump(int i) {
  return FarFunction(i);
}


The assembly code for this is the following:

Veneer

This veneer function jumps to the destination using the 32-bit address placed directly after the following instruction:

ldr.w pc, [pc]

This instruction loads the program counter with the target address using PC-relative indirect addressing. Again, this data access to the code area is not possible if the veneer runs in execute-only memory.

The linker will patch the ‘bl’ to jump to that veneer function:

Veneer Function

Pure Code

To place code into execute-only memory, the code has to be pure code, meaning it never reads data from its own code sections. For this, ARM GCC implements the following special command-line option (see https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html):

-mpure-code: Do not allow constant data to be placed in code sections. Additionally, when compiling for ELF object format, give all text sections the ELF processor-specific section attribute SHF_ARM_PURECODE. This option is only available when generating non-pic code for M-profile targets with the MOVT instruction.

Note: there is a similar option:

-mslow-flash-data: Assume loading data from flash is slower than fetching instructions. Therefore, literal load is minimized for better performance. This option is only supported when compiling for ARMv7 M-profile and is off by default.

Like -mpure-code, the -mslow-flash-data option avoids data access in the code, but not 100 percent. With -mslow-flash-data, data fetches from the code can still exist, so it is not sufficient for execute-only memory. Its benefit is performance, especially if data reads from flash are slower than instruction fetches. This makes it a good optimization option in its own right.

The pure code feature is available starting with the GNU ARM Embedded Toolchain 6 2016q4 release (see https://launchpad.net/gcc-arm-embedded/+announcements?memo=5&start=5). Kinetis Design Studio (KDS) V3.2 uses an older version, so you would have to upgrade the compiler. See “Switching ARM GNU Tool Chain and Libraries in Kinetis Design Studio”.

I can add the -mpure-code option to the files that I want to put into execute-only memory. In Eclipse/CDT, this is done by adding the option to the file settings:

-mpure-code for a source file


Compiling the same function with -mpure-code:

int SecretFunction(int i) {
  return i+0x1234567;
}


This no longer uses a constant load from the code. Instead, it builds the constant with movw and movt instructions:

00000000 <SecretFunction>:
0: b480 push {r7}
2: b083 sub sp, #12
4: af00 add r7, sp, #0
6: 6078 str r0, [r7, #4]
8: 687a ldr r2, [r7, #4]
a: f244 5367 movw r3, #17767 ; 0x4567
e: f2c0 1323 movt r3, #291 ; 0x123
12: 4413 add r3, r2
14: 4618 mov r0, r3
16: 370c adds r7, #12
18: 46bd mov sp, r7
1a: f85d 7b04 ldr.w r7, [sp], #4
1e: 4770 bx lr
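The movw/movt pair splits the 32-bit constant into two 16-bit immediates encoded in the instructions themselves, so nothing has to be read from the code area. The following is a sketch of that split. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The two 16-bit immediates of a movw/movt pair. */
typedef struct {
  uint16_t movw_imm; /* low halfword, written by movw */
  uint16_t movt_imm; /* high halfword, written by movt */
} movw_movt_t;

/* Splits a 32-bit constant into the two instruction immediates. */
movw_movt_t split_constant(uint32_t value) {
  movw_movt_t parts = { (uint16_t)(value & 0xFFFFu), (uint16_t)(value >> 16) };
  return parts;
}

/* Rebuilds the register value the way the two instructions do:
 * movw writes the low 16 bits and clears the upper half,
 * movt then writes the upper 16 bits and keeps the lower half. */
uint32_t execute_movw_movt(movw_movt_t parts) {
  uint32_t reg = parts.movw_imm;                             /* movw */
  reg = (reg & 0xFFFFu) | ((uint32_t)parts.movt_imm << 16); /* movt */
  return reg;
}
```

For 0x01234567, this gives 0x4567 (17767) for movw and 0x0123 (291) for movt, matching the listing above.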


The same applies to the jump table previously generated for the switch():

int SecretSwitch(int i) {
  switch(i) {
    case 0: return 0;
    case 1: return 1;
    case 2: return 2;
    case 3: return 3;
    case 4: return 4;
    case 5: return 5;
    case 6: return 6;
    default: return i;
  }
}


Now, it generates:

00000038 <SecretSwitch>:
38: b480 push {r7}
3a: b083 sub sp, #12
3c: af00 add r7, sp, #0
3e: 6078 str r0, [r7, #4]
40: 687b ldr r3, [r7, #4]
42: 2b03 cmp r3, #3
44: d015 beq.n 72 <SecretSwitch+0x3a>
46: 2b03 cmp r3, #3
48: dc06 bgt.n 58 <SecretSwitch+0x20>
4a: 2b01 cmp r3, #1
4c: d00d beq.n 6a <SecretSwitch+0x32>
4e: 2b01 cmp r3, #1
50: dc0d bgt.n 6e <SecretSwitch+0x36>
52: 2b00 cmp r3, #0
54: d007 beq.n 66 <SecretSwitch+0x2e>
56: e014 b.n 82 <SecretSwitch+0x4a>
58: 2b05 cmp r3, #5
5a: d00e beq.n 7a <SecretSwitch+0x42>
5c: 2b05 cmp r3, #5
5e: db0a blt.n 76 <SecretSwitch+0x3e>
60: 2b06 cmp r3, #6
62: d00c beq.n 7e <SecretSwitch+0x46>
64: e00d b.n 82 <SecretSwitch+0x4a>
66: 2300 movs r3, #0
68: e00c b.n 84 <SecretSwitch+0x4c>
6a: 2301 movs r3, #1
6c: e00a b.n 84 <SecretSwitch+0x4c>
6e: 2302 movs r3, #2
70: e008 b.n 84 <SecretSwitch+0x4c>
72: 2303 movs r3, #3
74: e006 b.n 84 <SecretSwitch+0x4c>
76: 2304 movs r3, #4
78: e004 b.n 84 <SecretSwitch+0x4c>
7a: 2305 movs r3, #5
7c: e002 b.n 84 <SecretSwitch+0x4c>
7e: 2306 movs r3, #6
80: e000 b.n 84 <SecretSwitch+0x4c>
82: 687b ldr r3, [r7, #4]
84: 4618 mov r0, r3
86: 370c adds r7, #12
88: 46bd mov sp, r7
8a: f85d 7b04 ldr.w r7, [sp], #4
8e: 4770 bx lr
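Reading the compare-and-branch sequence back into C makes its structure visible. Instead of indexing a table, the compiler performs a small binary search using signed comparisons. The function name SecretSwitchAsTree is hypothetical; it is a hand translation of the listing, not compiler output.

```c
#include <assert.h>

/* The -mpure-code listing above, read back into C.
 * Comments give the addresses and instructions from the listing. */
int SecretSwitchAsTree(int i) {
  if (i == 3) return 3;        /* 42/44: cmp r3, #3; beq.n 72 */
  if (i > 3) {                 /* 46/48: cmp r3, #3; bgt.n 58 */
    if (i == 5) return 5;      /* 58/5a: cmp r3, #5; beq.n 7a */
    if (i < 5) return 4;       /* 5c/5e: blt.n 76, only i == 4 is left */
    if (i == 6) return 6;      /* 60/62: cmp r3, #6; beq.n 7e */
    return i;                  /* 64: b.n 82 (default) */
  }
  if (i == 1) return 1;        /* 4a/4c: cmp r3, #1; beq.n 6a */
  if (i > 1) return 2;         /* 4e/50: bgt.n 6e, only i == 2 is left */
  if (i == 0) return 0;        /* 52/54: cmp r3, #0; beq.n 66 */
  return i;                    /* 56: b.n 82 (default) */
}
```

Every decision is encoded in the instruction stream itself, in the cmp immediates and the branch offsets. This is exactly what execute-only memory requires.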


The compare-and-branch sequence in this listing does not access any data inside the code. Looking at the veneer function, it is now "pure" as well:

pure veneer function

How to Put Code Into Execute-Only Memory

What remains is how to get the code into execute-only memory. First, I need a memory region for it in the linker script, something like this:

MEMORY
{
  /* Define each memory region */
  PROGRAM_FLASH (rx) : ORIGIN = 0x0, LENGTH = 0x80000 /* 512K bytes (alias Flash) */ 
  EXECUTE_ONLY (x) : ORIGIN = 0x80000, LENGTH = 0x80000 /* 512K bytes (alias Flash2) */ 
  FAR_FLASH (rx) : ORIGIN = 0xa0100000, LENGTH = 0x400 /* 1K bytes (alias Flash3) */ 
  SRAM_UPPER (rwx) : ORIGIN = 0x20000000, LENGTH = 0x30000 /* 192K bytes (alias RAM) */ 
  SRAM_LOWER (rwx) : ORIGIN = 0x1fff0000, LENGTH = 0x10000 /* 64K bytes (alias RAM2) */ 
}


To place a function into that region, I can mark it with __attribute__:

int __attribute__((section (".text_EXECUTE_ONLY"))) mySecretCode(int i) {
  /* secret code goes here */
  return i; /* placeholder result */
}
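If several functions need this attribute, a macro keeps the declarations readable. This is a minimal sketch, assuming GCC on an ELF target. EXECUTE_ONLY_CODE and mySecretAdd() are hypothetical names.

```c
#include <assert.h>

/* Hypothetical convenience macro (GCC attribute syntax, ELF targets):
 * functions marked with it go into the .text_EXECUTE_ONLY input section,
 * which the linker script maps to the EXECUTE_ONLY memory region. */
#define EXECUTE_ONLY_CODE __attribute__((section(".text_EXECUTE_ONLY")))

/* Example function placed into the execute-only section.
 * Its source file still needs -mpure-code: the attribute only selects
 * the section and does not prevent literal pools inside it. */
EXECUTE_ONLY_CODE int mySecretAdd(int i) {
  return i + 0x1234567;
}
```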


In the linker script, I have something like this to place these sections into the EXECUTE_ONLY memory region:

SECTIONS
{
  .text_Flash2 : ALIGN(8)
  {
    FILL(0xff)
    *(.text_Flash2*) /* for compatibility with previous releases */
    *(.text_EXECUTE_ONLY*) /* for compatibility with previous releases */
    *(.text.$Flash2*)
    *(.text.$EXECUTE_ONLY*)
    *(.rodata.$Flash2*)
    *(.rodata.$EXECUTE_ONLY*)
  } > EXECUTE_ONLY
  ...


An easier way might be to do this on a per-file basis. If all my execute-only code is in a file named ExecuteOnly.c (producing the object file ExecuteOnly.o), I can use this:

.text_Flash2 : ALIGN(8)
{
  FILL(0xff)
  *ExecuteOnly.o (.text .text*)
  *(.text_Flash2*) /* for compatibility with previous releases */
  *(.text_EXECUTE_ONLY*) /* for compatibility with previous releases */
  *(.text.$Flash2*)
  *(.text.$EXECUTE_ONLY*)
  *(.rodata.$Flash2*)
  *(.rodata.$EXECUTE_ONLY*)
} > EXECUTE_ONLY


This places all the .text* sections from ExecuteOnly.o into my special section (see “Putting Code of Files into Special Section with the GNU Linker”). In the MCUXpresso IDE, which has a nice managed linker script feature, I add the following to the Extra linker script input section:

Extra Managed Linker Script Section

The question is: what happens to any veneer functions? The option description above mentions the SHF_ARM_PURECODE attribute. In my case, the veneer function is placed in a section named .text_EXECUTE_ONLY.__stub. Because I used *(.text_EXECUTE_ONLY*) in the linker script, the veneer ends up in the execute-only region too.

Veneer Function allocation
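To check which names a pattern such as .text_EXECUTE_ONLY* picks up, it helps to run the names through a glob matcher. This sketch uses POSIX fnmatch() as a stand-in for the linker's wildcard matching. That is an assumption: GNU ld's wildcards behave like shell globs, but this is not the linker's code. pattern_matches() is a hypothetical helper.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>

/* True if 'name' matches the glob 'pattern'. With flags == 0, '*' also
 * matches '.' and '/', similar to linker section and file wildcards. */
bool pattern_matches(const char *pattern, const char *name) {
  return fnmatch(pattern, name, 0) == 0;
}
```

Both the veneer section .text_EXECUTE_ONLY.__stub and an object file such as build/ExecuteOnly.o match their respective patterns. A regular .text.SecretFunction section does not match .text_EXECUTE_ONLY*.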

Summary

Execute-only memory is being implemented in more and more devices and in applications that are concerned with code security. It might not be a 100 percent secure solution for everyone, but to me it looks like a good idea to put walls around the firmware to prevent reverse engineering. It does, however, require understanding how the compiler generates code and how to configure the compiler and linker for execute-only memory.

Happy Executing!


Published at DZone with permission of
