Adafruit WS2812B NeoPixels with the Freescale FRDM-K64F Board – Part 5: DMA

In this post we'll see how to trigger memory to memory events.

This is Part 5 of a Mini Series. In Part 4, I described how to set up the FTM (Kinetis Flex Timer Module) to generate the required waveforms used for DMA operations (see “Tutorial: Adafruit WS2812B NeoPixels with the Freescale FRDM-K64F Board – Part 4: Timer“). In this post I describe how to use the FTM to trigger DMA (Direct Memory Access) events. The goal is to drive Adafruit’s NeoPixel (WS2812B) with the Freescale FRDM-K64F board:

FRDM-K64F with Adafruit NeoPixel


Mini Series Tutorial List

  1. Tutorial: Adafruit WS2812B NeoPixels with the Freescale FRDM-K64F Board – Part 1: Hardware
  2. Tutorial: Adafruit WS2812B NeoPixels with the Freescale FRDM-K64F Board – Part 2: Software Tools
  3. Tutorial: Adafruit WS2812B NeoPixels with the Freescale FRDM-K64F Board – Part 3: Concepts
  4. Tutorial: Adafruit WS2812B NeoPixels with the Freescale FRDM-K64F Board – Part 4: Timer
  5. Tutorial: Adafruit WS2812B NeoPixels with the Freescale FRDM-K64F Board – Part 5: DMA

Outline

In this article I use DMA (Direct Memory Access) to do memory to memory operations to generate the required bit stream for the WS2812B LEDs. In the previous tutorial I used the FTM of the FRDM-K64F device to generate three signals:

Waveforms and Timing


I will use the ‘falling edge’ of the signals to trigger DMA transfers, marked as ‘M’ in the following timing diagram:

Driving Bits with DMA


In this post I’m using Kinetis Design Studio v3.0.0 with the Kinetis SDK v1.2.

We will set up this whole engine later in this article. First let’s do the easy thing: configure the GPIO pin to the DIN of the LEDs.

GPIO Port

To generate the signal to DIN of the NeoPixel/WS2812, I can use a normal GPIO (General Purpose Input/Output) pin. If I use multiple pins on such a GPIO port, I can drive multiple ‘lanes’ of pixel arrays.

:idea: I need 24 bits for each LED/pixel (8 bits each for red, green and blue). Because I write whole bytes to the GPIO port, every WS2812 bit occupies one byte of memory (usually RAM), so one pixel needs 24 bytes. Having a lot of LEDs therefore means a lot of RAM. With just one lane, only one bit in each byte is used. But if I have 8 lanes (say port bits 0 to 7), the same 24 bytes drive 8 LEDs, which is 3 bytes per LED. So if you have many, many LEDs, use multiple lanes to combine them. This not only reduces the amount of memory needed, it also reduces the time needed to send the bit stream.

To use the GPIO port, I need to:

  1. Mux the Pin to the port used. Basically this means to route the port internal signal to the external pin.
  2. Clock the port (enable the clock). Accessing the port registers without having it clocked will cause a hard fault.
  3. Configure the port/pin as output pin/port using the GPIOx_PDDR (Port Data Direction Register).
  4. To put the pin(s) HIGH, I can write a 1 bit/value to the GPIOx_PSOR (Port Set Output Register)
  5. To put the pin(s) LOW, I can write a 1 bit/value to the GPIOx_PCOR (Port Clear Output Register)
  6. To put the pin(s) either HIGH or LOW, I can write the bit/value into the GPIOx_PDOR (Port Data Output Register).
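
To make the difference between the three output registers concrete, here is a small host-side model in C. It is only a sketch: gpio_model_t and the model_* functions are hypothetical names that do not exist in the SDK, and only the set/clear/write semantics listed above are modelled, not the real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side model of one GPIO port's output latch (hypothetical type,
 * not an SDK structure). Only the register semantics are modelled. */
typedef struct {
  uint32_t pdor; /* current output levels, one bit per pin */
} gpio_model_t;

/* PSOR: every 1 bit in 'mask' puts that pin HIGH, 0 bits are ignored */
static void model_write_psor(gpio_model_t *port, uint32_t mask) {
  port->pdor |= mask;
}

/* PCOR: every 1 bit in 'mask' puts that pin LOW, 0 bits are ignored */
static void model_write_pcor(gpio_model_t *port, uint32_t mask) {
  port->pdor &= ~mask;
}

/* PDOR: the written value replaces the output levels of all pins */
static void model_write_pdor(gpio_model_t *port, uint32_t value) {
  port->pdor = value;
}

/* Record the level of pin 0 after each of the three writes that shape
 * one WS2812 bit: set, data, clear. */
static void model_ws2812_bit(gpio_model_t *port, uint8_t dataBit, uint8_t levels[3]) {
  assert(port != NULL && levels != NULL);
  model_write_psor(port, 1u);            /* start of bit: DIN HIGH */
  levels[0] = (uint8_t)(port->pdor & 1u);
  model_write_pdor(port, dataBit & 1u);  /* data phase: stay HIGH for '1', go LOW for '0' */
  levels[1] = (uint8_t)(port->pdor & 1u);
  model_write_pcor(port, 1u);            /* end of bit: DIN LOW */
  levels[2] = (uint8_t)(port->pdor & 1u);
}
```

Note that the PDOR write replaces the level of every pin of the port, which is what makes multiple lanes possible: each bit of the written byte belongs to one lane.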

The following diagram shows the necessary port output register writes to create the WS2812 bit stream:

GPIO Output Register Writes


We could do this from the timer interrupts, but again this would be too slow. Instead, these port output register writes shall be triggered by DMA.

Configure the GPIO Port

On my board, I’m only using one lane/pin to the DIN of the WS2812B. I’m going to use PTD0 (PORT D, pin 0) for it:

Using PTD0 to DIN


The other three white wires are the pins of the three FTM channels connected to the logic analyzer.

So I need to extend my hardware initialization as below:

  1. Line 4: enable clock gate for port D
  2. Line 11: Mux PTD0 as GPIO
  3. Line 12: Write the PDDR (Port Data Direction Register) with a 1 bit to use PTD0 as output pin.


static void InitHardware(void) {
  /* Enable clock for PORTs */
  SIM_HAL_EnableClock(SIM, kSimClockGatePortC);
  SIM_HAL_EnableClock(SIM, kSimClockGatePortD);

  /* Setup board clock source. */
  g_xtal0ClkFreq = 50000000U;           /* Value of the external crystal or oscillator clock frequency of the system oscillator (OSC) in Hz */
  g_xtalRtcClkFreq = 32768U;            /* Value of the external 32k crystal or oscillator clock frequency of the RTC in Hz */

  /* Use PTD0 as DIN to the Neopixels: mux it as GPIO and output pin */
  PORT_HAL_SetMuxMode(PORTD, 0UL, kPortMuxAsGpio); /* PTD0: DIN to NeoPixels */
  GPIO_PDDR_REG(PTD_BASE_PTR) |= (1<<0); /* PTD0 as output */

  /* FTM and FTM Muxing */
  InitFlexTimer(FTM0_IDX);
  PORT_HAL_SetMuxMode(PORTC,1UL,kPortMuxAlt4); /* use PTC1 for channel 0 of FTM0 */
  PORT_HAL_SetMuxMode(PORTC,2UL,kPortMuxAlt4); /* use PTC2 for channel 1 of FTM0 */
  PORT_HAL_SetMuxMode(PORTC,3UL,kPortMuxAlt4); /* use PTC3 for channel 2 of FTM0 */
}



You might notice that I’m using different APIs to do this.

PORT_HAL_SetMuxMode(PORTD, 0UL, kPortMuxAsGpio); /* PTD0: DIN to NeoPixels */


is a method of the Kinetis SDK. However

GPIO_PDDR_REG(PTD_BASE_PTR) |= (1<<0); /* PTD0 as output */

is using a CMSIS-Core style direct register write. The muxing is straightforward. However, setting up a pin as an output pin requires additional layers in the SDK with pin descriptors. To me, using the Kinetis SDK GPIO layers is overly complex in this example, so I simply use CMSIS register macros.

:idea: I want to show here as well that mix-and-match of SDK with CMSIS is in my view a good thing to balance ease-of-use and complexity.

With this, I have my GPIO pin configured. Now I need to write the port registers with DMA.

Direct Memory Access

As explained in the Concepts post, I need something very fast to write a GPIO port register. The timing is around 0.3 μs, which is definitely too fast to use the CPU for this, especially if I want the CPU to do something else too. With DMA, the access to memory is done without CPU involvement, which is exactly what I need.

I have used DMA on the FRDM-KL25Z board for things like reading ports in a DIY Logic Analyzer, or driving WS2812 pixels. The ARM Cortex-M4F microcontroller on the FRDM-K64F board has an eDMA (enhanced DMA) controller on it. It can use up to 16 independent DMA channels for DMA operations, with advanced source and destination address calculations. That eDMA controller is described in the K64F Reference Manual.

eDMA Block Diagram (Source: Freescale K64F Reference Manual)


  • Data Path: the controller can read/write data from/to the crossbar switch. The crossbar provides access to memory and peripherals.
  • Address Path: This block is calculating the source and destination address. It does the calculation, plus any incrementing or decrementing of the address. For this it uses Transfer Control Descriptors (TCD).
  • Control and Channel Arbitration: This block is responsible for receiving DMA requests from the supported request sources (e.g. from the timer module) and for writing flags back to them (like telling the timer module that the DMA operation is done).
  • Transfer Control Descriptor: The descriptor is used to describe what shall be done in the DMA operations: how many bytes to read/write, source and destination address, what to do after the transfer, how many loops (inner and outer loops).

The basic DMA flow is the following: when a DMA peripheral request comes in, the controller sets the source and destination address using the TCD:

eDMA Operation, Part 1 (Source: Freescale K64F Reference Manual)


Using the source and destination address, the controller will do the read/write operation. Depending on the configuration in the TCD, this can be multiple source/destination read/writes with ‘minor’ and ‘major’ loop counters:

eDMA Operation, Part 2 (Source: Freescale K64F Reference Manual)


In the last step, the TCD is updated, e.g. address values are changed and flags get set. Additionally the peripheral that requested the DMA transfer gets informed that the operation is done:

eDMA operation, Part 3


Memory Considerations

Remember, I have three FTM channels. Each channel shall trigger a GPIO Port operation:

  1. FTM0 Channel 0: Write ‘1’ to PSOR to set DIN to HIGH.
  2. FTM0 Channel 1: Write data bit to PDOR to either keep DIN HIGH (‘1’ WS2812 bit) or to put DIN LOW (‘0’ WS2812 bit).
  3. FTM0 Channel 2: Write ‘1’ to PCOR to set DIN to LOW.

This needs to be done for each WS2812 bit, and the number of bits is given by the number of WS2812 LEDs (24 bits for each), and the bits are stored in a buffer:

#define NEO_NOF_PIXEL       (8*8) /* Adafruit 8x8 matrix */
#define NEO_NOF_BITS_PIXEL   (24) /* 24 bits for pixel */
static uint8_t transmitBuf[NEO_NOF_PIXEL*NEO_NOF_BITS_PIXEL];

Remember, that only the least-significant-bit is used in each byte, as I’m only using a single lane of WS2812.

:idea: If I used 8 lanes (e.g. 8 NeoPixel Matrix displays, each connected to a single port pin, PTD0 to PTD7), I would use every bit of the byte. The same 24 bytes would then serve 8 pixels, so I would need only 3 bytes of memory for each WS2812 pixel.
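
Filling such a byte-per-bit buffer from a colour value could look like the following. This is a minimal sketch and not code from the project: NEO_SetPixelLane is a hypothetical helper, and it assumes the WS2812 bit order green, red, blue with the most significant bit sent first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NEO_NOF_BITS_PIXEL   (24) /* 24 bits for pixel */

/* Hypothetical helper: store one colour for 'pixel' on 'lane' (0..7).
 * Assumes one byte per WS2812 bit, with bit 'lane' of each byte going
 * to port pin 'lane', and the order green, red, blue, MSB first. */
static void NEO_SetPixelLane(uint8_t *buf, size_t pixel, uint8_t lane,
                             uint8_t red, uint8_t green, uint8_t blue) {
  uint32_t grb = ((uint32_t)green << 16) | ((uint32_t)red << 8) | blue;
  uint8_t *p;
  uint8_t laneMask;
  int bit;

  assert(buf != NULL && lane < 8);
  p = &buf[pixel * NEO_NOF_BITS_PIXEL]; /* first of the 24 bytes of this pixel */
  laneMask = (uint8_t)(1u << lane);
  for (bit = 0; bit < NEO_NOF_BITS_PIXEL; bit++) {
    if (grb & (0x800000UL >> bit)) {
      p[bit] |= laneMask;            /* this lane sends a '1' */
    } else {
      p[bit] &= (uint8_t)~laneMask;  /* this lane sends a '0' */
    }
  }
}
```

With a single lane on PTD0, a call like NEO_SetPixelLane(transmitBuf, 0, 0, 0xFF, 0x00, 0x00) would make the first pixel red. Other lanes in the same bytes are left untouched.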

Triggering DMA Requests

To enable DMA requests from my FTM channels, I need to carefully read the reference manual:

FTM DMA Request


What is confusing to me is that two settings (DMA=0|CHnIE=0 and DMA=1|CHnIE=0) are doing the same thing. First I thought that this must be a copy-paste error in the manual. But without enabling the ‘Interrupt Enable’ (CHnIE) bit the DMA was not working :-(. So it seems that both bits really have to be set. And this is what I had to do in my FTM initialization/reset routine:

static void ResetFTM(uint32_t instance) {
  FTM_Type *ftmBase = g_ftmBase[instance];
  uint8_t channel;

  /* reset all values */
  FTM_HAL_SetCounter(ftmBase, 0); /* reset FTM counter */
  FTM_HAL_ClearTimerOverflow(ftmBase); /* clear timer overflow flag (if any) */
  for(channel=0; channel<NOF_FTM_CHANNELS; channel++) {
    FTM_HAL_ClearChnEventFlag(ftmBase, channel); /* clear channel flag */
    FTM_HAL_SetChnDmaCmd(ftmBase, channel, true); /* enable DMA request */
    FTM_HAL_EnableChnInt(ftmBase, channel); /* enable channel interrupt: need to have both DMA and CHnIE set for DMA transfers! See RM 40.4.23 */
  }
}
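
The effect of the two bits can be written down as a small decision function. This is only a sketch of how I read the behaviour described above: the names are hypothetical, and the 'interrupt only' case (DMA=0, CHnIE=1) is my assumption from the name of the bit, so check it against the reference manual table.

```c
#include <stdbool.h>

/* What a channel event leads to, depending on the DMA and CHnIE bits.
 * Hypothetical names, not part of the SDK. */
typedef enum {
  FTM_EVENT_NONE,        /* neither interrupt nor DMA request */
  FTM_EVENT_INTERRUPT,   /* channel interrupt (assumed for DMA=0, CHnIE=1) */
  FTM_EVENT_DMA_REQUEST  /* DMA transfer request */
} ftm_event_action_t;

static ftm_event_action_t FtmChannelAction(bool dmaBit, bool chnIEBit) {
  if (!chnIEBit) {
    return FTM_EVENT_NONE; /* DMA=0|CHnIE=0 and DMA=1|CHnIE=0 behave the same */
  }
  return dmaBit ? FTM_EVENT_DMA_REQUEST : FTM_EVENT_INTERRUPT;
}
```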


DMA Driver Initialization

Time to initialize the DMA driver of the SDK. Because of the complexity of eDMA, I’m using again a mixture of Kinetis SDK API and Kinetis SDK HAL API. The initialization of the DMA I do with the SDK API:

static void InitDMADriver(void) {
  edma_user_config_t  edmaUserConfig;
  static edma_state_t edmaState;
  uint8_t res, channel;

  /* Initialize eDMA modules. */
  edmaUserConfig.chnArbitration = kEDMAChnArbitrationRoundrobin; /* use round-robin arbitration */
  edmaUserConfig.notHaltOnError = false; /* do not halt in case of errors */
  EDMA_DRV_Init(&edmaState, &edmaUserConfig); /* initialize DMA with configuration */
}


The initialization is rather simple: I set the DMA channel arbitration (priority scheduling) to round-robin. This means that the DMA will execute one channel after the other, and not use the DMA channel priority mechanism. As I have a fixed sequence of timer channel events, I keep it simple and use round-robin. With notHaltOnError I specify that the device should not halt in case of errors; this is again to keep things simple.

I initialize the DMA Driver as part of my hardware initialization:

static void InitHardware(void) {
  /* Enable clock for PORTs */
  SIM_HAL_EnableClock(SIM, kSimClockGatePortC);
  SIM_HAL_EnableClock(SIM, kSimClockGatePortD);

  /* Setup board clock source. */
  g_xtal0ClkFreq = 50000000U;           /* Value of the external crystal or oscillator clock frequency of the system oscillator (OSC) in Hz */
  g_xtalRtcClkFreq = 32768U;            /* Value of the external 32k crystal or oscillator clock frequency of the RTC in Hz */

  /* Use PTD0 as DIN to the Neopixels: mux it as GPIO and output pin */
  PORT_HAL_SetMuxMode(PORTD, 0UL, kPortMuxAsGpio); /* PTD0: DIN to NeoPixels */
  GPIO_PDDR_REG(PTD_BASE_PTR) |= (1<<0); /* PTD0 as output */

  /* FTM and FTM Muxing */
  InitFlexTimer(FTM0_IDX);
  PORT_HAL_SetMuxMode(PORTC,1UL,kPortMuxAlt4); /* use PTC1 for channel 0 of FTM0 */
  PORT_HAL_SetMuxMode(PORTC,2UL,kPortMuxAlt4); /* use PTC2 for channel 1 of FTM0 */
  PORT_HAL_SetMuxMode(PORTC,3UL,kPortMuxAlt4); /* use PTC3 for channel 2 of FTM0 */

  InitDMADriver(); /* initialize DMA driver */
}


Transfer the Bits with DMA

So far I have everything set up:

  • FTM timer is generating the needed signals, with DMA triggering enabled
  • GPIO for the DIN to the LED is ready
  • eDMA driver is initialized

Now I can start a DMA transfer, and I use the following method:

void DMA_Transfer(uint8_t *transmitBuf, uint32_t nofBytes);


Remember that I have a buffer with the bits for the WS2812 LEDs. In order to send the bits to PTD0, I can use

DMA_Transfer(transmitBuf, sizeof(transmitBuf));


DMA Transfer

I’m going to use three DMA channels, one for each timer channel. In order to transmit the bits with DMA in DMA_Transfer(), I do the following:

  1. Reset FTM: reset the timer registers. The FTM is not clocked at this point.
  2. DMA Muxing: Request three DMA channels for FTM0 channels 0, 1 and 2.
  3. Install callback: install an ‘End of Transfer’ interrupt handler for DMA channel 2 (the last of the three). That way I get notified when the transfer of all bits is over.
  4. Setup the DMA TCD:  Setting up the Transfer Control Descriptor with source/destination for the DMA channel.
  5. Start/Enable all DMA channels: this turns on/enables the DMA channels.
  6. Start the FTM: initialize a ‘dmaDone’ flag and turn on the clocks to the FTM, letting the timer run.
  7. Wait until DMA is done: the ‘end of transfer interrupt’ will set the ‘dmaDone’ flag.
  8. Turn off FTM: remove the clock from the FTM timer.
  9. Disable/stop all DMA channels.
  10. De-Mux and de-install DMA channels.

:idea: You might wonder why I’m doing the Muxing and De-Muxing for every transfer (step 2 and 10)? The answer is (I believe) that there are internal propagation delays inside the DMA controller. Muxing and De-Muxing the DMA ensures that the DMA controller is resetting its internal registers. I had to learn this the hard way: DMA worked fine at lower speed (say 1 ms DMA frequencies), as there was enough time and clocking inside the module to get it into the correct state. But using the DMA in the sub μs time domain as I’m using it here definitely showed some strange DMA behaviour with ‘ghost’ DMA transfers. I already had these strange things happening on the FRDM-KL25Z, see “NeoShield: WS2812 RGB LED Shield with DMA and nRF24L01+“.

The following is the full routine; I will discuss some of the details afterwards:

/* DMA related */
#define NOF_EDMA_CHANNELS  3 /* using three DMA channels */
static edma_chn_state_t chnStates[NOF_EDMA_CHANNELS]; /* array of DMA channel states */
static volatile bool dmaDone = false; /* set by DMA complete interrupt on DMA channel 2 */
static const uint8_t OneValue = 0xFF; /* value to clear or set the port bits */

void DMA_Transfer(uint8_t *transmitBuf, uint32_t nofBytes) {
  edma_transfer_config_t config;
  uint8_t channel;
  uint8_t res;

  ResetFTM(FTM0_IDX); /* clear FTM and prepare for DMA */

  /* DMA Muxing: Allocate EDMA channel request through DMAMUX */
  for (channel=0; channel<NOF_EDMA_CHANNELS; channel++) {
    res = EDMA_DRV_RequestChannel(channel, kDmaRequestMux0FTM0Channel0+channel, &chnStates[channel]);
    if (res==kEDMAInvalidChannel) { /* check error code */
      for(;;); /* ups!?! */
    }
  }
  /* Install callback for eDMA handler on last channel which is channel 2 */
  EDMA_DRV_InstallCallback(&chnStates[NOF_EDMA_CHANNELS-1], EDMA_Callback, NULL);

  /* prepare DMA configuration */
  config.srcLastAddrAdjust = 0; /* no address adjustment needed after last transfer */
  config.destLastAddrAdjust = 0; /* no address adjustment needed after last transfer */
  config.srcModulo = kEDMAModuloDisable; /* no address modulo (no ring buffer) */
  config.destModulo = kEDMAModuloDisable; /* no address modulo (no ring buffer) */
  config.srcTransferSize = kEDMATransferSize_1Bytes; /* transmitting one byte in each DMA transfer */
  config.destTransferSize = kEDMATransferSize_1Bytes; /* transmitting one byte in each DMA transfer */
  config.minorLoopCount = 1; /* one byte transmitted for each request */
  config.majorLoopCount = nofBytes; /* total number of bytes to send */
  config.destOffset = 0; /* do not increment destination address */

  config.srcAddr = (uint32_t)&OneValue; /* Bit set */
  config.destAddr = (uint32_t)&GPIO_PSOR_REG(PTD_BASE_PTR); /* Port Set Output register */
  config.srcOffset = 0; /* do not increment source address */
  PushDMADescriptor(&config, &chnStates[0], false); /* write configuration to DMA channel 0 */

  config.srcAddr = (uint32_t)transmitBuf; /* pointer to data */
  config.destAddr = (uint32_t)&GPIO_PDOR_REG(PTD_BASE_PTR); /* Port Data Output register */
  config.srcOffset = 1; /* increment source address by one byte */
  PushDMADescriptor(&config, &chnStates[1], false); /* write configuration to DMA channel 1 */

  config.srcAddr = (uint32_t)&OneValue; /* Bit set */
  config.destAddr = (uint32_t)&GPIO_PCOR_REG(PTD_BASE_PTR); /* Port Clear Output register */
  config.srcOffset = 0; /* do not increment source address */
  PushDMADescriptor(&config, &chnStates[2], true); /* write configuration to DMA channel 2 */

  /* enable the DMA channels */
  for (channel=0; channel<NOF_EDMA_CHANNELS; channel++) {
    EDMA_DRV_StartChannel(&chnStates[channel]); /* enable DMA */
  }
  dmaDone = false; /* reset done flag */
  StartStopFTM(FTM0_IDX, true); /* start FTM timer to fire sequence of DMA transfers */
  do {
    /* wait until transfer is complete */
  } while(!dmaDone);
  StopFTMDMA(FTM0_IDX); /* stop FTM DMA transfers */
  for (channel=0; channel<NOF_EDMA_CHANNELS; channel++) {
    EDMA_DRV_StopChannel(&chnStates[channel]); /* stop DMA channel */
  }
  /* Release EDMA channel request through DMAMUX, otherwise events might still be latched! */
  for (channel=0; channel<NOF_EDMA_CHANNELS; channel++) {
    res = EDMA_DRV_ReleaseChannel(&chnStates[channel]);
    if (res!=kStatus_EDMA_Success) { /* check error code */
      for(;;); /* ups!?! */
    }
  }
}



One important part is the configuration of the TCD (Transfer Control Descriptor). I set up three descriptors, one for each DMA channel:

  1. Channel 0: Writing a ‘1’ to the PSOR (Port Set Output) register.
  2. Channel 1: Writing the data bit to the PDOR (Port Data Output) register.
  3. Channel 2: Writing a ‘1’ to the PCOR (Port Clear Output) register.

The descriptors have several fields to configure the DMA transfer. Basically what I describe for the DMA transfers is “take this byte from this source address and write it to this destination address”. In addition I specify “how many bytes to read/write” and if some address calculations shall be performed for the source and destination address. In the next sections I explain the different settings:

In the eDMA it is possible to make a special adjustment at the end of the last transfer: as I do not need this for the WS2812, that setting is an offset of zero:


config.srcLastAddrAdjust = 0; /* no address adjustment needed after last transfer */
config.destLastAddrAdjust = 0; /* no address adjustment needed after last transfer */


The DMA address calculation can be configured to ‘wrap-around’ e.g. if using a ring buffer: I have it disabled as I do not need that functionality:

config.srcModulo = kEDMAModuloDisable; /* no address modulo (no ring buffer) */
config.destModulo = kEDMAModuloDisable; /* no address modulo (no ring buffer) */




The next setting is to specify how many bytes have to be transmitted in a single DMA transfer: I only need to write a single byte to the GPIO port:

config.srcTransferSize = kEDMATransferSize_1Bytes; /* transmitting one byte in each DMA transfer */
config.destTransferSize = kEDMATransferSize_1Bytes; /* transmitting one byte in each DMA transfer */



In the next setting I can specify the ‘minor’ and ‘major’ loop: that way I can ‘nest’ the DMA operations:

eDMA Multiple Loop Iteration


In my case I only need to write a single byte for each DMA request, so the minor loop counter is ‘1’. However, I need to write multiple bytes for the DMA operation (to write all bytes of the transmitBuf[]), therefore the majorLoopCount is the total number of bytes:

  config.minorLoopCount = 1; /* one byte transmitted for each request */
  config.majorLoopCount = nofBytes; /* total number of bytes to send */



The next setting is to specify what should happen with the destination address. The destination address will be the GPIO port address, so no need to change this.

  config.destOffset = 0; /* do not increment destination address */
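
The interplay of the minor loop, the major loop and the address offsets can be pictured with a small host-side model. This is a simplified sketch under my own assumptions, not how the eDMA hardware or the SDK is implemented: tcd_model_t and TcdModel_ServeRequest are hypothetical names, and the fields only mirror the settings used in this article.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified software model of one channel's transfer descriptor
 * (hypothetical type, not the SDK's edma_transfer_config_t). */
typedef struct {
  const uint8_t *src;      /* current source address */
  uint8_t *dest;           /* current destination address */
  int32_t srcOffset;       /* added to src after every byte */
  int32_t destOffset;      /* added to dest after every byte */
  uint32_t minorLoopCount; /* bytes moved per DMA request */
  uint32_t majorLoopCount; /* requests left until the transfer is complete */
} tcd_model_t;

/* Serve one DMA request: run one minor loop.
 * Returns true when this request finished the major loop. */
static bool TcdModel_ServeRequest(tcd_model_t *tcd) {
  uint32_t i;

  assert(tcd != NULL);
  if (tcd->majorLoopCount == 0) {
    return true; /* nothing left to do */
  }
  for (i = 0; i < tcd->minorLoopCount; i++) {
    *tcd->dest = *tcd->src;       /* one byte from source to destination */
    tcd->src += tcd->srcOffset;   /* 0: same byte again, 1: next byte */
    tcd->dest += tcd->destOffset; /* 0: always the same port register */
  }
  tcd->majorLoopCount--;
  return tcd->majorLoopCount == 0;
}
```

In this picture every FTM channel event is one call of TcdModel_ServeRequest(), and the 'end of transfer' interrupt corresponds to the call that returns true.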



The above settings are all the same for all three DMA channels. What follows are the special settings to be used for each DMA channel.

DMA channel 0 will create a rising edge of the DIN WS2812 signal. To be executed by the CPU, I would write it like this:

static const uint8_t OneValue = 0x01; /* value to clear or set the port bits */

GPIO_PSOR_REG(PTD_BASE_PTR) = OneValue;


Translated to the DMA descriptor it is this:

  config.srcAddr = (uint32_t)&OneValue; /* Bit set */
  config.destAddr = (uint32_t)&GPIO_PSOR_REG(PTD_BASE_PTR); /* Port Set Output register */
  config.srcOffset = 0; /* do not increment source address */



Next is DMA channel 1 which will write the data bit. In ‘normal’ code it would be this:


static const uint8_t OneValue = 0x01; /* value to clear or set the port bits */

GPIO_PDOR_REG(PTD_BASE_PTR) = *transmitBuf; transmitBuf++;


In ‘DMA language’ it is this:

  config.srcAddr = (uint32_t)transmitBuf; /* pointer to data */
  config.destAddr = (uint32_t)&GPIO_PDOR_REG(PTD_BASE_PTR); /* Port Data Output register */
  config.srcOffset = 1; /* increment source address */



Lastly, like DMA channel 0, channel 2 writes a one to a GPIO register, this time to the PCOR (Port Clear Output) register to put DIN LOW:

  config.srcAddr = (uint32_t)&OneValue; /* Bit set */
  config.destAddr = (uint32_t)&GPIO_PCOR_REG(PTD_BASE_PTR); /* Port Clear Output register */
  config.srcOffset = 0; /* do not increment source address */



Each of the descriptors is written to the hardware registers with this custom routine:

static void PushDMADescriptor(edma_transfer_config_t *config, edma_chn_state_t *chn, bool enableInt) {
  /* If only one TCD is required, only hardware TCD is required and user
   * is not required to prepare the software TCD memory. */
  edma_software_tcd_t temp[2]; /* make it larger so we can have a 32byte aligned address into it */
  edma_software_tcd_t *tempTCD = STCD_ADDR(temp); /* ensure that we have a 32byte aligned address */

  memset((void*) tempTCD, 0, sizeof(edma_software_tcd_t)); /* initialize temporary descriptor with zeros */
  EDMA_DRV_PrepareDescriptorTransfer(chn, tempTCD, config, enableInt, true); /* prepare and copy descriptor into temporary one */
  EDMA_DRV_PushDescriptorToReg(chn, tempTCD); /* write EDMA registers */
}

Did you notice that temp[2] variable? This is necessary to align the TCD to a 32 byte boundary. If the address of the TCD is not aligned to that boundary, a hard fault will happen.
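
The idea behind the oversized temp[2] array can be sketched in plain C. This is not the SDK's STCD_ADDR macro, only an illustration of the same principle with a hypothetical helper AlignUpTo32(): if the buffer is twice the size of a descriptor (assuming a descriptor is 32 bytes), there is always a 32 byte aligned address inside it with enough room left for one descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCD_ALIGNMENT 32u /* required alignment of the descriptor in bytes */

/* Hypothetical helper: return the first address at or after 'p' that is
 * a multiple of TCD_ALIGNMENT. */
static void *AlignUpTo32(void *p) {
  uintptr_t addr = (uintptr_t)p;

  assert(p != NULL);
  addr = (addr + (TCD_ALIGNMENT - 1u)) & ~(uintptr_t)(TCD_ALIGNMENT - 1u);
  return (void *)addr;
}
```

The aligned address is at most 31 bytes behind the start of the buffer, so a 64 byte buffer always leaves at least 32 bytes after it.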



:idea: WARNING: The EDMA_DRV_ConfigLoopTransfer() function in the Kinetis SDK v1.2 might create a hard fault, because it does not do that special alignment.

DMA channel 0 and 1 are configured not to create any interrupts. Only channel 2 is configured with the third parameter to raise an interrupt at the end of the ‘major’ iteration (when all bytes are transmitted):

  PushDMADescriptor(&config, &chnStates[2], true); /* write configuration to DMA channel 2, and enable 'end' interrupt for it */



So I have to add a handler for DMA interrupt on channel 2, otherwise my application will end up in an unhandled interrupt. DMA2_IRQHandler() is the interrupt handler, and EDMA_DRV_IRQHandler() will call the callback EDMA_Callback():

/*! @brief Dma channel 2 ISR */
void DMA2_IRQHandler(void){
   EDMA_DRV_IRQHandler(2U); /* call SDK EDMA IRQ handler, this will call EDMA_Callback() */
}

void EDMA_Callback(void *param, edma_chn_status_t chanStatus) {
  (void)param; /* not used */
  (void)chanStatus; /* not used */
  dmaDone = true; /* set 'done' flag at the end of the major loop */
}



That handler I have to install with

  /* Install callback for eDMA handler on last channel which is channel 2 */
  EDMA_DRV_InstallCallback(&chnStates[NOF_EDMA_CHANNELS-1], EDMA_Callback, NULL);




With all the TCD settings pushed to the DMA channels, it is time to enable all the channels:

  /* enable the DMA channels */
  for (channel=0; channel<NOF_EDMA_CHANNELS; channel++) {
    EDMA_DRV_StartChannel(&chnStates[channel]); /* enable DMA */
  }



Then I reset the ‘done’ flag, start the FTM timer and wait until the transfer is done:

  dmaDone = false; /* reset done flag */
  StartStopFTM(FTM0_IDX, true); /* start FTM timer to fire sequence of DMA transfers */
  do {
    /* wait until transfer is complete */
  } while(!dmaDone);
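
The loop above waits forever if the 'end of transfer' interrupt never arrives. A bounded variant could look like the following. It is only a sketch: WaitForFlag is a hypothetical helper that is not part of the project, and a suitable poll limit depends on the CPU clock and the number of bytes to send.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: poll a flag that is set by an interrupt handler,
 * but give up after 'maxPolls' checks.
 * Returns true if the flag was seen set, false on timeout. */
static bool WaitForFlag(const volatile bool *flag, uint32_t maxPolls) {
  assert(flag != NULL);
  while (maxPolls > 0u) {
    if (*flag) {
      return true; /* the interrupt handler has set the flag */
    }
    maxPolls--;
  }
  return *flag; /* one last check after the limit was reached */
}
```

It would be used as WaitForFlag(&dmaDone, limit), with the caller stopping the FTM and releasing the DMA channels in both the success and the timeout case.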




After all bytes are sent, I stop the FTM timer, disable the channels and release the DMA channels:

  StopFTMDMA(FTM0_IDX); /* stop FTM DMA transfers */
  for (channel=0; channel<NOF_EDMA_CHANNELS; channel++) {
    EDMA_DRV_StopChannel(&chnStates[channel]); /* stop DMA channel */
  }
  /* Release EDMA channel request through DMAMUX, otherwise events might still be latched! */
  for (channel=0; channel<NOF_EDMA_CHANNELS; channel++) {
    res = EDMA_DRV_ReleaseChannel(&chnStates[channel]);
    if (res!=kStatus_EDMA_Success) { /* check error code */
      for(;;); /* ups!?! */
    }
  }




This completes the DMA transfer, and things can start over again with the next transfer.

“Wonderful and Colorful Things”

Time to try things out. The following program writes three WS2812B pixels: green, red and blue:


#include "fsl_device_registers.h"
#include "DMAPixel.h"

#define NEO_NOF_PIXEL       3
#define NEO_NOF_BITS_PIXEL 24
static uint8_t transmitBuf[NEO_NOF_PIXEL*NEO_NOF_BITS_PIXEL] =
    {
        /* pixel 0: */
        1, 1, 1, 1, 1, 1, 1, 1, /* green */
        0, 0, 0, 0, 0, 0, 0, 0, /* red */
        0, 0, 0, 0, 0, 0, 0, 0, /* blue */
        /* pixel 1: */
        0, 0, 0, 0, 0, 0, 0, 0, /* green */
        1, 1, 1, 1, 1, 1, 1, 1, /* red */
        0, 0, 0, 0, 0, 0, 0, 0,  /* blue */
        /* pixel 2: */
        0, 0, 0, 0, 0, 0, 0, 0, /* green */
        0, 0, 0, 0, 0, 0, 0, 0, /* red */
        1, 1, 1, 1, 1, 1, 1, 1  /* blue */
    };

int main(void) {
  uint8_t red, green, blue;

  DMA_Init();
  for (;;) {
    DMA_Transfer(transmitBuf, sizeof(transmitBuf));
  }
  /* Never leave main */
  return 0;
}


Checking with the logic analyzer I can see that it takes 91.1 μs to send the data:

Timing to transmit data for three WS2812
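
As a rough cross-check of the measured value, the expected frame time can be computed from the number of bits. This is a minimal sketch that assumes the nominal WS2812B bit period of 1.25 μs (800 kHz data rate); the helper name NEO_FrameTimeNs is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define WS2812_BIT_PERIOD_NS 1250u /* assumption: nominal 800 kHz data rate */

/* Expected time in nanoseconds to shift out 'nofPixels' pixels of 24 bits each */
static uint32_t NEO_FrameTimeNs(uint32_t nofPixels) {
  assert(nofPixels <= UINT32_MAX / (24u * WS2812_BIT_PERIOD_NS)); /* avoid overflow */
  return nofPixels * 24u * WS2812_BIT_PERIOD_NS;
}
```

For the three pixels above this gives 3 × 24 × 1.25 μs = 90 μs, which is close to the measured 91.1 μs. Under the same assumption, the 64 pixels of an 8×8 matrix would take about 1.92 ms.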


The following zooms into the first 8 bits (green) sent:

first 8 green bits


I can see as well the delay between the timer/DMA event and the time until the port bit actually has changed: it is around 0.2 μs:

DMA to GPIO Delay


But the timing for the ‘1’ and ‘0’ bits is within the specification :-):

WS2812 Bit 1 Timing


WS2812 Bit 0 Timing


And voilà, this is what I get on the NeoPixel Matrix: the first three LEDs in Green, Red and Blue :-):

Red, Green and Blue Color Pixels


Summary

I now have the FTM working with DMA, and it bangs the bits out of the GPIO port, in one or multiple lanes. I’m using only one lane now, but it works the same way with multiple lanes. With the 256 KByte of RAM the number of WS2812 pixels I can drive now is huge: I need 24 bytes per pixel if I’m using a single lane. So for an 8×8 matrix I need 1536 bytes, but if I use eight 8×8 boards with 8 lanes (PTD0 to PTD7), I only need 3 bytes per pixel: 1536 bytes too :-)

:idea: I could pack all the 24 bits for a pixel into three bytes and then make a multi-stage DMA transfer: unpack the bits and send them to the port. I have not thought that through, but maybe this would be something doable to reduce the amount of RAM needed for a single lane configuration.

This project uses DMA on a Freescale Kinetis device, and I tried my best to explain the approach used here. Still, there are a lot more features and possibilities with DMA. It takes some time to get familiar with DMA, but the capabilities are amazing :-).

I had to use a mixture of Freescale Kinetis SDK API, SDK HAL API and CMSIS register access macros. Freescale is promoting the Kinetis SDK, but this project again confirmed to me that the SDK alone does not cover all the needs of developing embedded applications: I still need CMSIS register access API. On the other side: there are some nice routines in the SDK and especially the HAL layer which makes things easier to use. But again as with everything: it takes time to learn all these things. And I hope that this article series can help you with that learning process.

The project sources are on GitHub here:
https://github.com/ErichStyger/mcuoneclipse/tree/master/Examples/KDS/FRDM-K64F120M/FRDM-K64F_NeoPixel_SDK

So, what could be next? I could describe/develop a ‘graphics’ driver for the WS2812 pixels? Or maybe that is something I leave to Manya? Post comments and let me know what you think :-).

Happy DMAing :-)



Published at DZone with permission of Erich Styger, DZone MVB. See the original article here.
