DZone
Content sponsored by Ampere Computing

DPDK Cryptography Build and Tuning Guide

This is a guide to building and tuning DPDK with ARMv8, OpenSSL, and IPSec crypto libraries on Ampere processors for optimal packet-processing performance.

By David Zeng · Nov. 19, 25 · Analysis

One of the many workloads customers run on Ampere-powered systems is packet processing built on DPDK. Ampere has published a setup and tuning guide for DPDK to help customers get the best performance from these workloads. Because many customers make heavy use of encryption and decryption in their DPDK applications, we are supplementing that tuning guide with information on which crypto libraries are supported and how to build DPDK with them.

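Note: Build and install the crypto libraries described below before building DPDK. DPDK's meson configuration only builds a crypto PMD when it can find the library that PMD depends on.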

Summary of Poll Mode Drivers for Crypto on Ampere Processors

ARMv8 Crypto Driver

The ARMv8 crypto poll mode driver (PMD) uses the ARMv8 Cryptography Extensions to accelerate chained cipher and authentication operations. It relies on the AArch64cryptolib library, whose core functions are written in assembly and which Arm publishes at https://github.com/ARM-software/AArch64cryptolib.git.

ARMv8 Crypto PMD supports the following algorithm pairs:

Cipher algorithms:

  • RTE_CRYPTO_CIPHER_AES_CBC

Authentication algorithms:

  • RTE_CRYPTO_AUTH_SHA1_HMAC
  • RTE_CRYPTO_AUTH_SHA256_HMAC

Build DPDK With ARMv8 Crypto PMD

Download and build the AArch64cryptolib source code (the commands assume the current directory is /home/ampere/):

On the Ampere Altra family:

Shell
 
git clone https://github.com/ARM-software/AArch64cryptolib.git
cd AArch64cryptolib
make OPT=big EXTRA_CFLAGS="-march=armv8.2-a+crypto"
# Register the library directory with the dynamic linker (tee writes the file as root)
echo "/home/ampere/AArch64cryptolib" | sudo tee /etc/ld.so.conf.d/armcrypto.conf
sudo ldconfig


On the AmpereOne family:

Shell
 
git clone https://github.com/ARM-software/AArch64cryptolib.git
cd AArch64cryptolib
make OPT=biggereor3 EXTRA_CFLAGS="-march=armv8.6-a+crc+fp16+aes+sha3"
# Register the library directory with the dynamic linker (tee writes the file as root)
echo "/home/ampere/AArch64cryptolib" | sudo tee /etc/ld.so.conf.d/armcrypto.conf
sudo ldconfig


Reference: https://doc.dpdk.org/guides/cryptodevs/armv8.html
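When one build script has to cover both processor families, the variant can be picked from the CPU identification fields in /proc/cpuinfo. The sketch below is a hypothetical helper, not part of AArch64cryptolib. It assumes the part numbers used by the Linux kernel (implementer 0x41, part 0xd0c for the Neoverse N1 cores in Altra; implementer 0xc0, part 0xac3 or 0xac4 for AmpereOne) and simply prints the make arguments from the two build variants above.

```shell
# Hypothetical helper (not part of AArch64cryptolib): print the make arguments
# from the build steps above that match the CPU described in a cpuinfo file.
# Assumed IDs (Linux kernel values): implementer 0x41 part 0xd0c = Neoverse N1
# (Altra); implementer 0xc0 part 0xac3/0xac4 = AmpereOne.
armcrypto_make_args() {
    cpuinfo=${1:-/proc/cpuinfo}
    # Take the first core's values; every core reports the same part here
    impl=$(awk -F': *' '/^CPU implementer/ { print $2; exit }' "$cpuinfo")
    part=$(awk -F': *' '/^CPU part/ { print $2; exit }' "$cpuinfo")
    case "$impl:$part" in
        0x41:0xd0c)
            echo "OPT=big EXTRA_CFLAGS=-march=armv8.2-a+crypto" ;;
        0xc0:0xac3|0xc0:0xac4)
            echo "OPT=biggereor3 EXTRA_CFLAGS=-march=armv8.6-a+crc+fp16+aes+sha3" ;;
        *)
            echo "unrecognized CPU: $impl:$part" >&2
            return 1 ;;
    esac
}
```

Because neither argument contains spaces, the output can be passed unquoted from inside the AArch64cryptolib directory, e.g. `make $(armcrypto_make_args)`. Other Ampere parts would need their own entry in the case statement.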

OpenSSL Crypto Driver

Based on our testing, OpenSSL 3.2 or 1.1.1 gives the best performance on the Ampere Altra family of processors, and OpenSSL 3.4.0 gives the best performance on the AmpereOne family. Avoid versions 3.0.x and 3.1.x, which showed significant performance regressions.
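These version recommendations can be encoded in a small check. The function below is a hypothetical sketch that compares only the major and minor version numbers, so it treats any 1.1.x release as equivalent to 1.1.1.

```shell
# Hypothetical helper encoding the OpenSSL version recommendations above.
# Usage: check_openssl_version <altra|ampereone> <version, e.g. 3.4.0 or 1.1.1w>
check_openssl_version() {
    family=$1
    major=${2%%.*}            # "3" from "3.4.0"
    rest=${2#*.}
    minor=${rest%%.*}         # "4" from "3.4.0"
    case "$major.$minor" in
        3.0|3.1) echo "avoid"; return 1 ;;   # versions with known regressions
    esac
    if [ "$family" = altra ] && { [ "$major.$minor" = 1.1 ] || [ "$major" -gt 3 ] ||
            { [ "$major" -eq 3 ] && [ "$minor" -ge 2 ]; }; }; then
        echo "ok"; return 0
    fi
    if [ "$family" = ampereone ] && { [ "$major" -gt 3 ] ||
            { [ "$major" -eq 3 ] && [ "$minor" -ge 4 ]; }; }; then
        echo "ok"; return 0
    fi
    echo "not recommended"
    return 1
}
```

Note that `openssl version` reports the command-line tool, which is not necessarily the library DPDK links against. `pkg-config --modversion libcrypto` reports the version meson will pick up, e.g. `check_openssl_version ampereone "$(pkg-config --modversion libcrypto)"`.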

The OpenSSL crypto PMD supports the following algorithms:

Cipher algorithms:

  • RTE_CRYPTO_CIPHER_3DES_CBC
  • RTE_CRYPTO_CIPHER_AES_CBC
  • RTE_CRYPTO_CIPHER_AES_CTR
  • RTE_CRYPTO_CIPHER_3DES_CTR
  • RTE_CRYPTO_CIPHER_DES_DOCSISBPI

Authentication algorithms:

  • RTE_CRYPTO_AUTH_AES_GMAC
  • RTE_CRYPTO_AUTH_MD5
  • RTE_CRYPTO_AUTH_SHA1
  • RTE_CRYPTO_AUTH_SHA224
  • RTE_CRYPTO_AUTH_SHA256
  • RTE_CRYPTO_AUTH_SHA384
  • RTE_CRYPTO_AUTH_SHA512
  • RTE_CRYPTO_AUTH_MD5_HMAC
  • RTE_CRYPTO_AUTH_SHA1_HMAC
  • RTE_CRYPTO_AUTH_SHA224_HMAC
  • RTE_CRYPTO_AUTH_SHA256_HMAC
  • RTE_CRYPTO_AUTH_SHA384_HMAC
  • RTE_CRYPTO_AUTH_SHA512_HMAC

AEAD algorithms:

  • RTE_CRYPTO_AEAD_AES_GCM
  • RTE_CRYPTO_AEAD_AES_CCM

Asymmetric crypto algorithms:

  • RTE_CRYPTO_ASYM_XFORM_RSA
  • RTE_CRYPTO_ASYM_XFORM_DSA
  • RTE_CRYPTO_ASYM_XFORM_DH
  • RTE_CRYPTO_ASYM_XFORM_MODINV
  • RTE_CRYPTO_ASYM_XFORM_MODEX

Download and Install OpenSSL 3.4.0

The OpenSSL versions shipped with different OS distributions vary considerably, and that affects performance from one distribution to another. For consistent performance, download, build, and install OpenSSL 3.4.0 from source.

On the Ampere Altra family:

Shell
 
wget https://github.com/openssl/openssl/archive/refs/tags/openssl-3.4.0.tar.gz
tar zxf openssl-3.4.0.tar.gz
cd openssl-openssl-3.4.0
./Configure -mcpu=neoverse-n1
make -j"$(nproc)"
sudo make -j"$(nproc)" install
# Register /usr/local/lib with the dynamic linker (tee writes the file as root)
echo "/usr/local/lib" | sudo tee /etc/ld.so.conf.d/openssl.conf
sudo ldconfig


On the AmpereOne family:

Shell
 
wget https://github.com/openssl/openssl/archive/refs/tags/openssl-3.4.0.tar.gz
tar zxf openssl-3.4.0.tar.gz
cd openssl-openssl-3.4.0
./Configure -mcpu=ampere1a
make -j"$(nproc)"
sudo make -j"$(nproc)" install
# Register /usr/local/lib with the dynamic linker (tee writes the file as root)
echo "/usr/local/lib" | sudo tee /etc/ld.so.conf.d/openssl.conf
sudo ldconfig


Reference: https://doc.dpdk.org/guides/cryptodevs/openssl.html

IPSec Multi-Buffer Library for AArch64

The IPSec Multi-Buffer library for AArch64 supports the following algorithms:

Cipher algorithms:

  • SNOW3G-UEA2
  • ZUC-EEA3
  • ZUC-EEA3-256

Authentication algorithms:

  • SNOW3G-UIA2
  • ZUC-EIA3
  • ZUC-EIA3-256

Download and build the ipsec-mb library:

Shell
 
git clone https://gitlab.arm.com/arm-reference-solutions/ipsec-mb
cd ipsec-mb
make
# Installing into /usr/local requires root; refresh the linker cache afterwards
sudo make install PREFIX=/usr/local/
sudo ldconfig


Reference: https://doc.dpdk.org/guides/cryptodevs/snow3g.html

Build DPDK With Crypto Support

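Point the dynamic linker and pkg-config at the libraries installed above so that meson can find them.

On CentOS: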

Shell
 
export LD_LIBRARY_PATH=/home/ampere/AArch64cryptolib:/usr/local/lib:/lib64 
export PKG_CONFIG_PATH=/home/ampere/AArch64cryptolib/pkgconfig:/usr/local/lib/pkgconfig:/lib64/pkgconfig 


On Ubuntu:

Shell
 
export LD_LIBRARY_PATH=/home/ampere/AArch64cryptolib:/usr/local/lib:/usr/local/lib/aarch64-linux-gnu:/lib/aarch64-linux-gnu 
export PKG_CONFIG_PATH=/home/ampere/AArch64cryptolib/pkgconfig:/usr/local/lib/pkgconfig:/usr/local/lib/aarch64-linux-gnu/pkgconfig:/lib/aarch64-linux-gnu/pkgconfig


Build DPDK

Shell
 
wget https://fast.dpdk.org/rel/dpdk-24.07.tar.gz
tar zxf dpdk-24.07.tar.gz
cd dpdk-24.07
# "meson setup" replaces the deprecated bare "meson build" form
meson setup build
ninja -C build
sudo ninja -C build install
sudo ldconfig


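After meson configures the build directory, check the list of enabled crypto drivers in its configuration summary and confirm that it includes armv8, openssl, and ipsec_mb, for example: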

Shell
 
armv8, bcmfs, caam_jr, ccp, cnxk, dpaa_sec, dpaa2_sec, ipsec_mb, mlx5, nitrox, null, octeontx, openssl, scheduler, virtio
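In a scripted build, this check can be automated. The hypothetical helper below takes a comma-separated driver list like the one above and reports which of the three drivers used in this guide are missing; how the list is captured from the meson output is left open.

```shell
# Hypothetical helper: verify that the crypto drivers used in this guide
# appear in a comma-separated driver list such as the one shown above.
check_crypto_drivers() {
    list=$1
    missing=""
    for drv in armv8 openssl ipsec_mb; do
        # Turn commas and spaces into one name per line, then match whole names
        if ! printf '%s\n' "$list" | tr ',' ' ' | tr -s ' ' '\n' | grep -qx "$drv"; then
            missing="$missing $drv"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "all present"
}
```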


Crypto Performance Test on Ampere Altra Q80-30

The following tests were run on an Ampere Altra Q80-30; results on other SKUs will differ. Before testing, apply the hardware, BIOS, and OS settings described in the “Tuning Guide” section later in this article.

Test AES-CBC-128/SHA1-HMAC performance with a single core using crypto_armv8 on Ampere Altra Q80-30:

Shell
 
sudo usertools/dpdk-hugepages.py --setup 10G 

cd build/app 

./dpdk-test-crypto-perf --socket-mem 2048,0 --legacy-mem --vdev crypto_armv8 -l 0,1 -n 8 -- --buffer-sz 64,128,256,512,1024,2048 --optype cipher-then-auth --ptest throughput --auth-key-sz 64 --cipher-key-sz 16 --devtype crypto_armv8 --cipher-iv-sz 16 --auth-op generate --burst-sz 32 --total-ops 10000000 --silent --digest-sz 12 --auth-algo sha1-hmac --cipher-algo aes-cbc --cipher-op encrypt 

    lcore id    Buf Size  Burst Size    Enqueued    Dequeued  Failed Enq  Failed Deq        MOps        Gbps  Cycles/Buf
           1          64          32    10000000    10000000           0           0      6.4784      3.3169        3.86
           1         128          32    10000000    10000000           0           0      4.6469      4.7584        5.38
           1         256          32    10000000    10000000           0           0      2.9786      6.1002        8.39
           1         512          32    10000000    10000000           0           0      1.7654      7.2312       14.16
           1        1024          32    10000000    10000000           0           0      0.9730      7.9705       25.69
           1        2048          32    10000000    10000000           0           0      0.5129      8.4039       48.74
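The Gbps column follows from MOps and the buffer size: Gbps = MOps × buffer size in bytes × 8 / 1000. The hypothetical helper below uses awk for the floating-point arithmetic and reproduces the 64-byte row of the table above, which makes it easy to sanity-check results for other buffer sizes.

```shell
# Hypothetical helper: convert dpdk-test-crypto-perf MOps to Gbps.
# Gbps = (millions of ops/s) * (bytes per op) * 8 bits / 1000
mops_to_gbps() {
    awk -v mops="$1" -v bytes="$2" 'BEGIN { printf "%.4f\n", mops * bytes * 8 / 1000 }'
}

mops_to_gbps 6.4784 64    # 64-byte row above: prints 3.3169
```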


Test AES-CBC-128/SHA2-256-HMAC performance with a single core using crypto_armv8:

Shell
 
cd build/app 

./dpdk-test-crypto-perf --socket-mem 2048,0 --legacy-mem --vdev crypto_armv8 -l 0,1 -n 8 -- --buffer-sz 64,128,256,512,1024,2048 --optype cipher-then-auth --ptest throughput --auth-key-sz 64 --cipher-key-sz 16 --devtype crypto_armv8 --cipher-iv-sz 16 --auth-op generate --burst-sz 32 --total-ops 10000000 --silent --digest-sz 12 --auth-algo sha2-256-hmac --cipher-algo aes-cbc --cipher-op encrypt 

    lcore id    Buf Size  Burst Size    Enqueued    Dequeued  Failed Enq  Failed Deq        MOps        Gbps  Cycles/Buf
           1          64          32    10000000    10000000           0           0      6.7249      3.4432        3.72
           1         128          32    10000000    10000000           0           0      4.8760      4.9930        5.13
           1         256          32    10000000    10000000           0           0      3.0952      6.3389        8.08
           1         512          32    10000000    10000000           0           0      1.8318      7.5031       13.65
           1        1024          32    10000000    10000000           0           0      1.0093      8.2681       24.77
           1        2048          32    10000000    10000000           0           0      0.5316      8.7102       47.03


Test AES-GCM-128 performance with a single core using crypto_openssl:

Shell
 
cd build/app 

./dpdk-test-crypto-perf --socket-mem 2048,0 --legacy-mem --vdev crypto_openssl -l 0,1 -n 8 -- --aead-key-sz 16 --buffer-sz 64,128,256,512,1024,2048 --optype aead --ptest throughput --aead-aad-sz 16 --devtype crypto_openssl --aead-op encrypt --burst-sz 32 --total-ops 10000000 --silent --digest-sz 16 --aead-algo aes-gcm --aead-iv-sz 12 

    lcore id    Buf Size  Burst Size    Enqueued    Dequeued  Failed Enq  Failed Deq        MOps        Gbps  Cycles/Buf
           1          64          32    10000000    10000000           0           0      5.0681      2.5949        4.93
           1         128          32    10000000    10000000           0           0      4.5814      4.6914        5.46
           1         256          32    10000000    10000000           0           0      3.6966      7.5706        6.76
           1         512          32    10000000    10000000           0           0      2.7922     11.4367        8.95
           1        1024          32    10000000    10000000           0           0      1.8881     15.4671       13.24
           1        2048          32    10000000    10000000           0           0      1.1478     18.8056       21.78


Test AES-CTR/AES-CMAC performance with a single core using crypto_openssl:

Shell
 
cd build/app 

./dpdk-test-crypto-perf --socket-mem 2048,0 --legacy-mem --vdev crypto_openssl -l 0,1 -n 8 -- --buffer-sz 64,128,256,512,1024,2048 --optype cipher-then-auth --ptest throughput --auth-key-sz 32 --cipher-key-sz 16 --devtype crypto_openssl --cipher-iv-sz 16 --auth-op generate --burst-sz 32 --total-ops 10000000 --digest-sz 12 --auth-algo aes-cmac --cipher-algo aes-ctr --cipher-op encrypt 

    lcore id    Buf Size  Burst Size    Enqueued    Dequeued  Failed Enq  Failed Deq        MOps        Gbps  Cycles/Buf
           1          64          32    10000000    10000000           0           0      3.0675      1.5706        8.15
           1         128          32    10000000    10000000           0           0      2.6728      2.7370        9.35
           1         256          32    10000000    10000000           0           0      2.0764      4.2524       12.04
           1         512          32    10000000    10000000           0           0      1.4550      5.9599       17.18
           1        1024          32    10000000    10000000           0           0      0.8887      7.2800       28.13
           1        2048          32    10000000    10000000           0           0      0.5055      8.2824       49.45


Test SNOW3G-UEA2 cipher-only performance with a single core using crypto_snow3g:

Shell
 
cd build/app 

./dpdk-test-crypto-perf --socket-mem 2048,0 --legacy-mem --vdev crypto_snow3g -l 0,1 -n 8 -- --devtype crypto_snow3g --ptest throughput --pool-sz 16384 --total-ops 10000000 --burst-sz 32 --optype cipher-only --cipher-algo snow3g-uea2 --cipher-iv-sz 16 --auth-op generate --cipher-key-sz 16 --buffer-sz 64,128,256,512,1024,2048 --cipher-op encrypt 

    lcore id    Buf Size  Burst Size    Enqueued    Dequeued  Failed Enq  Failed Deq        MOps        Gbps  Cycles/Buf
           1          64          32    10000000    10000000           0           0      3.7096      1.8993        6.74
           1         128          32    10000000    10000000           0           0      2.8556      2.9242        8.75
           1         256          32    10000000    10000000           0           0      1.9718      4.0383       12.68
           1         512          32    10000000    10000000           0           0      1.2173      4.9859       20.54
           1        1024          32    10000000    10000000           0           0      0.6901      5.6535       36.23
           1        2048          32    10000000    10000000           0           0      0.3693      6.0503       67.70


Crypto Performance Test on AmpereOne A192-32X

The following tests were run on an AmpereOne A192-32X; results on other processor models will differ.

Before testing, apply the hardware, BIOS, and OS settings described in the “Tuning Guide” section.

Test AES-CBC-128/SHA1-HMAC performance with a single core using crypto_armv8:

Shell
 
sudo usertools/dpdk-hugepages.py --setup 10G 

cd build/app 

./dpdk-test-crypto-perf --socket-mem 2048,0 --legacy-mem --vdev crypto_armv8 -l 0,1 -n 8 -- --buffer-sz 64,128,256,512,1024,2048 --optype cipher-then-auth --ptest throughput --auth-key-sz 64 --cipher-key-sz 16 --devtype crypto_armv8 --cipher-iv-sz 16 --auth-op generate --burst-sz 32 --total-ops 10000000 --silent --digest-sz 12 --auth-algo sha1-hmac --cipher-algo aes-cbc --cipher-op encrypt 

    lcore id    Buf Size  Burst Size    Enqueued    Dequeued  Failed Enq  Failed Deq        MOps        Gbps  Cycles/Buf
           1          64          32    10000000    10000000           0           0      8.1328      4.1640      122.96
           1         128          32    10000000    10000000           0           0      5.7694      5.9079      173.33
           1         256          32    10000000    10000000           0           0      3.4485      7.0625      289.98
           1         512          32    10000000    10000000           0           0      1.9994      8.1894      500.16
           1        1024          32    10000000    10000000           0           0      1.0866      8.9013      920.31
           1        2048          32    10000000    10000000           0           0      0.5679      9.3045     1760.87
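The Cycles/Buf column is far larger here than in the Altra results, even though MOps is similar. The reported values are consistent with the tool counting ticks of the Arm generic timer (the frequency returned by rte_get_tsc_hz()) rather than CPU core cycles, assuming a 25 MHz timer on Altra and a 1 GHz timer on AmpereOne: ticks per buffer = timer Hz / (MOps × 10^6). The hypothetical helper below reproduces the 64-byte AES-CBC/SHA1-HMAC row of each platform under that assumption. If the assumption holds, compare MOps or Gbps, not Cycles/Buf, across the two platforms.

```shell
# Hypothetical helper: expected Cycles/Buf if the counter runs at timer_hz.
# ticks per buffer = timer_hz / (MOps * 1e6)
ticks_per_buf() {
    awk -v hz="$1" -v mops="$2" 'BEGIN { printf "%.2f\n", hz / (mops * 1000000) }'
}

ticks_per_buf 25000000 6.4784      # Altra, 64 B row: prints 3.86
ticks_per_buf 1000000000 8.1328    # AmpereOne, 64 B row: prints 122.96
```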


Test AES-GCM-128 performance with a single core using crypto_openssl:

Shell
 
cd build/app 

./dpdk-test-crypto-perf --socket-mem 2048,0 --legacy-mem --vdev crypto_openssl -l 0,1 -n 8 -- --aead-key-sz 16 --buffer-sz 64,128,256,512,1024,2048 --optype aead --ptest throughput --aead-aad-sz 16 --devtype crypto_openssl --aead-op encrypt --burst-sz 32 --total-ops 10000000 --silent --digest-sz 16 --aead-algo aes-gcm --aead-iv-sz 12 

    lcore id    Buf Size  Burst Size    Enqueued    Dequeued  Failed Enq  Failed Deq        MOps        Gbps  Cycles/Buf
           1          64          32    10000000    10000000           0           0      5.5482      2.8407      180.24
           1         128          32    10000000    10000000           0           0      5.0311      5.1518      198.76
           1         256          32    10000000    10000000           0           0      4.1310      8.4603      242.07
           1         512          32    10000000    10000000           0           0      3.5078     14.3677      285.08
           1        1024          32    10000000    10000000           0           0      2.5100     20.5618      398.41
           1        2048          32    10000000    10000000           0           0      1.5984     26.1889      625.61


Test AES-CTR/AES-CMAC performance with a single core using crypto_openssl:

Shell
 
cd build/app 

./dpdk-test-crypto-perf --socket-mem 2048,0 --legacy-mem --vdev crypto_openssl -l 0,1 -n 8 -- --buffer-sz 64,128,256,512,1024,2048 --optype cipher-then-auth --ptest throughput --auth-key-sz 32 --cipher-key-sz 16 --devtype crypto_openssl --cipher-iv-sz 16 --auth-op generate --burst-sz 32 --total-ops 10000000 --digest-sz 12 --auth-algo aes-cmac --cipher-algo aes-ctr --cipher-op encrypt 

    lcore id    Buf Size  Burst Size    Enqueued    Dequeued  Failed Enq  Failed Deq        MOps        Gbps  Cycles/Buf
           1          64          32    10000000    10000000           0           0      3.7295      1.9095      268.14
           1         128          32    10000000    10000000           0           0      3.1749      3.2511      314.97
           1         256          32    10000000    10000000           0           0      2.4479      5.0133      408.51
           1         512          32    10000000    10000000           0           0      1.7072      6.9927      585.75
           1        1024          32    10000000    10000000           0           0      1.0658      8.7312      938.24
           1        2048          32    10000000    10000000           0           0      0.6075      9.9540     1645.97


Performance Scaling With Core Counts

Crypto throughput on Ampere processors scales nearly linearly with core count. For example, here is AES-GCM-128 throughput with a 1024-byte buffer at different core counts on Altra Q80-30:

Core Count    Throughput (Gbps)
1             15.41
2             31.12
4             61.91
8             124.18
16            248.10


And here is SNOW3G-UEA2 cipher-only throughput with a 1024-byte buffer at different core counts:

Core Count    Throughput (Gbps)
1             5.65
2             11.31
4             22.60
8             45.28
16            90.31
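One way to quantify how close to linear this scaling is: scaling efficiency, the n-core throughput divided by n times the single-core throughput, where 1.000 is perfectly linear. The hypothetical helper below applies it to the 16-core rows of the two tables above.

```shell
# Hypothetical helper: scaling efficiency = T_n / (n * T_1).
# 1.000 means perfectly linear scaling.
scaling_efficiency() {
    awk -v t1="$1" -v n="$2" -v tn="$3" 'BEGIN { printf "%.3f\n", tn / (n * t1) }'
}

scaling_efficiency 15.41 16 248.10   # AES-GCM-128, 16 cores: prints 1.006
scaling_efficiency 5.65 16 90.31     # SNOW3G-UEA2, 16 cores: prints 0.999
```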


Run l2fwd With Crypto

DPDK provides an example application, l2fwd-crypto, that performs L2 forwarding and applies a crypto operation to each packet. To run this test, follow Ampere's DPDK Setup and Tuning Guide to set up Pktgen-DPDK as the packet generator.

Forwarding with AES-GCM-128 crypto, 1 port, 1 core, pktsize=1024B on Altra Q80-30:

Shell
 
./build/l2fwd-crypto -l 10-15 -n 8 -a 0000:01:00.0 --vdev crypto_openssl -- -p 0x1 --chain AEAD --aead_op ENCRYPT --aead_algo aes-gcm  -T 1  
Statistics for port 0 ------------------------------ 
Packets sent:                          1339751 
Packets received:                      1339780 
Packets dropped:                             0 
Crypto statistics ================================== 
Statistics for cryptodev 0 ------------------------- 
Packets enqueued:                      1339780 
Packets dequeued:                      1339751 
Packets errors:                              0 


Forwarding with AES-CBC/SHA1-HMAC crypto, 1 port, 1 core, pktsize=1024B:

Shell
 
./build/l2fwd-crypto -l 10-15 -n 8 -a 0000:01:00.0 --vdev crypto_armv8 -- -p 0x1 --chain CIPHER_HASH --cipher_op ENCRYPT --cipher_algo aes-cbc --cipher_key 00:01:02:03:04:05:06:07:08:09:0a:0b:0c:0d:0e:0f --auth_op GENERATE --auth_algo sha1-hmac --auth_key 10:11:12:13:14:15:16:17:18:19:1a:1b:1c:1d:1e:1f -T 1  
Statistics for port 0 ------------------------------ 
Packets sent:                           869828 
Packets received:                       869856 
Packets dropped:                             0 
Crypto statistics ================================== 
Statistics for cryptodev 0 ------------------------- 
Packets enqueued:                       869856 
Packets dequeued:                       869828 
Packets errors:                              0 


Tuning Guide

Hardware Configuration

  • One DIMM per channel (1DPC) memory population is recommended.

BIOS Settings

  • Advanced -> ACPI Settings -> Enable CPPC [Disabled]
  • Advanced -> ACPI Settings -> Enable LPI [Disabled]
  • Chipset -> CPU Configuration -> ANC mode [Monolithic]
  • Chipset -> CPU Configuration -> SLC Replacement Policy [Enhanced Least Recently Used]
  • Chipset -> CPU Configuration -> L1/L2 Prefetch [Enabled]
  • Chipset -> CPU Configuration -> SLC as L3$ [Disabled]

OS Settings

  • Set the frequency governor to performance mode.
  • Use a GCC version that supports Altra or AmpereOne, with the recommended build options.
    • Reference: https://amperecomputing.com/tutorials/gcc-guide-ampere-processors
  • Reserve hugepages. Example on CentOS with a 64K kernel page size, where each hugepage is 512 MB (100 pages reserve 50 GB on node 0):
    • echo 100 | sudo tee /sys/devices/system/node/node0/hugepages/hugepages-524288kB/nr_hugepages
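The governor and hugepage settings above can be scripted roughly as follows. This is a sketch with hypothetical function names: it writes the governor through the standard cpufreq sysfs files and computes a hugepage count by rounding up. The sysfs root is a variable only so the functions can be tried against a scratch directory; against the real /sys they must run as root.

```shell
# Hypothetical helpers for the OS settings above (names are illustrative).
# SYSFS is the sysfs mount point; override it only to test against a copy.
SYSFS=${SYSFS:-/sys}

# Set every CPU's cpufreq governor to "performance" (requires root).
set_performance_governor() {
    for gov in "$SYSFS"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip if the glob did not match (e.g. no cpufreq driver is loaded)
        [ -e "$gov" ] || continue
        echo performance > "$gov"
    done
}

# Print the number of hugepages of size $2 kB needed for $1 GiB, rounded up.
hugepages_for_gib() {
    echo $(( ($1 * 1024 * 1024 + $2 - 1) / $2 ))
}
```

For example, `hugepages_for_gib 50 524288` prints 100, matching the CentOS example. Where the cpupower utility is installed, `sudo cpupower frequency-set -g performance` sets the governor as well.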

Library Version Selection

  • Use the latest code for AArch64cryptolib and ipsec-mb.
  • Use OpenSSL 3.2.0 or later (or 1.1.1) on Ampere Altra, and OpenSSL 3.4.0 or later on AmpereOne.
  • DPDK version ≥ v24.07 is recommended for the AmpereOne family.
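To confirm that the installed DPDK meets this recommendation, compare the version pkg-config reports for libdpdk. The version_ge helper below is hypothetical and relies on GNU coreutils `sort -V` for version-aware ordering.

```shell
# Hypothetical helper: succeed when version $1 is >= version $2.
# Relies on GNU coreutils "sort -V" for version-aware ordering.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Usage: `version_ge "$(pkg-config --modversion libdpdk)" 24.07 && echo "DPDK version OK"`. Because `sort -V` orders "24.07" before "24.07.0", an installed 24.07.0 passes the check.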

References

  • https://amperecomputing.com/tuning-guides/DPDK-setup-and-tuning-guide
  • https://amperecomputing.com/tutorials/gcc-guide-ampere-processors
  • https://doc.dpdk.org/guides/cryptodevs/armv8.html
  • https://doc.dpdk.org/guides/cryptodevs/openssl.html
  • https://doc.dpdk.org/guides/cryptodevs/snow3g.html
  • https://doc.dpdk.org/guides-16.04/sample_app_ug/l2_forward_crypto.html
