Accelerating ML Inference on Raspberry Pi With PyArmNN
This tutorial shows how to use the newly released Python APIs for Arm NN inference engine to classify images as “Fire” versus “Non-Fire.”
A neural network, trained to recognize images that include a fire or flames, can make fire-detection systems more reliable and cost-effective. This tutorial shows how to use the newly released Python APIs for Arm NN inference engine to classify images as “Fire” versus “Non-Fire.”
What Is Arm NN and PyArmNN?
Arm NN is an inference engine for CPUs, GPUs, and NPUs. It executes ML models on-device in order to make predictions based on input data. Arm NN enables efficient translation of existing neural network frameworks, such as TensorFlow Lite, TensorFlow, ONNX, and Caffe, allowing them to run efficiently and without modification across Arm Cortex-A CPUs, Arm Mali GPUs, and Arm Ethos NPUs.
PyArmNN is a newly developed Python extension for Arm NN SDK. In this tutorial, we are going to use PyArmNN APIs to run a fire detection image classification model fire_detection.tflite and compare the inference performance with TensorFlow Lite on a Raspberry Pi.
Arm NN provides TFLite parser armnnTfLiteParser, which is a library for loading neural networks defined by TensorFlow Lite FlatBuffers files into the Arm NN runtime. We are going to use the TFLite parser to parse our fire detection model for “Fire” vs. “Non-Fire” image classification.
What Do We Need?
- A Raspberry Pi. I am testing with a Raspberry Pi 4 with Raspbian 10 OS. The Pi device is powered by an Arm Cortex-A72 processor, which can harness the power of Arm NN SDK for accelerated ML performance.
- Before you proceed with the project setup, you will need to check out and build Arm NN version 19.11 or newer for your Raspberry Pi. Instructions are here.
- PyArmNN package
- fire_detection.tflite, generated from this tutorial and converted to a TensorFlow Lite model.
Run ML Inference With PyArmNN
To run an ML model on device, our main steps are:
- Import pyarmnn module
- Load an input image
- Create a parser and load the network
- Choose backends, create runtime, and optimize the model
- Perform inference
- Interpret and report the output
Import pyarmnn Module
Load an Input Image
Our model is a floating-point model, so we must scale the input image's pixel values to the range -1 to 1.
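The scaling step might look like the following (a sketch assuming the image has already been read into a NumPy array, e.g. with OpenCV, and resized to the model's input dimensions):

```python
import numpy as np

def preprocess(image: np.ndarray) -> np.ndarray:
    """Scale uint8 pixel values in [0, 255] to float32 values in [-1, 1]."""
    return (image.astype(np.float32) / 127.5) - 1.0
```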
Create a Parser and Load the Network
The next step when working with Arm NN is to create a parser object that will be used to load the network file. Arm NN has parsers for a variety of model file types, including TFLite, ONNX, and Caffe. Parsers handle the creation of the underlying Arm NN graph, so you don't need to construct your model graph by hand.
In this example, we will create a TfLite parser to load our TensorFlow Lite model from the specified path:
Get Input Binding Info
Once created, the parser is used to extract the input information for the network.
We can extract all the input names by calling GetSubgraphInputTensorNames() and then use them to get the input binding information. For this example, since our model only has one input layer, we use input_names to obtain the input tensor, then use this string to retrieve the input binding info.
The input binding info contains all the essential information about the input. It is a tuple consisting of an integer identifier for the bindable layer and the tensor info (data type, quantization information, number of dimensions, total number of elements).
Choose Backends, Create Runtime and Optimize the Model
Specify the backend list so that the network can be optimized for it.
Load Optimized Network Into the Runtime
Load the optimized network in the runtime context. LoadNetwork() creates the backend-specific workloads for the layers.
Get Output Binding Info and Make Output Tensor
Similar to the input binding info, we can retrieve from the parser the output tensor names and get the binding information.
Our sample assumes that an image classification model has only one output, so it uses only the first name from the returned list; this can easily be extended to multiple outputs by looping over output_names.
Perform Inference
The EnqueueWorkload() function of the runtime context executes inference for the loaded network.
Here is our full Python code predict_pyarmnn.py:
Run the Python Script From the Command Line:
In our example, class 0’s probability is 0.9967675 and class 1’s probability is 0.00323252, so fire is not detected in the image.
PyArmNN vs TensorFlow Lite Performance Benchmarking
As the next step, we benchmark PyArmNN and TensorFlow Lite Python API's performance on a Raspberry Pi.
TensorFlow Lite uses an interpreter to perform inference. The interpreter uses static graph ordering and a custom (less dynamic) memory allocator. To learn how to load and run a model with the Python API, refer to the TensorFlow Lite documentation.
For performance benchmarking, inference was carried out with our fire detection model. In our example, we only run inference once. We can also run the model multiple times and take the average inferencing time. Here is our predict_tflite.py code:
We extend predict_pyarmnn.py with the same code for inference benchmarking.
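The benchmarking addition is just the same monotonic-clock wrapper around EnqueueWorkload(); the pattern is shown here around a placeholder function so it runs standalone:

```python
import time

def run_inference():
    # Stands in for: runtime.EnqueueWorkload(net_id, input_tensors, output_tensors)
    time.sleep(0.01)

start = time.perf_counter()
run_inference()
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"Inference took {elapsed_ms:.2f} ms")
```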
Run the Python script again:
From the results, you can observe the inference performance improvement gained by using the PyArmNN APIs.
Next Steps and Helpful Resources
This tutorial shows how to use the Arm NN Python APIs to classify images as “Fire” versus “Non-Fire.” You can also use it as a starting point to handle other types of neural networks.
To learn more about Arm NN, check out the following resources: