Akka and CUDA
Let’s continue our discussion of native components that react to messages from Akka. We will wire actual image processing into our C++ code. To make matters even more interesting, I’ll show you how to use the CUDA-enabled build of OpenCV.
The main components
We have three main components: the JVM-hosted ActorSystem with a number of actors that send requests for image processing over RabbitMQ, the RabbitMQ broker itself, and the native application that binds to the appropriate queues.
The other important detail is that the Akka application sends one request but receives a stream of responses. In the code I am showing at https://github.com/janm399/akka-patterns, the responses are all the same, but one can easily imagine the native component processing a video stream and sending only the interesting sequences to the client. (This could be motion detection, unusual human behaviour patterns, industrial process monitoring, …)
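To make "sending only the interesting sequences" a little more concrete, here is a minimal sketch of frame differencing, one simple way to do motion detection, in plain C++. It assumes frames arrive as equally sized 8-bit grayscale buffers; the names Frame, meanAbsDiff, interestingFrames and the threshold are hypothetical and not part of akka-patterns. A real implementation would more likely use OpenCV (for example cv::absdiff) and the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical frame representation: a flat buffer of 8-bit grayscale pixels.
using Frame = std::vector<std::uint8_t>;

// Mean absolute per-pixel difference between two frames of the same size.
double meanAbsDiff(const Frame& a, const Frame& b) {
    assert(a.size() == b.size() && "frames must have the same dimensions");
    if (a.empty()) return 0.0;
    long long total = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        total += std::abs(static_cast<int>(a[i]) - static_cast<int>(b[i]));
    }
    return static_cast<double>(total) / static_cast<double>(a.size());
}

// Indices of frames that differ from their predecessor by more than `threshold`;
// in the scenario above, only these would be published back to the client.
std::vector<std::size_t> interestingFrames(const std::vector<Frame>& frames, double threshold) {
    std::vector<std::size_t> result;
    for (std::size_t i = 1; i < frames.size(); ++i) {
        if (meanAbsDiff(frames[i - 1], frames[i]) > threshold) {
            result.push_back(i);
        }
    }
    return result;
}
```

The server would then publish a response only for the returned indices instead of for every frame, which is exactly the kind of filtering that keeps a stream of responses small.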
Why native, you ask? Well, to show that such integration is indeed possible and to let me write code that makes the most of your GPU. (Did you know that you can get a 1-teraFLOPS machine from Amazon for $2.40 an hour?!)
RabbitMQ is our message broker. It receives the requests from the Akka code and routes them to the appropriate queue, where the server component picks them up.
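The routing step can be pictured with a toy, in-memory model of what a direct exchange such as amq.direct does: a message goes to every queue whose binding key exactly equals the message's routing key. The DirectExchange class below is hypothetical and only mimics the broker's behaviour; it is not part of RabbitMQ or of any client library.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model of an AMQP direct exchange (hypothetical; the real routing happens inside RabbitMQ).
class DirectExchange {
public:
    // Similar in spirit to binding `queue` to this exchange with `bindingKey`.
    void bind(const std::string& queue, const std::string& bindingKey) {
        assert(!queue.empty() && "queue name must not be empty");
        bindings_[bindingKey].insert(queue);
    }

    // Queues that a message published with `routingKey` would be delivered to.
    // An empty result means the message is unroutable.
    std::vector<std::string> route(const std::string& routingKey) const {
        auto it = bindings_.find(routingKey);
        if (it == bindings_.end()) return {};
        return std::vector<std::string>(it->second.begin(), it->second.end());
    }

private:
    // binding key -> names of bound queues (exact-match lookup, no wildcards)
    std::map<std::string, std::set<std::string>> bindings_;
};
```

With the binding the native daemon sets up, bind("image", "image.key"), a request published with the image.key routing key lands in the image queue; a message with any other key is dropped, or returned to the publisher if it was sent as mandatory. Unlike a topic exchange, a direct exchange does no pattern matching.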
- ConnectionOwner.createChildActor(connection, Props(new RpcStreamingClient())) creates an actor that will talk over RabbitMQ. Each actor creates a private, exclusive queue that it uses for the responses.
- channel->BindQueue("image", "amq.direct", "image.key"); binds the image queue to the amq.direct exchange with the image.key routing key. The server then obtains a consumer tag and uses it to wait synchronously for a message to arrive by calling Envelope::ptr_t env = channel->BasicConsumeMessage(tag);
- client ! Request(Publish("amq.direct", "image.key", ...)): the client sends some payload to the amq.direct exchange and specifies the image.key routing key. RabbitMQ examines the routing key and places the request on the queue bound to that exchange with that key, that is, the image queue.
- channel->BasicConsumeMessage(tag) completes and returns the message that the client sent.
- M*a*g*i*c & p*o*n*i*e*s (this is where the image processing happens)
- The server constructs a response and sends it to the client by calling channel->BasicPublish("", replyTo, response, true);
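The reply path relies on the AMQP default exchange, the one with the empty name used in BasicPublish("", replyTo, ...): it routes each message to the queue whose name equals the routing key, so the server can address the client's private queue directly. The sketch below models that, together with the one-request-many-responses pattern. ToyBroker, declareExclusiveQueue, publishDefault, the generated queue names, Request and handleRequest are all hypothetical stand-ins, not the SimpleAmqpClient or amqp-client APIs.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>

// Hypothetical in-process stand-in for the broker: nothing but named queues.
class ToyBroker {
public:
    // Models the private reply queue each RpcStreamingClient creates for itself.
    std::string declareExclusiveQueue() {
        std::string name = "reply-" + std::to_string(++counter_);
        queues_[name];  // create an empty queue under that name
        return name;
    }

    // Models a mandatory publish to the default exchange (""): the routing key
    // is the queue name. Returns false when no such queue exists (unroutable).
    bool publishDefault(const std::string& routingKey, const std::string& body) {
        auto it = queues_.find(routingKey);
        if (it == queues_.end()) return false;
        it->second.push_back(body);
        return true;
    }

    std::deque<std::string>& queue(const std::string& name) { return queues_.at(name); }

private:
    int counter_ = 0;
    std::map<std::string, std::deque<std::string>> queues_;
};

struct Request {
    std::string replyTo;  // name of the client's private reply queue
    std::string payload;
};

// One request, many responses: publish up to `count` replies to request.replyTo,
// stopping early if the reply queue cannot be found. Returns how many were sent.
int handleRequest(ToyBroker& broker, const Request& request, int count) {
    assert(!request.replyTo.empty() && "a streaming request needs a reply queue");
    int sent = 0;
    for (int i = 0; i < count; ++i) {
        if (!broker.publishDefault(request.replyTo, request.payload + " #" + std::to_string(i))) break;
        ++sent;
    }
    return sent;
}
```

The design point is that the server never needs to know anything about the client except the replyTo queue name carried in the request; the default exchange does the rest.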
The native code
We begin by looking at the native component. It is a fairly straightforward CMake (cmake.org) business that requires the RabbitMQ C and C++ clients, Boost and OpenCV. The first three are plain-vanilla installations, even on OS X. The OpenCV build needs a little bit of fiddling to make it work with CUDA on OS X. (If you are on Linux, skip the next section; if you are on Windows, I pity your soul.)
CUDA OpenCV on OS X
Download & install the latest CUDA packages. Then download the OpenCV sources. To build OpenCV, follow the usual cmake recipe: create a build directory and change into it. Before building, edit cmake/OpenCVDetectCUDA.cmake by commenting out the return() call in this block:
if (NOT MSVC AND NOT CMAKE_COMPILER_IS_GNUCXX OR MINGW)
  message(STATUS "CUDA compilation was disabled (due to...).")
  # return() <-- commented
endif()
Next, you must tweak the generated CMakeCache.txt (shudders!) by changing the CMakeCache.txt:CUDA_HOST_COMPILER:FILEPATH property from
Next, complete the whole sorry business with the usual cmake build and install incantations.
To build the daemon CUDA project, go to its source directory and run
daemon/src$ mkdir build
daemon/src$ cd build
daemon/src/build$ cmake -G Xcode .. # either this, to generate an Xcode project
daemon/src/build$ cmake .. # or this, to generate makefiles
daemon/src/build$ cmake --build .
Before executing it, make sure that you have the image queue configured and that there is a binding from the amq.direct exchange to it using the image.key routing key.
Now you’re ready to run it.
The Akka/Scala code
The Akka & Scala code uses the AMQP client from https://github.com/janm399/amqp-client. Build that first; then you’re ready to build & use the Akka Patterns code.
The notable piece of code is the ClientDemo, which establishes a connection to the RabbitMQ broker, creates 16 clients and then sends the Request message to each of them. This kicks off the C++ component, which then starts pushing responses to each of the clients until the client decides that it’s had enough (after 100000 ms, that is, 100 seconds) and stops itself. This causes the C++ server to receive an exception on sending, which tells it to go back to waiting for another message from another client.
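The server's side of this termination protocol can be sketched as a loop: take the next request, stream responses until sending fails because the client's exclusive queue has gone, then go back to waiting. In the real daemon the failure is the exception on sending described above; here ReplyQueueGone, Client and serve are hypothetical names, and each client "stops itself" after a fixed number of responses rather than after 100000 ms, so that the example is deterministic.

```cpp
#include <cassert>
#include <deque>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the error raised when publishing to a reply queue that no longer exists.
struct ReplyQueueGone : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical client: accepts `budget` responses, then "stops itself";
// any further delivery fails as if its exclusive reply queue had been deleted.
struct Client {
    std::string name;
    int budget;
    int received = 0;

    void deliver(const std::string& /*response*/) {
        if (received >= budget) throw ReplyQueueGone(name + " has stopped");
        ++received;
    }
};

// Server loop: for each pending request, stream responses until sending fails,
// then move on to the next request. Returns how many responses each client got.
std::vector<int> serve(std::deque<Client*> pending) {
    std::vector<int> sentPerClient;
    while (!pending.empty()) {
        Client* client = pending.front();  // stands in for BasicConsumeMessage(tag)
        pending.pop_front();
        assert(client != nullptr);
        int sent = 0;
        try {
            for (;;) {  // stream indefinitely; only the client decides when to stop
                client->deliver("response " + std::to_string(sent));
                ++sent;
            }
        } catch (const ReplyQueueGone&) {
            // The client is gone: fall through and wait for the next request.
        }
        sentPerClient.push_back(sent);
    }
    return sentPerClient;
}
```

Using the send failure as the stop signal means the client needs no explicit "stop" message; tearing down its own queue is enough.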
Making the most of the GPU gives you amazing number-crunching speed. I ran this codebase on the latest & coolest MacBook Pro with GeForce GT 520M, and the CUDA build processed approximately four times as many messages as the CPU-only build, while the CPU load remained fairly low. Your results, of course, may vary.
This whole structure shows just how easy it is to build a complex native system that squeezes as much out of your hardware as possible. The RabbitMQ message broker allows us to easily scale the message rates (this is for another post!); it also ensures safe-ish message delivery and re-delivery in case of failures.
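The re-delivery part rests on AMQP acknowledgements: a message handed to a consumer that uses manual acks stays owned by the broker until the consumer acknowledges it, and if the consumer goes away first, the broker puts the message back on the queue and marks it as redelivered. With automatic acks there is no such safety net, which is part of why "safe-ish" is the right word. The toy model below is a hypothetical illustration of that rule (AckQueue, deliver, ack and consumerFailed are invented names); this post does not say which ack mode the daemon uses.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <optional>
#include <string>
#include <utility>

struct Message {
    std::string body;
    bool redelivered = false;
};

// Hypothetical model of a queue whose consumer uses manual acknowledgements.
class AckQueue {
public:
    void publish(const std::string& body) { ready_.push_back(Message{body, false}); }

    // Hands the next ready message to the consumer together with its delivery tag;
    // the message stays "unacked" (still owned by the broker) until ack() is called.
    std::optional<std::pair<int, Message>> deliver() {
        if (ready_.empty()) return std::nullopt;
        Message m = ready_.front();
        ready_.pop_front();
        int tag = ++nextTag_;
        unacked_[tag] = m;
        return std::make_pair(tag, m);
    }

    // The consumer finished processing: the broker may now forget the message.
    void ack(int deliveryTag) {
        std::size_t erased = unacked_.erase(deliveryTag);
        assert(erased == 1 && "unknown delivery tag");
        (void)erased;
    }

    // The consumer died before acking: every unacked message goes back to the
    // head of the queue (a simplification), flagged as redelivered.
    void consumerFailed() {
        for (auto it = unacked_.rbegin(); it != unacked_.rend(); ++it) {
            Message m = it->second;
            m.redelivered = true;
            ready_.push_front(m);
        }
        unacked_.clear();
    }

    std::size_t ready() const { return ready_.size(); }

private:
    int nextTag_ = 0;
    std::deque<Message> ready_;
    std::map<int, Message> unacked_;  // delivery tag -> message in flight
};
```

A consumer that acks only after it has finished processing gets at-least-once delivery: a crash mid-image means that image is processed again, so the processing should tolerate duplicates.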
P.S. If you have a lot of images / data that you need to crunch, get in touch. I don’t know why, but I smell a cool project!
P.P.S. A very warm “hi” to my colleagues from 12 years ago! (And yes, Eastern European programmers drink. A lot. And often.)
Published at DZone with permission of Jan Machacek, DZone MVB. See the original article here.