Akka and CUDA


Let’s continue our discussion of native components that react to messages from Akka. We will wire in actual image processing to our C++ code. To make matters even more interesting, I’ll show you how to use the CUDA-enabled build of OpenCV.

The main components

We have three main components: the JVM-hosted ActorSystem, whose actors send requests for image processing; the RabbitMQ broker that carries those requests; and the native application that binds to the appropriate queues and does the processing.

The other important detail is that the Akka application sends one request, but receives a stream of responses. In the code I am showing at https://github.com/janm399/akka-patterns, the responses are all the same, but one can easily imagine that the native component begins processing a video stream and only sends interesting sequences to the client. (This could be motion detection, unusual human behaviour patterns, industrial process monitoring, …)
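To make the "only sends interesting sequences" idea concrete, here is a minimal sketch of one possible filter: naive frame differencing. It is not what the akka-patterns code does (that code sends identical responses). The names Frame, mean_abs_diff and interesting_frames are hypothetical, and a frame is modelled as a flat list of grayscale values rather than an OpenCV matrix.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Assumption for illustration: a frame is a flat list of grayscale
// pixel values in the range 0-255, not an OpenCV matrix.
using Frame = std::vector<int>;

// Mean absolute per-pixel difference between two frames.
// Returns 0.0 for empty frames or frames of different sizes.
double mean_abs_diff(const Frame& a, const Frame& b) {
    if (a.empty() || a.size() != b.size()) return 0.0;
    long total = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        total += std::abs(a[i] - b[i]);
    }
    return static_cast<double>(total) / static_cast<double>(a.size());
}

// Indices of the frames that differ from their predecessor by more
// than the threshold; only these would be sent to the client.
std::vector<std::size_t> interesting_frames(const std::vector<Frame>& frames,
                                            double threshold) {
    std::vector<std::size_t> result;
    for (std::size_t i = 1; i < frames.size(); ++i) {
        if (mean_abs_diff(frames[i - 1], frames[i]) > threshold) {
            result.push_back(i);
        }
    }
    return result;
}
```

With this shape, the server would publish a response only for the returned indices and stay silent for everything else, which is why the client sees a stream whose length it cannot know in advance.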

Why native, you ask? Well, to show that such integration is indeed possible and to allow me to write code that makes the most of your GPU. (Did you know that you can get a 1-teraFLOPS machine from Amazon for $2.40 an hour?!)

The mechanism

RabbitMQ is our message broker. It receives the requests from the Akka code and routes them to the appropriate queue, where the server component picks them up.

  1. ConnectionOwner.createChildActor(connection, Props(new RpcStreamingClient())) creates
    an actor that will talk over RabbitMQ. Each actor creates a private, exclusive queue that it uses
    for the responses.
  2. channel->BindQueue("image", "amq.direct", "image.key"); binds the image queue
    to the amq.direct exchange under the image.key routing key. The server then obtains a
    consumer tag for the queue and uses it to synchronously wait for a
    message to arrive by calling Envelope::ptr_t env = channel->BasicConsumeMessage(tag);
  3. client ! Request(Publish("amq.direct", "image.key", ...))
    The client sends some payload to the amq.direct exchange and specifies
    the image.key routing key. RabbitMQ examines the routing key and places
    the request on the queue bound to that exchange under that key.
  4. The channel->BasicConsumeMessage(tag) completes and returns the message that the
    client published.
  5. M*a*g*i*c & p*o*n*i*e*s
  6. The server constructs a response and sends it to the client by calling channel->BasicPublish("", replyTo, response, true);
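The routing in the steps above can be sketched with a toy, in-memory broker. This is only a model for reasoning about where a message ends up: it is not the RabbitMQ or C++ client API, and ToyBroker, Message and every method name are hypothetical. It assumes direct-exchange semantics (exact match on the routing key) and that the default exchange "" routes by queue name.

```cpp
#include <deque>
#include <map>
#include <string>
#include <utility>

// Toy model only; all names here are hypothetical.
struct Message {
    std::string body;
    std::string reply_to;  // name of the client's private reply queue
};

class ToyBroker {
public:
    // Declare a queue without binding it, e.g. a private reply queue (step 1).
    void declare_queue(const std::string& queue) { queues_[queue]; }

    // Bind a queue to an (exchange, routing key) pair (step 2).
    void bind_queue(const std::string& queue, const std::string& exchange,
                    const std::string& routing_key) {
        queues_[queue];  // make sure the queue exists
        bindings_[{exchange, routing_key}] = queue;
    }

    // Publish a message. The default exchange "" routes straight to the
    // queue named by the routing key (step 6); any other exchange goes
    // through its bindings (step 3). Returns false if nothing receives it.
    bool publish(const std::string& exchange, const std::string& routing_key,
                 const Message& message) {
        std::string target = routing_key;
        if (!exchange.empty()) {
            auto binding = bindings_.find({exchange, routing_key});
            if (binding == bindings_.end()) return false;
            target = binding->second;
        }
        auto queue = queues_.find(target);
        if (queue == queues_.end()) return false;
        queue->second.push_back(message);
        return true;
    }

    // Non-blocking stand-in for the blocking consume call (steps 2 and 4).
    bool consume(const std::string& queue, Message& out) {
        auto found = queues_.find(queue);
        if (found == queues_.end() || found->second.empty()) return false;
        out = found->second.front();
        found->second.pop_front();
        return true;
    }

private:
    std::map<std::pair<std::string, std::string>, std::string> bindings_;
    std::map<std::string, std::deque<Message>> queues_;
};
```

In this model the client publishes to ("amq.direct", "image.key") with reply_to set to its private queue, the server consumes from "image", and the reply goes out through the default exchange with the reply queue's name as the routing key.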

The native code

We begin by looking at the native component. It is a fairly straightforward CMake (cmake.org) project that requires the RabbitMQ C and C++ clients, Boost and OpenCV. The first three are plain-vanilla installations, even on OS X. The OpenCV build needs a little bit of fiddling to make it work with CUDA on OS X. (If you are on Linux, skip the next section; if you are on Windows, I pity your soul.)


Download & install the latest CUDA packages. Then download the OpenCV sources. To build it, follow the usual cmake recipe. Create a build directory and change into it. Before building, edit cmake/OpenCVDetectCUDA.cmake by commenting out the return() in

   message(STATUS "CUDA compilation was disabled (due to...).")
   # return() <-- commented

Next, you must tweak the generated CMakeCache.txt (shudders!) by changing the CUDA_HOST_COMPILER:FILEPATH entry from /usr/bin/cc to /usr/bin/gcc.


Finally, complete the whole sorry business with the usual cmake build and install incantations.

Building the daemon

To build the daemon CUDA project, go to its source and run

daemon/src$ mkdir build
daemon/src$ cd build
daemon/src/build$ cmake -G Xcode ..    # either: generate an Xcode project
daemon/src/build$ cmake ..             # or: generate makefiles
daemon/src/build$ cmake --build .

Before executing it, make sure that you have the image queue configured and that it is bound to the amq.direct exchange under the image.key routing key.

Now you’re ready to run it.

The Akka/Scala code

The Akka & Scala code uses the AMQP client from https://github.com/janm399/amqp-client. Build that first; then you’re ready to build & use the Akka Patterns code.

The notable piece of code is the ClientDemo, which establishes a connection to the RabbitMQ broker, creates 16 clients and then sends the Request message to each of them. This kicks off the C++ component, which then starts pushing the responses to each of the clients until the client decides that it’s had enough (which takes 100000 ms) and stops itself. This causes the C++ server to receive an exception on sending, which tells it to go back to waiting for another message from another client.

Showing off

Making the most of the GPU gives you amazing number-crunching speed. I ran this codebase on the latest & coolest MacBook Pro with GeForce GT 520M and the CUDA build processed approximately four times as many messages as the CPU-only build, while the CPU load remained fairly low. Your results, of course, may vary.

[Figure: CPU processing]

[Figure: GPU processing]


This whole structure shows just how easy it is to build a complex native system that squeezes as much out of your hardware as possible. The RabbitMQ message broker allows us to easily scale the message rates (this is for another post!); it also ensures safe-ish message delivery and re-delivery in case of failures.

P.S. If you have a lot of images / data that you need to crunch, get in touch. I don’t know why, but I smell a cool project!

P.P.S. A very warm “hi” to my colleagues from 12 years ago! (And yes, Eastern European programmers drink. A lot. And often.)



Published at DZone with permission of Jan Machacek , DZone MVB. See the original article here.

