AI on a Stick: Tensor Math on USB Gives You the Edge
Serious ML compute power in a USB dongle that plugs into your Raspberry Pi. It won't replace a GPGPU server farm, but it will make your small ML project 30X faster.
Google recently announced that it will be offering a small development board built around its small, highly efficient TensorFlow-accelerating TPU chips. Google has been building specialty processing boards for the last couple of years to add to the power of its existing cloud computing infrastructure. These Tensor Processing Units (TPUs) are a step up from the GPGPUs they replace in both power consumption and compute power. They're designed to do exactly the kind of matrix (tensor) math that is the foundation of Google's TensorFlow software. Earlier this year, I wrote an article about how Google was making the functionality of these devices available in its public cloud computing offerings. I must say I was surprised by how quickly Google made its core computing chip available in a small board-based development kit. They announced that it would be available sometime in the fall of 2018. Just around the corner!
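To make "matrix (tensor) math" concrete, here is a minimal, purely illustrative sketch (in NumPy, not TensorFlow) of the kind of operation a TPU is built to accelerate: a single fully connected neural-network layer, which is just a big matrix multiply plus an activation function. The sizes here are arbitrary, chosen only for the example.

```python
import numpy as np

# Illustrative only: one dense layer, y = relu(W @ x + b).
# A TPU's job is to churn through millions of multiply-accumulates
# like this matrix product, at low power.
rng = np.random.default_rng(0)

x = rng.standard_normal(256)          # input activations
W = rng.standard_normal((128, 256))   # layer weights
b = rng.standard_normal(128)          # biases

y = np.maximum(W @ x + b, 0.0)        # matrix multiply + ReLU
print(y.shape)  # (128,)
```

A real vision model stacks hundreds of such layers, which is why dedicated matrix hardware pays off.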
The silicon heart of this beast is surprisingly small:
Google promises some surprising performance statistics. The press release states that the "Edge TPU enables users to concurrently execute multiple state-of-the-art AI models frame by frame, on a high-resolution video, at 30 frames per second, in a power-efficient manner." They went further to state that they "were hyperfocused on optimizing for 'performance per watt' and 'performance per dollar' within a small footprint." And from an early peek, it appears that the entire development board will be about the size of a Raspberry Pi:
From the looks of the board (and this is only my personal best guess), it probably draws a similar amount of power to the Raspberry Pi. This guess is based solely on the size of the potential power connectors and the lack of any visible heatsinks.
So we should expect to see a lot of products next year that embody a great deal more intelligence. For example, today the camera app on your phone can dynamically find faces, but in the near future, it may be able to cue you to say "everybody smile" because it will detect not only people but also how happy they look. Maybe it will learn the faces of your friends and family and be able to say things like "C'mon, smile, Uncle Robert."
One thing I would really like to see is a serious upgrade to on-device speech recognition. Everyone knows that speech recognition on the web is quite good and works well for informational queries, but if you talk to one of these devices in an extended conversation, the latency of the recognition becomes apparent and seriously impedes the natural flow of the conversation. And while the general models for recognition are quite good across a range of accents and voice types, it would certainly be an improvement to have the system learn and adapt to your personal characteristics. For the record, some speech recognition is already being done on the device. For instance, the Android wake-up word, "Okay Google," is handled with local processing, and some other functions that use a highly constrained vocabulary work well too (e.g., "Set alarm for 3 PM" is processed locally on my Android phone). You can easily test what works and what doesn't on your phone by turning off the wireless and cell connectivity and giving it a try. Something as wide-open and far-ranging in vocabulary as a dictated text message is not supported today, but tomorrow, with lots more memory and a few number-crunching chips, this might become the norm.
These chips should also help make autonomous vehicles more affordable as well as more intelligent, since a few of these TPU chips could replace the array of power-hungry GPGPU boards that current self-driving cars require.
Before I go off the deep end with my hopes and expectations, I would like to provide a little bit of a reality check. First of all, while the chips themselves will do a pretty good job at the task of matrix manipulation, they will still suffer from the nemesis of all machine learning algorithms: the data movement bottleneck. Even for TPUs and GPGPUs designed into specialized architectures on boards that plug directly into high-speed PCI backplanes, this remains a problem. The movement of data and models onto and off of the processing board is still the weakest link in the process. So in the context of the Raspberry Pi, we have to remember that the data travels over a standard USB connection. And to make matters a little worse, on the Raspberry Pi, the USB controller shares hardware and compute cycles with the Ethernet network interface, so it's not the fastest data pipe. Nonetheless, Google suggests that this USB-based TPU accelerates the frame rate for image analysis from the 1 to 2 frames per second possible on a standalone Raspberry Pi to something in the vicinity of 30 frames per second. That is definitely a sizable and usable upgrade in processing power. At 30 frames per second, it should make low-latency, real-time applications possible!
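A quick back-of-the-envelope sketch shows what those frame rates mean in per-frame latency terms (the 1 fps baseline is taken from the low end of the range above; the ~30x figure follows directly from it):

```python
# Per-frame time budget implied by a given frame rate.
def frame_budget_ms(fps):
    """Milliseconds available to process one frame at a given fps."""
    return 1000.0 / fps

pi_only = frame_budget_ms(1)    # low end of the 1-2 fps Pi-only range
with_tpu = frame_budget_ms(30)  # Google's suggested rate with the TPU

print(f"Pi alone: {pi_only:.0f} ms per frame")    # 1000 ms
print(f"With TPU: {with_tpu:.1f} ms per frame")   # 33.3 ms
print(f"Speedup:  {pi_only / with_tpu:.0f}x")     # 30x
```

Roughly 33 ms per frame is what makes real-time video analysis feel instantaneous, even after the USB transfer overhead is paid.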
It even looks pretty cool. I'll be waiting at the mailbox for mine to arrive.
Opinions expressed by DZone contributors are their own.