AI World: What About Hardware?
AI comes in all sizes: immense production scale (exabyte oceans of data), developer workstation scale (sandboxes), and IoT embedded scale (yeah, even Raspberry Pi size).
While December's AI World conference was not primarily a hardware-based affair, it's hard to think about all of the potential machine learning applications and software without considering what they are going to run on. There are also distinct categories of hardware platform, roughly matching the stages of a project: algorithm and model development, sandbox training, production training, and execution.
Algorithm and Model Development
This is where you play with smaller data sets and think about how to organize and structure the input data. (Note: this is often a very subtle and artful task.) In addition, you make decisions about what type of machine learning will be applied. These days, for very large and complex problems, this means some version of a deep Convolutional Neural Net (CNN). Those of you in the high priesthood of AI would create, modify, or experiment with the actual source code that constructs these neural nets. For simpler problems with smallish data sets, this platform can be a fairly standard multicore desktop with enough RAM to hold your problem in memory. But to play at the higher end of the spectrum, you will need significantly more power.
Fortunately, several vendors are working on specialized workstations configured especially for machine learning. Much like the demand for CAD/CAM engineering workstations drove specialized interactive graphics configurations two decades ago, the demands of AI development are driving the market for AI-focused desktops. Dell was one of the exhibitors at the AI World conference and demonstrated some fully tricked-out Dell/Nvidia systems that would fit under your desk (and would most certainly keep your feet warm, too). These systems fit into a conventional PC box but are loaded with CPUs that have a dozen or more cores, one, two, or even three Nvidia cards (Teslas or Quadros), and, lest we forget, 256 GB of RAM (or even more with the new crossbar memristor boards).
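Whether a problem fits in a desktop's RAM is mostly arithmetic. The sketch below is a rough, hypothetical sizing check, assuming a dense matrix of 32-bit floats and an arbitrary 3x overhead factor for the working copies that preprocessing and training tend to make; the function names and the overhead value are illustrative, not taken from any particular tool.

```python
# Rough, hypothetical sizing check: does a dense numeric data set fit in RAM?
# Assumes a dense rows x features matrix of 32-bit floats (4 bytes each) and an
# illustrative overhead factor for working copies made during preprocessing/training.

BYTES_PER_GIB = 1024 ** 3


def dataset_gib(rows: int, features: int, bytes_per_value: int = 4) -> float:
    """Size of a dense rows x features matrix, in GiB."""
    return rows * features * bytes_per_value / BYTES_PER_GIB


def fits_in_ram(rows: int, features: int, ram_gib: float,
                bytes_per_value: int = 4, overhead: float = 3.0) -> bool:
    """True if the raw data, multiplied by the assumed overhead factor, fits in ram_gib."""
    return dataset_gib(rows, features, bytes_per_value) * overhead <= ram_gib


if __name__ == "__main__":
    # 10 million rows x 1,000 float32 features is about 37.25 GiB of raw data.
    print(round(dataset_gib(10_000_000, 1_000), 2))
    print(fits_in_ram(10_000_000, 1_000, ram_gib=64))   # about 112 GiB needed with overhead
    print(fits_in_ram(10_000_000, 1_000, ram_gib=256))
```

Under these assumptions, 10 million rows of 1,000 features come to about 37 GiB of raw data, which overflows a 64 GiB desktop once working copies are counted but fits within 256 GiB.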
Of course, there's a lot of overlap among these categories, but once you have a development version of your system, you will want to train it with a significantly larger data set. Usually, this would be a small rack-based system that still fits within the lab context. At this point in the progression, though, there is a significant consideration about how sensitive the data is and where it resides. If the data is not particularly sensitive (as medical or financial data would be) and is already available to the world at large, then you might consider migrating to a cloud-based solution at this point. But for any number of reasons, institutions may not be willing to copy all of their data to a remote server site, nor would they be willing to make that data readily accessible over the Internet. Regardless of whether you do this sandbox training in the cloud or in the lab, you will want to be able to demonstrate that the performance of the neural net improves significantly with more training data. You will also want to explore and examine all of the idiosyncrasies of your clever new creation: false positives, false negatives, underrepresented categories in your data set, etc.
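Examining false positives, false negatives, and underrepresented categories can start with simple counting. The sketch below, with hypothetical function names and a toy dog-versus-muffin label set, tallies confusion counts for one class treated as "positive" and flags labels that fall below an assumed 10% share of the data set.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives, treating one label as 'positive'."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1          # correctly flagged
        elif p == positive:
            fp += 1          # flagged, but actually something else
        elif t == positive:
            fn += 1          # missed a real positive
        else:
            tn += 1          # correctly not flagged
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def underrepresented(labels, min_share=0.1):
    """Return labels whose share of the data set is below min_share (assumed 10%)."""
    counts = Counter(labels)
    total = len(labels)
    return sorted(label for label, n in counts.items() if n / total < min_share)


if __name__ == "__main__":
    truth = ["dog", "dog", "muffin", "muffin", "dog", "muffin"]
    guess = ["dog", "muffin", "muffin", "dog", "dog", "muffin"]
    print(confusion_counts(truth, guess, positive="dog"))
    print(underrepresented(["dog"] * 45 + ["muffin"] * 50 + ["bagel"] * 5))
```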
Production training is only a distinct next step if you have truly huge amounts of data (exabytes). If you decide to do production training in the cloud, then the remaining consideration is how often you will want or need to retrain. If you don't expect your training data to change very much and you can get away with retraining every few months, then it will be cost-effective to rent compute cycles from one of the many vendors (e.g., AWS). Note: if the cloud environment is not exactly compatible with your code from the sandbox cycle, you will have to rewrite some code, which will force you to do more quality control to make sure you haven't changed the logic of your underlying system in some way. If your training data is not static (or nearly static), you will need to train more often. Obviously, if you're modeling stock prices, then you must retrain regularly (perhaps daily or even hourly). So, if your data is very dynamic and sensitive or proprietary, you will need to seriously consider assembling your own dedicated AI data center.
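The rent-versus-own decision is largely a function of retraining frequency. Here is a minimal sketch using a simple annual cost model: cloud cost scales with the number of training runs, while owned hardware is amortized straight-line over its service life plus a yearly operating cost. All function names and figures are hypothetical, and the model deliberately ignores data sensitivity, which may settle the question regardless of cost.

```python
# Hypothetical annual cost model for "rent cloud cycles" vs. "own the hardware".


def annual_cloud_cost(cost_per_run: float, runs_per_year: int) -> float:
    """Cloud spend grows linearly with the number of training runs."""
    return cost_per_run * runs_per_year


def annual_owned_cost(capex: float, years: float, opex_per_year: float) -> float:
    """Straight-line amortized hardware cost plus yearly power, space, and staff."""
    return capex / years + opex_per_year


def break_even_runs(cost_per_run: float, capex: float, years: float,
                    opex_per_year: float) -> float:
    """Runs per year at which renting and owning cost the same."""
    return annual_owned_cost(capex, years, opex_per_year) / cost_per_run


def cheaper_option(cost_per_run: float, runs_per_year: int, capex: float,
                   years: float, opex_per_year: float) -> str:
    cloud = annual_cloud_cost(cost_per_run, runs_per_year)
    owned = annual_owned_cost(capex, years, opex_per_year)
    return "cloud" if cloud <= owned else "own"


if __name__ == "__main__":
    # Made-up figures: $5k per cloud run; $600k of hardware over 3 years + $100k/yr.
    print(break_even_runs(5_000, 600_000, 3, 100_000))
    print(cheaper_option(5_000, 4, 600_000, 3, 100_000))    # retrain quarterly
    print(cheaper_option(5_000, 365, 600_000, 3, 100_000))  # retrain daily
```

With these made-up figures the break-even point is 60 runs a year, so quarterly retraining favors renting while daily retraining favors owning.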
Finally, we can address the issue of ML training versus ML execution (often called inference). Because neural net technology is loosely based on biological learning and our naïve and primitive understanding of how a brain works, it's not surprising that learning a skill takes a great deal more time and energy than performing an already learned skill. Learning to read or to ride a bike seems challenging and takes a lot of concentration at the time, but once learned, these skills seem effortless. Executing what has been learned is much easier than the learning was.
Many of us have heard tales of deep CNNs being trained on exabytes of data using millions of computing cores, taking hundreds of hours and thousands of kilowatt-hours of electricity, to be able to distinguish between pictures of blueberry muffins and Chihuahuas. (Clearly, we haven't quite figured out how real brains work, because the human brain uses about 20 W of power and can resolve the dog-or-muffin quandary in a few seconds. Deep CNNs have a ways to go.)
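To put that comparison in rough numbers, here is a back-of-the-envelope calculation. It assumes the low end of "thousands of kilowatt-hours" (1,000 kWh) for training and reads "a few seconds" as 5 s for the brain; both figures are illustrative assumptions, and the comparison pits a whole training run against a single act of recognition.

```latex
\begin{aligned}
E_{\text{train}} &\geq 1000\,\text{kWh} \times 3.6\times10^{6}\,\tfrac{\text{J}}{\text{kWh}} = 3.6\times10^{9}\,\text{J} \\
E_{\text{brain}} &\approx 20\,\text{W} \times 5\,\text{s} = 100\,\text{J} \\
\frac{E_{\text{train}}}{E_{\text{brain}}} &\gtrsim \frac{3.6\times10^{9}\,\text{J}}{10^{2}\,\text{J}} = 3.6\times10^{7}
\end{aligned}
```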
But I digress. Once all the heavy computational lifting of training is done, the much lighter load of using that training is what you have to support. Most of us remember solving systems of linear equations with multiple variables back in algebra class. Solving for all of the coefficients of the predictive equation took some time (with a large number of variables, it took a lot of time). But once you had that predictive equation (a.k.a. the solution), it was much easier to plug in new input parameters and instantly get a predictive result. (Note: Previous generations of AI were often based on the simplex method, a more general algorithm from linear programming that finds the optimum of a linear objective subject to a system of linear inequality constraints.) Another excellent example of the asymmetry in computing between training and execution is the camera functionality in almost everyone's phone. A lot of data and computing time went into training an algorithm to recognize faces, yet that algorithm can easily run as a background process, identifying multiple faces in real time.
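The algebra-class analogy maps directly onto a linear model. In this minimal pure-Python sketch, "training" solves the least-squares normal equations for the coefficients (the expensive step, which grows quickly with the number of variables), while "execution" is a single dot product per new input. It illustrates ordinary least squares with Gaussian elimination, not the simplex method or a CNN, and the toy data are invented so that the exact answer is y = 1 + 2a + 3b.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting (the slow step)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def train(rows: Matrix, targets: Vector) -> Vector:
    """Training: solve the normal equations (X^T X) w = X^T y.
    A leading 1.0 is prepended to each row so w[0] is the intercept."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(x, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights: Vector, row: Vector) -> float:
    """Execution: one dot product per new input, no equation solving required."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


if __name__ == "__main__":
    data = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
    y = [1, 3, 4, 6, 8]                 # generated from y = 1 + 2a + 3b
    w = train(data, y)                  # approximately [1.0, 2.0, 3.0]
    print(w, predict(w, [3, 2]))        # prediction approximately 13.0
```

Training touches every row and solves a k-by-k system; prediction costs only k multiplications, the same asymmetry the phone camera example relies on at a vastly larger scale.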
The point is that the vast majority of the expense for these amazing machine learning systems lies in discovering the approach to the problem and in the actual training with huge data sets. I think of it as paying for college. (But now at cocktail parties we can all easily pontificate about Baudelaire, predictive branching, mitosis, etc., ad nauseam.)
At the exhibition, there were examples of systems and hardware that could tackle the whole spectrum of these problems. One of the corporate hosts of AI World was IBM Watson, which could provide the racks of hardware necessary to run big problems. Watson provided a range of higher-level tools designed to digest raw input (e.g., technical journal articles) into a form useful for identifying patterns and correlations, and ultimately for generating responses to queries against that input data. We all remember how well Watson did on Jeopardy! IBM Watson represented the "many racks of multicore computers on site" type of solution. This approach assumed you would be feeding input data on a regular basis to a production-scale system that could be queried on an ongoing basis.
Another hardware/cloud company at the conference was DataRobot, which provides a platform on which a wide range of open source AI algorithms and supporting systems are in place and ready to use.
At the other extreme were purely executable solutions, such as Affectiva's, that address a specific and unchanging domain. The Affectiva product is a vision system that identifies facial emotions in real time. All of the system's learning was done by the company, and that learning could be executed on a lightweight platform. In fact, the company could execute its algorithm on a Raspberry Pi.
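The split described here, with learning done once by the vendor and only execution on the device, can be sketched as shipping frozen parameters to a small runtime. This is a hypothetical illustration, not Affectiva's method: the labels, the two-number feature vectors, and the weight values are invented, and a real vision system would first extract features from camera frames. The device only loads the parameters and picks the highest-scoring label; no training code is present.

```python
import json

# Hypothetical artifact produced once, offline, by a training pipeline.
# Labels, feature count, and weight values are invented for illustration.
FROZEN_MODEL = json.dumps({
    "labels": ["neutral", "joy", "surprise"],
    "weights": [[0.1, -0.2], [1.5, 0.3], [-0.4, 2.0]],
    "bias": [0.0, -0.5, -0.5],
})


def load_model(text: str) -> dict:
    """Load frozen parameters; this is all the device needs from training."""
    return json.loads(text)


def classify(model: dict, features: list) -> str:
    """Score each label with a dot product plus bias and return the best one."""
    scores = [sum(w * f for w, f in zip(ws, features)) + b
              for ws, b in zip(model["weights"], model["bias"])]
    best = max(range(len(scores)), key=scores.__getitem__)  # first max wins ties
    return model["labels"][best]


if __name__ == "__main__":
    model = load_model(FROZEN_MODEL)
    for feats in ([1.0, 0.0], [0.0, 1.0], [0.0, 0.0]):
        print(feats, classify(model, feats))
```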
There were other companies that provided systems ready to learn specific things (e.g. index raw video/audio). But that topic requires a deeper dive and another post (I'll keep you posted?). So until then...