Intelligence Is Prediction: Robots Imagine the Future
Children play with objects as soon as they gain even minimal control of their arms. They learn about objects, their own arms, and goals as one unified lesson. This is a promising approach for robots, too.
On December 5, at the Neural Information Processing Systems conference in Long Beach, California, a group of researchers from the University of California Berkeley presented some results (and a live demonstration) for a project that involves robots learning hand-eye coordination and using that coordination to accomplish goals such as "move object A to position X."
The interesting thing about this work is that it does not involve data descriptions or movement strategies. It simply involves "playing," much like a child might play with toys. Just as children babble with sounds before they can master speech, they also "babble" with physical movements. As soon as a child has even the most rudimentary ability to move its tiny arms and hands, it clumsily grasps and ineffectually jostles things about, watching all the while, and learning.
So, the researchers at BAIR (Berkeley Artificial Intelligence Research) implemented a system where a robot could play/babble with a wide range of objects. All the while, it watched, and with the use of a deep neural net, it gathered enough data to learn what would happen when different objects were jostled in a variety of ways. The robot could predict what probably would happen when some object B that looks somewhat like object A (which it has interacted with before) is jostled with an array of maneuvers that the robot has tried before.
And the system doesn't just learn an abstract sense of what might happen: it constructs an actual video of the predicted future. It literally imagines what might happen. You might even say it dreams the future.
These dreams extend only a second or two into the future, but that's all you need to accomplish the goal of moving object A to position X. The robot only needs to look at all of its dreams (one for each candidate jostle) and pick the one that ends up closest to X. Looping through this behavior eventually gets A to X. No one programmed it to do that; it learned it all by itself.
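That pick-the-best-dream loop can be sketched in a few lines. Everything here is a toy stand-in: `predict_object_position` plays the role of the learned video-prediction model (the real system predicts pixels, not coordinates), and the push dynamics are invented for illustration.

```python
import numpy as np

# Hypothetical stand-in for the learned predictive model: the real system
# "imagines" a short video; here we just predict the object's position.
def predict_object_position(position, action):
    # Toy dynamics: a push moves the object by 0.8x the action vector,
    # reflecting that pushes are imperfect.
    return position + 0.8 * action

def plan_push(start, goal, n_candidates=500, rng=None):
    """Sample candidate pushes, 'dream' each outcome, keep the best one."""
    rng = rng if rng is not None else np.random.default_rng(0)
    actions = rng.uniform(-1.0, 1.0, size=(n_candidates, 2))
    outcomes = np.array([predict_object_position(start, a) for a in actions])
    dists = np.linalg.norm(outcomes - goal, axis=1)
    return actions[np.argmin(dists)]

# Iterate: plan, act, observe, replan -- the loop described above.
pos, goal = np.array([0.0, 0.0]), np.array([2.0, 1.0])
for step in range(10):
    action = plan_push(pos, goal)
    pos = predict_object_position(pos, action)  # stand-in for executing the push
print(np.round(pos, 2))
```

The key point survives the simplification: nothing in the loop encodes *how* to push; it only scores imagined futures against the goal and repeats.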
The BAIR Research Team calls this technique Visual Foresight (this site has some interesting videos demonstrating how the system works). What is particularly interesting about this work is that the robot bootstraps all of its own personal world model. In fact, a different robot might learn a slightly different but largely similar world model — just like people do.
(I'm betting this will be an interesting time for raising baby robots.)
It's interesting to note that in order to gather enough knowledge about the objects, its own arms, and its motor control, the robot had to play/babble for about a week. That seems to be the same order of magnitude of time a child needs to learn how to crudely move things about in its world model. Extrapolating, I wonder: after presenting our robot with a lot of baby robot toys, would we eventually have to provide it with a lifetime of experiences?
The specific technology behind the system is convolutional recurrent video prediction, which is also called dynamic neural advection (DNA).
(If you like to tinker with these things, you can find an interesting project here.)
The idea behind DNA is the prediction of individual pixels. In this case, the robot has cameras watching the arm and the objects it interacts with. Recall that the arm is physically "babbling" with the objects in its field of view. Whenever the arm interacts with an object, the subsequent video frames, combined with information about where the object was contacted and from what angle, are added to the dataset. After collecting many thousands of these interactions, the system can be trained to predict where the pixels belonging to the object will move in subsequent frames. The trained system can actually produce a video of what is most likely to happen one or two seconds into the future. Running object detection on this imagined future video yields a quantifiable measure of the object's movement, and the predicted object position can then drive an iterative process that moves the object to any desired (goal) location.
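The pixel-motion idea at the heart of DNA can be shown with a toy example. Rather than generating new pixel values from scratch, the model predicts a small kernel that says where each output pixel's content should be copied from in the previous frame. The kernel below is hand-written for illustration; in the real model, such kernels come out of a convolutional recurrent network conditioned on the frame and the commanded arm action.

```python
import numpy as np

def apply_kernel(frame, kernel):
    """Apply a 3x3 motion kernel to every pixel of a 2-D frame."""
    padded = np.pad(frame, 1)
    out = np.zeros_like(frame)
    h, w = frame.shape
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * padded[di:di + h, dj:dj + w]
    return out

frame = np.zeros((5, 5))
frame[2, 2] = 1.0  # a one-pixel "object"

# Hypothetical predicted kernel: copy each pixel's value from its left
# neighbor, which shifts the frame's content one pixel to the right.
shift_right = np.zeros((3, 3))
shift_right[1, 0] = 1.0

next_frame = apply_kernel(frame, shift_right)
print(np.argwhere(next_frame == 1.0))  # the bright pixel is now at (2, 3)
```

Because the kernel weights are learned per region and per action, the same machinery predicts how a push from a given angle will smear, shift, or rotate the object's pixels, which is what makes the imagined future video possible.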
Chelsea Finn, a doctoral student in the lab who created the original DNA model, notes:
"In the past, robots have learned skills with a human supervisor helping and providing feedback. What makes this work exciting is that the robots can learn a range of visual object manipulation skills entirely on their own."
Sergey Levine, Assistant Professor in Berkeley's Department of Electrical Engineering and Computer Sciences, whose lab developed the technology, describes it like this:
"In the same way that we can imagine how our actions will move the objects in our environment, this method can enable a robot to visualize how different behaviors will affect the world around it. This can enable intelligent planning of highly flexible skills in complex real-world situations."
"Children can learn about their world by playing with toys, moving them around, grasping, and so forth. Our aim with this research is to enable a robot to do the same: to learn about how the world works through autonomous interaction. The capabilities of this robot are still limited, but its skills are learned entirely automatically, and allow it to predict complex physical interactions with objects that it has never seen before by building on previously observed patterns of interaction."
Frederik Ebert, a graduate student member of the project, adds:
"Humans learn object manipulation skills without any teacher through millions of interactions with a variety of objects during their lifetime. We have shown that it is possible to build a robotic system that also leverages large amounts of autonomously collected data to learn widely applicable manipulation skills, specifically object pushing skills."
The Berkeley laboratory is continuing to research learning based on video prediction, including better ways for the robot to watch what it's doing, as well as how to manipulate less rigid objects (such as fabric).
It seems like robots are becoming more like us every day.
Opinions expressed by DZone contributors are their own.