Playing Scavenger Hunt in a Facebook Virtual Home?
Facebook already pokes around in your data, so why not let it rummage around your house? Is this a Roomba, only smarter and with more flexible goals? Here's the scoop.
If robots are going to wander autonomously around our houses, then they had better have a minimum of common sense so that they avoid bumping into tables, tripping over slippers, stepping on your dog Spot (or your Amazon Spot), etc.
In today's spirit of "let's gamify everything," Facebook has been working with Georgia Tech to devise an in-home navigation challenge under the guise of a scavenger hunt. In this first iteration of the game, a home is simulated as a virtual 3-D model containing ordinary household objects in random arrangements. The scenario is for a human to present the game world with a natural language query and then for the AI game agent to provide a plausible and correct response.
Me: "Where are my brown shoes?"
Agent: "Next to the ottoman in the family room."
The obvious next step for such a system would be to tell a real robot in your real home "find my brown shoes" and expect the robot to bring them to you.
This contest was developed by Devi Parikh, a computer scientist, and Dhruv Batra (her husband), working at Georgia Tech and the Facebook AI Research group (FAIR). About this first phase of the project, Devi said: “The goal is to build intelligent systems that can see, talk, plan, and reason.” Devi, Dhruv, and their team constructed an agent that builds on a couple of different forms of machine learning, applied to the domain of things that are found in a home.
Part of the plan is for the agent to learn a practical set of probabilities for where to find things. As the agent executes its tasks, it uses this "common sense" to find things more efficiently. For example, the agent might initially search at random throughout the house to find a "frying pan," but once it does find the frying pan (most probably in the kitchen), it will remember where. If it is asked for a frying pan in the future, the agent will start the search in the kitchen. Similarly, slippers might turn up in either of two locations, the bedroom or the family room, but never in the garage. This grid of objects versus locations, with a probability in each cell, is a basic form of common sense that the machine can learn using simple reinforcement learning techniques. Finding the object provides the reinforcement, and the corresponding object-location probability is increased.
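To make the object-versus-location grid concrete, here is a minimal sketch in Python. It is not the FAIR/Georgia Tech agent: the class name `LocationMemory`, the room list, and the `prior` pseudo-count are all assumptions made up for illustration, and the update is a plain count-based estimate rather than a full reinforcement learning algorithm.

```python
from collections import defaultdict


class LocationMemory:
    """Hypothetical object-versus-room grid; find counts become probabilities."""

    def __init__(self, rooms, prior=1.0):
        self.rooms = list(rooms)
        # Assumed pseudo-count: every room starts with the same small weight,
        # so an agent with no experience searches in the given room order.
        self.prior = prior
        self.counts = defaultdict(lambda: defaultdict(float))

    def record_find(self, obj, room):
        # Finding the object is the reinforcement: bump that cell of the grid.
        self.counts[obj][room] += 1.0

    def probability(self, obj, room):
        total = sum(self.counts[obj][r] + self.prior for r in self.rooms)
        return (self.counts[obj][room] + self.prior) / total

    def search_order(self, obj):
        # Most likely rooms first; sorted() is stable, so ties keep room order.
        return sorted(self.rooms, key=lambda r: -self.probability(obj, r))


memory = LocationMemory(["garage", "kitchen", "bedroom", "family room"])
memory.record_find("frying pan", "kitchen")
print(memory.search_order("frying pan"))  # the kitchen is now searched first
```

Note that the pseudo-count keeps every room at a small non-zero probability, so in this sketch the garage is merely searched last for slippers rather than ruled out entirely.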
An agent like this could also become smarter more quickly by taking advantage of simple ontologies. Once the agent has learned that a frying pan was in the kitchen, it could additionally use the ontology to understand that a frying pan is a kind of cookware. Then it can increase the kitchen location probabilities for all the other things that are considered cookware. While both of these learning techniques are very basic, it isn't too hard to see how quickly the agent could acquire a lot of practical working knowledge about your specific house.
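The ontology idea can be sketched the same way. The `IS_A` table, the function name `reinforce_with_ontology`, and the `direct` and `shared` weights below are hypothetical choices for illustration, not anything described by the researchers; the point is only that one find can raise the scores of sibling objects in the same category.

```python
# Hypothetical ontology: each object maps to its category ("is a kind of").
IS_A = {
    "frying pan": "cookware",
    "saucepan": "cookware",
    "wok": "cookware",
    "slippers": "footwear",
}


def reinforce_with_ontology(counts, obj, room, direct=1.0, shared=0.5):
    """Credit the found object fully and its category siblings partially."""
    counts[(obj, room)] = counts.get((obj, room), 0.0) + direct
    category = IS_A.get(obj)
    if category is None:
        return counts  # unknown object: nothing to generalize to
    for other, other_category in IS_A.items():
        if other != obj and other_category == category:
            # Assumed weighting: siblings get a smaller boost than the
            # object that was actually found.
            counts[(other, room)] = counts.get((other, room), 0.0) + shared
    return counts


counts = reinforce_with_ontology({}, "frying pan", "kitchen")
print(counts)
```

After a single frying pan find, the saucepan and the wok already lean toward the kitchen, while the slippers are untouched because they belong to a different category.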
For this game/contest, the virtual homes were created by researchers at FAIR and UC Berkeley. This research was highlighted during the recent Facebook annual developer conference.
Quite a few research groups are experimenting with virtual environments for training AI, and in some ways, the recent research on playing video games is very similar, since the player (agent) operates in a well-defined virtual realm and gathers knowledge via reinforcement learning (using gathered points as a reward function). An interesting video about learning to play Mario is located here. One thing that the video game learning AI and the scavenger hunt have in common is that they are dynamic: the agent moves through the environment (albeit a virtual one) and learns from its actions. It learns about things happening in time, as opposed to most of the AI we are familiar with today, which involves identifying or classifying a static collection of input data (e.g., looking at still photos and finding the puppies or deciding if people look happy).
It's becoming more obvious to researchers that agent behavior, even in these toy environments, is too complicated to hand-code for all of the possible world states. These systems will need to accumulate knowledge on their own. Microsoft has a virtual world called Malmo, based on the game Minecraft, which some researchers are using today.
Also, the Allen Institute for AI (Ai2) has developed its own 3-D virtual environment/playground for AI agents: AI2-THOR.
The primary researcher at Ai2, Roozbeh Mottaghi, stresses that it is crucial "for these virtual environments to become more realistic if we want AI agents to learn properly". Roozbeh points out that designing them is labor-intensive: “a single realistic-looking room might take months, and it is costly. And defining realistic physical properties for every object is very challenging.”
Of course, we still have the incredibly complex problem of completely and correctly understanding the input language from the human. These are still the pioneering days of talking with computers as naturally as we talk with people. Facebook is painfully aware of that following the lackluster performance and poor acceptance of its virtual assistant "M" (even though it used humans behind the scenes to help M along). But the conversational problem is too big to address here and now; I'll be doing that in a future series of articles.
It's comforting to know that many of these navigational problems may be addressed in the near future. So, I confidently expect someday soon to open up the box of my brand-new Roomba vacuum, turn it on, and tell it to follow me around the house while I give it a tour. And I fully expect to be able to tell it to "go and vacuum the back bedroom at 10 AM tomorrow," and while it's going about its daily tasks, I expect it to learn where my slippers are and what my dog looks like.