
Building Chatbots With API.AI and GRAKN.AI


Learn what a chatbot is, how API.AI enables natural language understanding, and about building Graql queries to create a chatbot that can find movies.


How long do you spend browsing Netflix for something to watch? If you are anything like me, 80% of Saturday night is spent searching for a movie to watch… before deciding, once again, on reruns of your favorite sitcom, simply because you’re tired of searching. And what if you want to find things beyond what Netflix offers? You’d have to deal with IMDB or Rotten Tomatoes, which are just plain frustrating to search.

My solution: Create a chatbot that finds movies and provides information about them to me. Ideally, I should be able to message the chatbot from my phone — there’s nothing more appealing than talking to a robot while waiting for the tea to brew.

Before delving into the specifics of what I built, let us first quickly review some basics about chatbots, and explain why our work here at GRAKN.AI is relevant.

What Is a Chatbot and Why Do You Want One?

Chatbots are programs that simulate talking to a human. They are usually designed with a goal in mind — in our case, finding movies.

There are a few components needed to build a chatbot:

  1. Natural language processing — decomposing input sentences into the constituent parts.
  2. Query generation — translating parsed sentences into valid queries.
  3. Context management — remembering the results of previous queries to answer future ones intelligently.
  4. Domain knowledge — storing knowledge about the domain of the chatbot’s expertise.
  5. Natural language generation — giving the chatbot a personality.

Backends for Chatbots

Finding the right combination of tools to build a chatbot can be challenging.

There are, of course, many chatbot platforms that already exist. These tend to be bot-as-a-service platforms where you can build, adapt, and deploy your service in the cloud. Some options to take a look at include the Microsoft Bot Framework, WIT.AI, Pandorabots, and API.AI. All provide a different set of features and integrations and a different level of usability.

Chatbots often require a database backend, so you will face another set of choices. Your options range from standard SQL databases, whose structure is often not compatible with natural language, to the more accessible NoSQL databases. You can also choose to use a graph backend, as we did. Graph database query languages can often easily mimic the connections present in natural language, which makes them ideal for building chatbots.

GRAKN.AI is a knowledge graph platform. It is a database in the form of a knowledge graph that uses machine reasoning to simplify data processing for AI applications. Querying is performed through Graql, a declarative, knowledge-oriented graph query language for retrieving explicitly stored and implicitly derived information, and for performing graph analytics and automated reasoning.

To create a movie recommendation chatbot, I needed a graph full of movies, and my luck held out; most of the data processing had already been completed. Last spring, we built Moogi, a knowledge-graph-powered semantic search engine. We later extracted the top 1,000 movies and their related concepts to use as a demo data set. It was a simple matter to load the data into a Grakn graph (and I was overjoyed not to have to spend hours cleaning data).

We next needed to decide on the component that would do the natural language processing.

API.AI Enables Natural Language Understanding

While I’ve used CoreNLP and OpenNLP for parsing in the past, API.AI — by far — blew them away in terms of both accuracy and usability. API.AI provides tools to maintain contexts and allows you to add custom responses when queries fail. When you take into account all of the applications they interface with by default, such as Alexa and Slack, it was a very easy choice. As a side note, they have done an absolutely wonderful job with their user interface and documentation.

API.AI communicates with GRAKN.AI using a webhook that contacts a simple REST endpoint. We spun up this endpoint using Java Spark and hosted it on one of our in-office servers.

Building the Moviebot

Searching for a Movie

Grakn schemas already describe instances and how they relate, so all we needed to do was export the movie ontology to a format that API.AI would understand.

API.AI has the concept of entities, which are used to extract values from the natural language input. Each API.AI entity would represent one type in a Grakn ontology.

We also need to teach API.AI what resources we have in the graph, so I created a resource-value entity containing all of the — you guessed it — resource values we have in the movie graph.

Populating the “resource-value” entities with all values from the movie graph.

I ended up using API.AI’s batch upload feature because there are more than 30,000 resource values in the graph.
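As a rough illustration of preparing such a batch upload, the sketch below turns a list of raw resource values into entity entries. The entry shape (a `value` plus a list of `synonyms`), the sample values, and the batch size are assumptions made for this example, not details taken from the actual export script.

```python
import json


def build_entity_entries(values):
    """Turn raw resource values into entity entries (assumed shape: value + synonyms)."""
    seen = set()
    entries = []
    for value in values:
        cleaned = value.strip()
        key = cleaned.lower()
        # Skip empty strings and case-insensitive duplicates.
        if not cleaned or key in seen:
            continue
        seen.add(key)
        entries.append({"value": cleaned, "synonyms": [cleaned]})
    return entries


def chunk(items, size):
    """Yield successive slices of `items`, each holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical sample; the real graph holds more than 30,000 resource values.
sample_values = ["Horror", "Hayao Miyazaki", "horror ", "Planes", ""]
entries = build_entity_entries(sample_values)
batches = list(chunk(entries, 2))

payload = {"name": "resource-value", "entries": entries}
print(json.dumps(payload, indent=2))
```

Deduplicating before the upload keeps the entity small, and chunking makes it possible to send the values in several requests if a single upload turns out to be too large.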

The simplest movie search a user would do is based on a single entity. Take genre: one could say “romantic comedies” or “horror movies.” The API.AI search intent needs to be able to extract two things: the type you are searching for and the parameters (roles and instances) by which to frame the search.

A more complicated query would include the user asking for movies parametrized by a role — for example, “movies composed by Hans Zimmer.” In this case, we need to represent the combination of two Grakn types (the instance and the role). The best way to do that in API.AI is to create a composite entity, which translates to an object in the JSON result.

Creating the “search” composite entity — searches can be made up of resource values and roles.

When you create an intent in API.AI, it will automatically annotate your input based on the correspondence between the text and the entities you have created. Sometimes, they get it right — other times they get it wrong or completely fail to annotate anything. The more examples you provide, the better their machine learning component gets at recognizing the entities in your queries. To this end, I provided about 100 samples for each of the two intents in our agent.

Let’s see how this plays out in a search intent. We used the basic queries from before (“Horror movies” and “movies composed by Hans Zimmer”) to demonstrate API.AI annotation. The first query was annotated correctly. With the second, we had to go back in and annotate by hand:

Creating the “search” intent — when queries are not automatically annotated, you must do it yourself by hand.

It took about 20 inputs for API.AI to start accurately understanding all of the concepts in the search queries.

API.AI turns these recognized concepts into JSON which, if you have the webhook option enabled, is submitted to your backend for processing. The only steps left were to translate this JSON into Graql queries, execute those queries on the movie knowledge graph and return formatted results to API.AI.
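The round trip described here can be sketched in a few lines. The real backend is a Java Spark endpoint; the Python below is only an illustration of the same flow. The request and response field names (`result`, `parameters`, `speech`, `displayText`, `source`) are assumptions about the webhook format, and `run_search` is a hypothetical stand-in for the code that queries the graph.

```python
import json


def handle_webhook(body, run_search):
    """Process one webhook call and build the JSON reply.

    body: raw JSON string posted by API.AI (field names are assumed).
    run_search: hypothetical callable that takes the parameters dict
                and returns a list of movie titles.
    """
    request = json.loads(body)
    parameters = request.get("result", {}).get("parameters", {})
    titles = run_search(parameters)
    if titles:
        text = "I found: " + ", ".join(titles)
    else:
        text = "Sorry, I could not find anything matching that."
    return json.dumps({"speech": text, "displayText": text, "source": "moviebot"})


def fake_search(parameters):
    """Stand-in for the graph query; returns placeholder titles only."""
    if parameters.get("search") == [{"resource": "Horror"}]:
        return ["Movie A", "Movie B"]
    return []


sample_body = json.dumps(
    {"result": {"parameters": {"entity": "movie", "search": [{"resource": "Horror"}]}}}
)
reply = json.loads(handle_webhook(sample_body, fake_search))
print(reply["speech"])
```

Passing the search function in as an argument keeps the request handling separate from the graph access, so the handler can be exercised without a running database.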

Building Graql Queries

The final stage is translating the API.AI output into a Graql query. If you are not familiar with Graql, it is the query language of the Grakn knowledge graph, and it allows you to express relationships over large quantities of data concisely and intuitively. This means that even without knowing all of the information in a sentence, you can still extract relevant search results.

Back to the simple search “horror movies.” Our API.AI search intent gives you the following JSON for that query:

{
    "entity":"movie",
    "search":[{
        "resource":"Horror"
    }]
}

It’s a very simple translation into Graql. We are looking for movies that are related to something that is related to a resource with the value “horror.” The middleman, in this case, is the genre instance. We know nothing about it except that it connects a movie to a “Horror” resource and we can thus just represent it as a variable:

match $movie isa movie; 
      $resource value "Horror"; 
      ($genre, $resource);
      ($movie, $genre);
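One way this translation could be automated is sketched below in Python. It is a minimal example rather than the actual backend code: the function name and the generated variable names (`$resource0`, `$middle0`) are invented for illustration, and the input is assumed to have exactly the shape of the JSON above.

```python
def build_simple_query(parameters):
    """Build a Graql match query from a resource-only search."""
    entity = parameters["entity"]
    lines = [f"match ${entity} isa {entity};"]
    for i, term in enumerate(parameters["search"]):
        resource_var = f"$resource{i}"
        middle_var = f"$middle{i}"  # the unknown "middleman" instance
        # Escape double quotes so the value cannot break out of the string literal.
        value = term["resource"].replace('"', '\\"')
        lines.append(f'{resource_var} value "{value}";')
        lines.append(f"({middle_var}, {resource_var});")
        lines.append(f"(${entity}, {middle_var});")
    return "\n".join(lines)


simple_query = build_simple_query({"entity": "movie", "search": [{"resource": "Horror"}]})
print(simple_query)
```

Numbering the variables per search term keeps them distinct when a query contains more than one resource value.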

A more complicated query might be “find me movies directed by Hayao Miyazaki with planes.” Here we are not only looking for movies about planes but also ones that have been directed by something with the name “Hayao Miyazaki.”

The movie ontology specifies “director” as a role-type. Relations connect instances together and instances are associated with relations using roles. This is necessary to describe how the entity acts in a relation. For example, a “person” can play the role of “producer” or “composer” in a “has-crew” relation. But a person could not play the role of “filming-location.”

Using the composite entity described above, API.AI will provide us with an object containing the related role and the resource value describing the instance that plays the role. API.AI would output the following JSON for this query:

{
   "parameters": {
      "entity": "movie",
      "search": [
        {
          "role": "director",
          "resource": "Hayao Miyazaki"
        },
        {
          "resource": "Planes"
        }
      ]
    }
}

This can also simply be translated into Graql. The only difference in Graql between the two examples is that, in this case, the query specifies the role-type “director” when querying for the relationship between movie and middleman instance:

match $movie isa movie;
      // directed by Hayao Miyazaki
      $resource-name value "Hayao Miyazaki";
      ($resource-name, $person);
      // specify that you want a director
      (director: $person, $movie);
      // about planes
      $resource-keyword value "planes";
      ($resource-keyword, $keyword);
      ($keyword, $movie);
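A hedged Python sketch of a query builder that handles both cases, with and without a role, might look like this. The function and variable names are hypothetical, the input is assumed to match the `parameters` object shown above, and the generated variable names differ from the hand-written query.

```python
def build_query(parameters):
    """Build a Graql match query from search terms that may carry a role."""
    entity = parameters["entity"]
    target = f"${entity}"
    clauses = [f"match {target} isa {entity};"]
    for i, term in enumerate(parameters["search"]):
        resource_var = f"$resource{i}"
        instance_var = f"$instance{i}"
        # Escape double quotes so the value cannot break out of the string literal.
        value = term["resource"].replace('"', '\\"')
        clauses.append(f'{resource_var} value "{value}";')
        clauses.append(f"({resource_var}, {instance_var});")
        role = term.get("role")
        if role:
            # Name the role the instance plays in the relation.
            clauses.append(f"({role}: {instance_var}, {target});")
        else:
            # No role given: any relation between the instance and the target matches.
            clauses.append(f"({instance_var}, {target});")
    return "\n".join(clauses)


payload = {
    "parameters": {
        "entity": "movie",
        "search": [
            {"role": "director", "resource": "Hayao Miyazaki"},
            {"resource": "Planes"},
        ],
    }
}
role_query = build_query(payload["parameters"])
print(role_query)
```

The only branch in the builder is the role check, which mirrors the single difference between the two hand-written queries.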

And that’s it! Hook all of the components up and you can search your graph using natural language. Here it is in Slack:

Just to clarify: As explained above, a chatbot is made up of many more components than just search. We’ve started with semantic search because in order to program any of the other features, the bot will need to be able to communicate with the graph. So, while this is not the entirety of the chatbot, search is the first — and arguably most important — component.

If you’re interested in seeing more details about how the backend is coded, take a look on my GitHub page.

Asking for Information

With that done, we can search for movies to watch. But I wouldn’t want to dedicate two hours of my life to a film without first getting a sense of the film with a description or trailer.

We will create another intent on API.AI called information. Rather than finding you movies based on types in the ontology, this intent will find you information related to a provided instance.

A lot of the groundwork for creating this intent was already done: we can reuse the API.AI entities that we created for the previous intent.

We create an API.AI entity that will match any entity, resource or role type in the graph.

We can now create the information intent. This intent is configured to return one “resource-value” entity and one “information” entity.

API.AI still has some trouble distinguishing between the search and the information intents. To be fair, there is a bit of overlap between the two. The query “keywords of Titanic” can be read as a search (you want to find instances of the type keyword that describe the instance Titanic) or as an information request (you want to find all the keywords related to a specific instance, Titanic).

That said, the information intent allows us to ask for any information related to any instance.

What Do You Want to Watch?

API.AI has done an amazing job at creating integrations with commonly used software. It was a simple click of a button to integrate the webhook to our backend, and another few clicks to get it working in a Slack channel. Feel free to join our community Slack channel to try it out and use the #moviebot channel to ask questions that start with @moviebot2.

Looking Towards the Future

There are a couple of reasons GRAKN.AI is ideal for chatbots. The platform provides implicit context disambiguation. If you try to “find movies filmed in Mississippi,” Grakn knows to search for movies from cities, towns, and regions inside the state of Mississippi and to ignore anything related to the river. The flexible ontology allows you to easily define your domain and update it in the future.

This is a lot of information to put in a single blog post, so we have planned a series for the future. In these future posts, we will be both adding new features and explaining these benefits of using GRAKN.AI in more detail. Some possible posts include:

  1. Context management: Carrying over results from the previous questions.
  2. Degrees of separation: Teaching the bot to understand nested queries like “movies starring the spouse of Brad Pitt.”
  3. Ranking: Sorting results based on an implied resource.
  4. Alexa: It would be awesome to talk to the bot.
  5. Data improvements: The movie graph only contains about 1,000 films and 20,000 people, and the data is quite messy behind the scenes. To make the chatbot more robust, we must make improvements to the data.

Once these are all done, we can hopefully have some interesting conversations, but please let us know what you think. Are you working on a chatbot project? What sorts of challenges are you encountering? What topics would you like us to cover? Leave us a comment below, on our discussion boards or in our Slack channel. We look forward to hearing from you!


Topics:
ai, chatbots, grakn.ai, graql, big data, tutorial

Published at DZone with permission of

