Enhancing Neural Network Models for Knowledge Base Completion
Neural networks are among the most widely used machine learning techniques. Due to their universality, you can theoretically approximate any function with a precise enough model, which is pretty crazy if you think about it!
Just as electricity transformed every major industry starting about 100 years ago, AI is now poised to do the same. Several large tech companies have built AI divisions and have started transforming themselves with AI. But in the next few years, companies of all sizes and across all industries will realize that they too must be part of this AI-powered future.
So claims Andrew Ng, the polymath who has brought deep learning to the masses. Ng has served as Baidu's chief scientist, head of Google Brain, a professor at Stanford, and founder of Coursera, where he now publicly distributes coursework that helps people worldwide learn to understand AI. Electricity was harnessed by a niche few, but once people began to realize its enormous potential, it very quickly became indispensable in all walks of life.
GRAKN.AI is the database for AI, and so we, too, have been working hard to integrate our product with the world of deep learning.
Artificial neural networks — the ones that run on anything from our MacBook Pros to the most powerful GPU clusters around — are among the most widely applicable learning techniques out there. And due to the universality of neural nets, you can theoretically approximate any function with a precise enough model, which is pretty crazy if you think about it.
Neural networks are trained, like any machine learning algorithm, on a set of training data and are evaluated on a separate set of test data. Training tunes the weights and biases of the network, the parameters that determine how each neuron turns its inputs into an output. We start off with randomly generated weights and biases, and with enough data and time, we arrive at a set of parameters that can make accurate predictions even for complex inputs.
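As a rough illustration of that train-then-test cycle, here is a minimal sketch using a single-neuron classifier in NumPy. The toy data, the learning rate, and the number of steps are all assumptions made up for this example; it is not the network used in this project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): the label is 1 when the two features sum to a positive number.
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Keep the training set and the test set separate.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Start from randomly generated weights (the bias starts at zero here for simplicity).
w = rng.normal(scale=0.1, size=2)
b = 0.0

learning_rate = 0.5
for _ in range(200):
    p = sigmoid(X_train @ w + b)                        # forward pass
    grad_w = X_train.T @ (p - y_train) / len(y_train)   # gradient of the cross-entropy loss
    grad_b = np.mean(p - y_train)
    w -= learning_rate * grad_w                         # gradient descent update
    b -= learning_rate * grad_b

# Evaluate only on data the model never saw during training.
test_accuracy = float(np.mean((sigmoid(X_test @ w + b) > 0.5) == (y_test == 1)))
```

The same pattern (random start, repeated updates on training data, evaluation on held-out data) carries over to larger networks; only the model and the gradient computation get more involved.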
To get a better sense of how we can utilize Grakn in tandem with neural nets, we should first briefly review some of Grakn's design patterns as well as explain the project that forms the basis of this blog post.
Graphs in Grakn are often referred to as "knowledge bases," which you can read more about here. A knowledge base is a graphical representation of known information: relationships, hierarchies, etc. These can be queried to see whether an edge exists between two given nodes either explicitly in the graph itself or implicitly through schemas. Expanding a graph expands the knowledge base it is associated with, and thus the ground truth. One type of relational system that Grakn handles particularly well is a hierarchy, in which relationships are unidirectional and indicate some underlying vertical ordering of concepts.
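To make the explicit-versus-implicit distinction concrete, here is a small sketch in plain Python. It does not use Grakn's API; the set of triples, the `holds` function, and the assumption that the relation is transitive are all hypothetical stand-ins for a stored graph and one inference rule.

```python
# Explicit facts stored as (entity, relation, entity) triples; the names are made up.
facts = {
    ("north_sea", "part_of", "atlantic"),
    ("atlantic", "part_of", "world_ocean"),
}

def holds(e1, relation, e2, facts, seen=None):
    """Return True if the edge is stored explicitly or follows by transitivity."""
    if (e1, relation, e2) in facts:
        return True  # explicit edge in the graph
    if seen is None:
        seen = set()
    seen.add(e1)
    # Hypothetical inference rule: the relation is transitive, so try intermediate nodes.
    for a, r, mid in facts:
        if a == e1 and r == relation and mid not in seen:
            if holds(mid, relation, e2, facts, seen):
                return True
    return False

explicit = holds("north_sea", "part_of", "atlantic", facts)
implicit = holds("north_sea", "part_of", "world_ocean", facts)
reverse = holds("world_ocean", "part_of", "north_sea", facts)
```

Here `explicit` and `implicit` are both true, even though only the first edge is stored, while `reverse` is false because the hierarchy is unidirectional.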
In the world of artificial intelligence, learning improvements are the holy grail. If you can make a model learn more accurately, more quickly, or more precisely, you can open up the domain you're working in to a whole new set of use cases. When you are working with knowledge bases, you often have incomplete information; this can produce flawed insights. Thus, if you can improve the quality of a knowledge base, you will improve any of the machine learning that is built on top of it. Knowledge base completion is the concept of expanding existing knowledge bases with new or improved information through some established rules of inference (and by adding more data, of course).
Using deep learning to facilitate knowledge base completion is, then, a natural approach. With the existing knowledge base as a training set, you can program the neural net as a binary classifier to find likely relationships and then insert them back into the graph. It turns out that Grakn can do all of the legwork as a knowledge base!
This paper presents a neural network for knowledge base completion. It is built to accommodate two separate databases: WordNet and Freebase. For the purposes of this project, we will only concern ourselves with WordNet.
WordNet, produced by Princeton University, is a lexical database of English. It contains entities (words with unique identifiers) as well as relations (ways of describing the lexical relationship between two related entities). The goal of the neural network is to predict whether a given entity e1 is related to an entity e2 by way of relation r. For instance, if:
e_1 = __atlantic_1
e_2 = __north_sea_1
r = _has_part
...then we are asking whether the Atlantic > has part > North Sea, which is true. Conversely, we could ask regression curve > part of > shell bean — clearly false.
There are 11 relations in the dataset and nearly 40,000 different entities. The neural tensor network is like most neural nets in that it trains a set of weights and a bias; one major distinction, however, is an extra tensor layer that multiplicatively relates entities and relations.
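For reference, the scoring function of a neural tensor network is commonly written in the following form. The notation here is an assumption and may differ from the linked paper's: e_1 and e_2 are d-dimensional entity vectors, W_R is a tensor with k slices for relation R, V_R, b_R and u_R are the usual weights and bias for that relation, and f is a nonlinearity such as tanh.

```latex
g(e_1, R, e_2) = u_R^{\top} \, f\!\left( e_1^{\top} W_R^{[1:k]} e_2
    + V_R \begin{bmatrix} e_1 \\ e_2 \end{bmatrix} + b_R \right),
\qquad
W_R^{[1:k]} \in \mathbb{R}^{d \times d \times k},\;
V_R \in \mathbb{R}^{k \times 2d},\;
u_R, b_R \in \mathbb{R}^{k}
```

The term e_1^T W_R^{[1:k]} e_2 is the extra tensor layer: each of the k slices produces one number by multiplying the two entity vectors through a d × d matrix.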
If the math behind this isn't clear, don't worry. All we need to know is that the neural tensor network is able to handle multiple types of entity pairs per instantiation of entity-relation-entity (up to k slices, as you can see in the equation). You can refer to the paper I linked above for more specifics.
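A neural tensor layer of this kind can be sketched in a few lines of NumPy. This is only an illustration under assumed shapes and names (`ntn_score`, `W`, `V`, `b`, `u` are chosen for this example), not the code used in the project.

```python
import numpy as np

def ntn_score(e1, e2, W, V, b, u):
    """Score one (e1, relation, e2) triplet with a neural tensor layer.

    e1, e2: entity vectors of shape (d,)
    W:      relation tensor of shape (k, d, d), one d x d slice per k
    V:      weights of shape (k, 2d) applied to the concatenated entities
    b, u:   bias and output weights, each of shape (k,)
    """
    # Multiplicative term: e1^T W[s] e2 for every slice s, giving a vector of length k.
    bilinear = np.einsum("i,kij,j->k", e1, W, e2)
    # Standard additive term on the concatenated entity vectors.
    linear = V @ np.concatenate([e1, e2])
    return float(u @ np.tanh(bilinear + linear + b))

# Example call with random parameters (d = 3, k = 2); values are arbitrary.
rng = np.random.default_rng(0)
d, k = 3, 2
e1, e2 = rng.normal(size=d), rng.normal(size=d)
W = rng.normal(size=(k, d, d))
V = rng.normal(size=(k, 2 * d))
b = rng.normal(size=k)
u = rng.normal(size=k)
example_score = ntn_score(e1, e2, W, V, b, u)
```

In a full model there would be one set of `W`, `V`, `b`, `u` per relation, and the entity vectors would be learned alongside them.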
Once the network has been trained, we can do two things with the results:
1. Classify an input e1-r-e2 triplet as correct or incorrect based on tuned thresholds.
2. Determine the likeliest second entity for a given e1-r (first entity - relation) combination.
Of course, the classification of #1 only requires one pass-through of the neural net, whereas #2 requires as many passes as there are entities since we are calculating a likelihood for each one.
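The difference in cost between the two uses can be seen in a short sketch. The scoring function below is a deliberately simple stand-in (one random matrix for a single relation), and all names and sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
num_entities, d = 50, 4
entity_vectors = rng.normal(size=(num_entities, d))   # stand-in entity embeddings
relation_matrix = rng.normal(size=(d, d))             # stand-in parameters for one relation

def score(e1_idx, e2_idx):
    """One pass through the (stand-in) scoring model."""
    return float(entity_vectors[e1_idx] @ relation_matrix @ entity_vectors[e2_idx])

def classify(e1_idx, e2_idx, threshold):
    # Classification needs a single scoring pass.
    return score(e1_idx, e2_idx) >= threshold

def likeliest_second_entity(e1_idx):
    # Ranking needs one scoring pass per candidate entity.
    scores = [score(e1_idx, j) for j in range(num_entities)]
    return int(np.argmax(scores))

best = likeliest_second_entity(0)
```

With roughly 40,000 entities instead of 50, the loop inside `likeliest_second_entity` is what makes the second use so much more expensive than the first.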
This presents a big challenge: Verifying a triplet is simple enough, but improving a knowledge base by adding the most relevant relationships is a lot more computationally expensive. I will explain below how I dealt with that.
This project was implemented in two parts. The first was the actual neural tensor network, as explained above, which produces the weights for each embedding and provides the initial set of predictions for the test data. The second part was the Graknbase, which stores all the entity-entity relationships as well as the rules of inference, and which can be used to check for a graph connection between two entities.
The neural network is a pure NumPy/SciPy implementation of the knowledge base completion paper. The base of the neural network code, not written by me, can be found here. Because this implementation does not use a deep learning library like TensorFlow and is set up to run on a CPU rather than a GPU, training is comparatively slow. I have optimized it in my implementation, but training still takes several hours; luckily, the neural net only needs to be trained once before we "plug it in" to Grakn. Of course, in a production environment, you could use TensorFlow to improve runtime.
The Grakn side of things is also fairly involved. The flow goes something like this:
1. Build the ontology and rule set and insert the relation data from the training set into Grakn.
2. After the neural net has been run (and an accuracy has been calculated), loop through the test set again, this time checking Grakn for relationships; use any inferred relationships to come up with a modified accuracy.
3. Return to the output of the neural net, choose a subset of x entity-relation-entity triplets that the neural network gives a high score to, and insert these triplets into Grakn. Note, of course, that some of these triplets might be false!
4. Repeat Steps 2 and 3 a set number of times, each time calculating an updated accuracy.
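The flow above can be sketched as a loop in plain Python. Everything here is a hypothetical stand-in: `known_facts` plays the role of the graph after the training data has been inserted, `infer` stands in for a Grakn lookup with inference, and the network's outputs are passed in as precomputed values.

```python
def run_completion_loop(test_set, network_predictions, scored_candidates,
                        known_facts, infer, iterations, batch_size):
    """Alternate between re-scoring the test set and inserting likely triplets.

    test_set:            list of (triple, true_label) pairs
    network_predictions: dict mapping each test triple to the network's True/False
    scored_candidates:   list of (score, triple) pairs produced by the network
    known_facts:         set of triples already in the graph (training data inserted)
    infer:               callable (triple, facts) -> bool, a stand-in for a graph query
    """
    candidates = sorted(scored_candidates, reverse=True)  # highest score first
    accuracies = []
    for _ in range(iterations):
        # Check the graph only when the network said "false".
        correct = 0
        for triple, label in test_set:
            predicted = network_predictions[triple] or infer(triple, known_facts)
            correct += (predicted == label)
        accuracies.append(correct / len(test_set))
        # Insert the next batch of high-scoring triplets (some may be wrong).
        batch, candidates = candidates[:batch_size], candidates[batch_size:]
        known_facts.update(triple for _, triple in batch)
    return accuracies

# Tiny made-up example: the network misses one true triplet that the graph later recovers.
example_accuracies = run_completion_loop(
    test_set=[(("a", "r", "b"), True), (("c", "r", "d"), False)],
    network_predictions={("a", "r", "b"): False, ("c", "r", "d"): False},
    scored_candidates=[(0.9, ("a", "r", "b")), (0.1, ("x", "r", "y"))],
    known_facts=set(),
    infer=lambda triple, facts: triple in facts,
    iterations=2,
    batch_size=1,
)
```

In this toy run the accuracy goes from 0.5 to 1.0 after one insertion, but only because the inserted triplet happened to be true; inserting a false one could just as easily hurt.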
Our goals are two-fold:
- Maintain Grakn as a versatile and robust knowledge base even as additional (possibly false) relationships are added to it.
- See if the accuracy of the neural net classifier is improved with Grakn inferences!
What exactly is happening here? In Step 1 of the flow, by building a schema and a set of inference rules specific to the project, we can augment the power of the neural network by letting Grakn check for relationships that the network might not have caught. Having a good number of reliable inference rules is critical to making the most of Grakn; I explain in the results section what happens when a priori inferences are few and far between.
To improve accuracy, we have to reduce either our Type I or our Type II error rate. In an incomplete knowledge base, it is difficult to check for false positives (Type I): when the graph says a relationship does not exist but the neural network claims otherwise, that might just be down to a lack of information in the knowledge base itself. Therefore, in Step 2, we look for Type II errors (false negatives) by looping through every item in the test set that the neural network classified as false and trying to find a relationship in the graph that contradicts that classification.
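As a small illustration of the two error types and of why only negatives are revisited, consider the sketch below. The function names and the `graph_has` callable are hypothetical; `graph_has` stands in for whatever query checks the knowledge base.

```python
def error_counts(labels, predictions):
    """Return (false positives, false negatives), i.e. (Type I, Type II) counts."""
    false_positives = sum(1 for y, p in zip(labels, predictions) if p and not y)
    false_negatives = sum(1 for y, p in zip(labels, predictions) if y and not p)
    return false_positives, false_negatives

def recover_false_negatives(triples, predictions, graph_has):
    """Flip a 'false' prediction to 'true' when the graph supports the triplet.

    Predictions that are already 'true' are left alone, because a missing edge
    in an incomplete graph is not evidence that the triplet is false.
    """
    return [p or graph_has(t) for t, p in zip(triples, predictions)]

# Made-up example with three test triplets.
triples = ["t0", "t1", "t2"]
labels = [True, True, False]
predictions = [False, True, True]
before = error_counts(labels, predictions)
updated = recover_false_negatives(triples, predictions, lambda t: t == "t0")
after = error_counts(labels, updated)
```

In this example the false negative on `t0` is recovered, while the false positive on `t2` is untouched, which mirrors the asymmetry described above.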
The augmenting happens in Step 3, which is where we return to the obstacle I mentioned in the paragraphs above the Implementation header. We want to add the most likely triplets in every iteration of the algorithm without explicitly checking every e1-r-e2 combination (this would be roughly 40,000 × 40,000 × 11 ≈ 1.76 × 10^10 checks). Instead, for each e1 entity, we can choose x candidate entities at random, calculate triplet scores for each of these x entities across all 11 relations, and keep the highest-scoring triplet.
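A minimal sketch of that sampling idea follows. The function name, its parameters, and the toy scoring function are assumptions for illustration; in practice `score_fn` would be the trained network.

```python
import numpy as np

def best_sampled_triplet(e1_idx, num_entities, num_relations, x, score_fn, rng):
    """Sample x candidate second entities for e1 and return the top-scoring triplet.

    Costs x * num_relations scoring passes instead of num_entities * num_relations.
    Returns (score, e1_idx, relation_idx, e2_idx).
    """
    candidates = rng.choice(num_entities, size=x, replace=False)  # random subset, no repeats
    best = None
    for e2_idx in candidates:
        for r in range(num_relations):
            s = score_fn(e1_idx, r, int(e2_idx))
            if best is None or s > best[0]:
                best = (s, e1_idx, r, int(e2_idx))
    return best

# Toy scoring function (hypothetical): higher entity and relation indices score higher.
toy_score = lambda e1, r, e2: e2 + 0.1 * r
sampled_best = best_sampled_triplet(
    e1_idx=0, num_entities=10, num_relations=3, x=10,
    score_fn=toy_score, rng=np.random.default_rng(0),
)
```

With x much smaller than the number of entities, the best sampled triplet is not guaranteed to be the best overall, which is the tradeoff between runtime and reliability that the choice of x controls.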
As long as x is large enough that we can be fairly sure that each addition we make has a high likelihood of being true, we can make additions to the graph in a reasonable amount of time without sacrificing too much accuracy. This is an important tradeoff to consider, though; if we make additions to the knowledge base that we are not completely positive are true, we risk contaminating the knowledge base with incorrect information. One question we will want to answer is how the ratio of correct/incorrect Grakn inferences changes over time.
As we've seen above, Grakn is a natural structure for modeling hierarchical knowledge bases. Training data can be fed into the database to create relationships between entities. The graph can be expanded by re-inserting likely relationships, as determined by the neural tensor network, into the Grakn database. Moreover, Grakn gives a user the ability to scope out potential Type II errors in test data that has been fed through a neural net. It is even possible for Grakn to improve the program's prediction accuracy in such situations.
So this is one application of deep learning principles to Grakn: hierarchical relationship matching. Train a neural net, and then use Grakn inferences on the test data to iteratively improve the accuracy of your predictions. But there are of course other ways that Grakn can improve the efficiency and operation of neural networks.
For instance, in n-ary classification or regression settings, you could use Grakn analytics to intelligently initialize the neural network's vector spaces rather than simply using random initializations. Grakn relationships and inferences could potentially give you advance information about ground truths that could make your network learn more quickly.
Another idea, which would be more difficult, is to use the output of the neural net to build inference rules for your graph. Using only neural pathways with a very high confidence, you could add not only entities and relationships to a Grakn graph but even the rules themselves! This would be especially useful in situations where explicit inference rules are scarce, such as in the project you've just read about.
The potential for applying Grakn to deep learning principles is limitless, but of course, you have to be smart about what you do and you have to understand the purpose of the technologies you are using. I hope this post gave you some inspiration. Don't hesitate to reach out with questions!
Published at DZone with permission of Nicolas Powell, DZone MVB. See the original article here.