
Deep Learning for NLP

Deep Learning is usually associated with neural networks. In this article, we show that generative classifiers are also capable of Deep Learning.

By Cohan Carlos · Jun. 22, 17 · Tutorial

Deep Learning is a method of Machine Learning involving the use of multiple processing layers to learn non-linear functions or boundaries.

What Are Generative Classifiers?

Generative classifiers use Bayes' rule to invert the probability of the features F given a class c into a prediction of the class c given the features F. The class predicted by the classifier is the one yielding the highest P(c|F). A commonly used generative classifier is the Naive Bayes classifier. It has two layers (one for the features F and one for the classes C).
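To make the decision rule concrete, here is a minimal sketch of a two-class Naive Bayes prediction. The class names, features, and probability tables are hypothetical, chosen only for illustration:

```python
import math

# Hypothetical parameters for a two-class problem.
priors = {"spam": 0.4, "ham": 0.6}               # P(c)
likelihoods = {                                  # P(f|c)
    "spam": {"offer": 0.30, "meeting": 0.05},
    "ham":  {"offer": 0.02, "meeting": 0.20},
}

def predict(features):
    """Return the class c maximizing P(c|F) ~ P(c) * prod_f P(f|c)."""
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)                  # log space for stability
        for f in features:
            score += math.log(likelihoods[c][f])
        scores[c] = score
    return max(scores, key=scores.get)

print(predict(["offer"]))    # -> spam
print(predict(["meeting"]))  # -> ham
```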

Deep Learning Using Generative Classifiers

The first thing you need for deep learning is a hidden layer. So you add one more layer H between the C and F layers to get a Hierarchical Bayesian classifier (HBC).

Now, you can compute P(c|F) in an HBC in two ways:

Product of Sums

P(c|F) \propto P(c) \prod_{f \in F} \sum_{h \in H} P(f|h)\, P(h|c)

Sum of Products

P(c|F) \propto P(c) \sum_{h \in H} P(h|c) \prod_{f \in F} P(f|h)

The first equation computes P(c|F) using a product of sums (POS). The second computes P(c|F) using a sum of products (SOP).
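To see that the two readings really differ, here is a minimal sketch that scores one class both ways from the same conditional probability tables; all numbers are hypothetical:

```python
import numpy as np

p_c = 0.5                            # P(c), the class prior
p_h_given_c = np.array([0.7, 0.3])   # P(h|c) for hidden nodes h1, h2
p_f_given_h = np.array([[0.8, 0.1],  # P(f|h): rows are hidden nodes,
                        [0.2, 0.6]]) #          columns are features f1, f2

# Product of sums: P(c) * prod_f sum_h P(f|h) P(h|c)
pos = p_c * np.prod(p_f_given_h.T @ p_h_given_c)

# Sum of products: P(c) * sum_h P(h|c) * prod_f P(f|h)
sop = p_c * np.sum(p_h_given_c * np.prod(p_f_given_h, axis=1))

print(pos, sop)  # 0.0775 vs. 0.046 -- the two scores differ
```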

POS Equation

We discovered something very interesting about these two equations.

It turns out that if you use the first equation, the HBC reduces to a Naive Bayes classifier. Such an HBC can only learn linear (or quadratic) decision boundaries.
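The collapse can be seen in one line: under the C → H → F structure, the hidden node can be marginalized out inside each factor,

\prod_{f \in F} \sum_{h \in H} P(f|h)\, P(h|c) = \prod_{f \in F} P(f|c),

which is exactly the Naive Bayes likelihood, so in the POS form, the hidden layer adds no expressive power.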

Consider the discrete XOR-like function shown in Figure 1.

Figure 1: A discrete XOR-like function.

There is no way to separate the black dots from the white dots using one straight line. Such a pattern can only be classified 100% correctly by a non-linear classifier.

If you train a multinomial Naive Bayes classifier on the data in Figure 1, you get the decision boundary seen in Figure 2a.

Note that the dotted area represents class 1 and the clear area represents class 0.


Figure 2a: The decision boundary of a multinomial NB classifier (or a POS HBC).

No matter what the angle of the line, at least one of the four points will be misclassified. In this instance, it is the point at {5, 1} that is misclassified as 0 (since the clear area represents class 0).
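You can reproduce this behavior with scikit-learn's MultinomialNB trained on the four points of Figure 1 (a minimal sketch):

```python
from sklearn.naive_bayes import MultinomialNB

# The four XOR-like points of Figure 1: diagonal pairs share a class.
X = [[1, 1], [5, 5], [1, 5], [5, 1]]
y = [0, 0, 1, 1]

clf = MultinomialNB().fit(X, y)
print(clf.predict(X))
# A linear boundary cannot separate the two diagonals, so at least
# one of the four points comes back with the wrong label.
```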

You get the same result if you use a POS HBC.

SOP Equation

Our research showed us that something amazing happens if you use the second equation.

With the “sum of products” equation, the HBC becomes capable of Deep Learning.

SOP + Multinomial Distribution

The decision boundary learned by a multinomial non-linear HBC (one that computes the posterior using a sum of products of the hidden-node conditional feature probabilities) is shown in Figure 2b.


Figure 2b: Decision boundary learned by a multinomial SOP HBC.

The boundary consists of two straight lines passing through the origin. They are angled in such a way that they separate the data points into the two required categories.

All four points are classified correctly: the points at {1, 1} and {5, 5} fall in the clear conical region, which represents a classification of 0, whereas the other two points fall in the dotted region representing class 1.

Therefore, the multinomial non-linear hierarchical Bayes classifier can learn the non-linear function of Figure 1.
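To see that such a two-line boundary is representable at all, here is a minimal sketch that scores the four points of Figure 1 with hand-set SOP parameters; the mixing weights and feature probabilities below are hypothetical, not learned:

```python
import numpy as np

# Hand-set SOP HBC parameters: each class mixes two hidden components.
# p_h is P(h|c); theta rows are multinomial feature probabilities P(f|h).
params = {
    0: {"p_h": [0.5, 0.5], "theta": [[0.5, 0.5], [0.5, 0.5]]},
    1: {"p_h": [0.5, 0.5], "theta": [[0.1, 0.9], [0.9, 0.1]]},
}

def sop_score(x, c):
    """P(c) * sum_h P(h|c) * prod_f P(f|h)^x_f (multinomial SOP)."""
    return 0.5 * sum(w * np.prod(np.power(theta, x))
                     for w, theta in zip(params[c]["p_h"], params[c]["theta"]))

for x in ([1, 1], [5, 5], [1, 5], [5, 1]):
    print(x, "->", max((0, 1), key=lambda c: sop_score(x, c)))
# prints 0, 0, 1, 1 -- all four points classified correctly
```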

Gaussian Distribution

The decision boundary learned by a Gaussian nonlinear HBC is shown in Figure 2c.


Figure 2c: Decision boundary learned by a SOP HBC based on the Gaussian probability distribution.

The boundary consists of two quadratic curves separating the data points into the required categories.

Therefore, the Gaussian non-linear HBC can also learn the non-linear function depicted in Figure 1.
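The same scoring scheme carries over by replacing the multinomial term with a Gaussian density per hidden node. A minimal sketch with hand-set (hypothetical) means and a shared spherical variance:

```python
import numpy as np
from scipy.stats import multivariate_normal

# One Gaussian component per corner of Figure 1: (mean, P(h|c)) pairs.
components = {
    0: [([1, 1], 0.5), ([5, 5], 0.5)],
    1: [([1, 5], 0.5), ([5, 1], 0.5)],
}

def gaussian_sop_score(x, c, var=1.0):
    """P(c) * sum_h P(h|c) * N(x; mu_h, var * I)."""
    return 0.5 * sum(w * multivariate_normal.pdf(x, mean=mu, cov=var)
                     for mu, w in components[c])

for x in ([1, 1], [5, 5], [1, 5], [5, 1]):
    print(x, "->", max((0, 1), key=lambda c: gaussian_sop_score(x, c)))
# prints 0, 0, 1, 1 -- the quadratic boundary gets all four right
```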

Conclusion

Since SOP HBCs are multilayered (with a layer of hidden nodes) and can learn non-linear decision boundaries, they can be said to be capable of deep learning.

Applications to NLP

It turns out that the multinomial SOP HBC can outperform a number of linear classifiers at certain tasks. For more information, see here.


Published at DZone with permission of Cohan Carlos. See the original article here.

