
Relevant Search Leveraging Knowledge Graphs with Neo4j: Part 1


Providing accurate, relevant search results is an ever-increasing concern. Here, we take an in-depth look at approaching this feature using knowledge graphs and Neo4j.


Providing relevant information to a user who is running search queries or navigating a site is always a complex task. It requires a large amount of data, a process of progressive improvement, and self-tuning parameters, together with an infrastructure that can support them.

Such search infrastructure must be introduced seamlessly into the existing platform, with access to all relevant data flows so that its data is always up to date. Moreover, it should make it easy to add new data sources as new requirements arise, without affecting the entire system or the current relevance.

Information must be stored and managed correctly, taking into account the relationships between individual items and providing a model and access patterns that can also be processed automatically by artificial minds (machines). Such models are generally referred to as knowledge graphs. They have become a crucial resource for many tasks in machine learning, data mining, and artificial intelligence applications.

Knowledge Graphs: An Introduction

A knowledge graph is a multi-relational graph composed of entities as nodes and relationships as edges with different types that describe facts in the world.

Among the many techniques involved in processing data sources to create a knowledge graph, Natural Language Processing (NLP) plays an important role: it helps read and understand text in order to automatically extract “knowledge” from a large number of data sources.

The goal of search varies with the domain in which it is used: it differs substantially between web search, product catalog navigation on an eCommerce site, scientific literature discovery, and the expert search prominent in medicine, law, and research. All these domains differ in terms of business goals, the definition of relevance, synonyms, ontologies, and so on.

In this post, we introduce knowledge graphs as the core data source on top of which a relevant search application has been built. We describe in detail the data model, the feeding and updating processes, and the entire infrastructure applied to a concrete application. We use a product catalog for a generic eCommerce site as the use case; however, the concepts and ideas can easily be applied to other scenarios.

The Use Case: eCommerce

In all eCommerce sites, text search and catalog navigation are not only the entry points for users but they are also the main “salespeople.” Compared with web search engines, this use case has the advantage that the set of “items” to be searched is more controlled and regulated.

However, there are a lot of critical aspects and peculiarities that need to be taken into account while designing the search infrastructure for this specific application:

  • Multiple data sources: Products and related information come from various heterogeneous sources like product providers, information providers, and sellers.
  • Multiple category hierarchies: An eCommerce platform should provide multiple navigation paths to simplify access from several perspectives and shorten the time from desire to purchase. This requires storing and traversing multiple category hierarchies that are subject to change over time based on new business requirements.
  • Marketing strategy: New promotions, offers, and marketing campaigns are created to promote the site or specific products. All of them affect, or should affect, results boosting.
  • User signals and interactions: In order to provide a better and more customized user experience, clicks, purchases, search queries, and other user signals must be captured, processed, and used to drive search results.
  • Supplier information: Product suppliers are the most important of these sources. They provide information such as quantity, availability, delivery options, timing, and changes in the product’s details.
  • Business constraints: eCommerce sites have their own business interests, so they must also return search results that generate profit, clear expiring inventory, and satisfy supplier relationships.
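To make the point about multiple category hierarchies concrete, here is a minimal Python sketch. It assumes that categories form an acyclic graph in which a category can have several parents. The category names, the `PARENTS` and `PRODUCT_CATEGORIES` maps, and both helper functions are hypothetical. In the graph model, these would be category nodes and parent relationships traversed by a query.

```python
from typing import Dict, List

# Hypothetical category graph: child category -> parent categories.
# A category may belong to several hierarchies at once.
PARENTS: Dict[str, List[str]] = {
    "Smartphones": ["Electronics", "Gifts for Tech Lovers"],
    "Electronics": [],
    "Gifts for Tech Lovers": ["Gifts"],
    "Gifts": [],
}

# Hypothetical product -> the categories it is directly assigned to
PRODUCT_CATEGORIES: Dict[str, List[str]] = {"phone-123": ["Smartphones"]}


def paths_to_roots(category: str, parents: Dict[str, List[str]]) -> List[List[str]]:
    """Every path from `category` up to a root category (assumes no cycles)."""
    if not parents.get(category):
        return [[category]]
    paths: List[List[str]] = []
    for parent in parents[category]:
        for path in paths_to_roots(parent, parents):
            paths.append([category] + path)
    return paths


def navigation_paths(product: str,
                     product_categories: Dict[str, List[str]],
                     parents: Dict[str, List[str]]) -> List[List[str]]:
    """Root-to-leaf navigation paths that lead a user to `product`."""
    result: List[List[str]] = []
    for category in product_categories.get(product, []):
        for path in paths_to_roots(category, parents):
            result.append(list(reversed(path)))
    return result


for path in navigation_paths("phone-123", PRODUCT_CATEGORIES, PARENTS):
    print(" > ".join(path))
```

In this sketch, a new navigation hierarchy only requires new parent edges. The product assignments stay untouched.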

All these requirements and data could affect “relevance” in several ways during search, as well as how the product catalog should be navigated. Keeping these constraints in mind, designing a relevant search infrastructure for eCommerce vendors requires an entire ecosystem of data and related data flows together with platforms to manage them.

Relevant Search

Relevant search revolves around four elements: text, user, context, and business goal.


  • Information extraction and NLP are key to providing search results that best satisfy the user’s text query in terms of content.
  • User modeling and recommendation engines allow results to be customized according to user preferences and profiles.
  • Context information, such as location and time, further refines results based on the user’s needs at the moment the query is performed.
  • Business goals drive the entire implementation, as search exists to contribute to the success and profitability of the organization.
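The article does not give a formula for combining these four elements. As a minimal sketch, assuming each element has already been reduced to a score normalized to [0, 1], a weighted linear combination could rank candidate products. The `Signals` class, the weights, and the product ids are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Signals:
    """Per-product scores, each assumed to be normalized to [0, 1]."""
    text: float      # how well the product matches the query text
    user: float      # affinity from user modeling / recommendations
    context: float   # fit with location, time, and similar context
    business: float  # boost derived from current business goals


# Hypothetical weights; real values would be tuned from feedback loops
WEIGHTS: Dict[str, float] = {"text": 0.5, "user": 0.2, "context": 0.1, "business": 0.2}


def relevance(s: Signals, weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted linear combination of the four relevance elements."""
    return (weights["text"] * s.text
            + weights["user"] * s.user
            + weights["context"] * s.context
            + weights["business"] * s.business)


def rank(candidates: Dict[str, Signals]) -> List[str]:
    """Product ids ordered from most to least relevant."""
    return sorted(candidates, key=lambda pid: relevance(candidates[pid]), reverse=True)


candidates = {
    "a": Signals(text=0.9, user=0.1, context=0.5, business=0.0),
    "b": Signals(text=0.7, user=0.8, context=0.5, business=0.5),
}
print(rank(candidates))  # ['b', 'a']
```

Product "b" wins here despite a weaker text match (0.66 versus 0.52), which is the kind of trade-off the four elements are meant to capture.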

In our previous blog posts on text search and NLP for social media recommendations, we described how to use advanced NLP features and graphs to combine text, user profiles, and behavior to customize the search experience using recommendation engines.

Relevant searches also require context information, previous searches, current business goals, and feedback loops to further customize user experience and increase revenues. These must be stored and processed in ways that can be easily accessed and navigated during searches without affecting the user experience in terms of response time and quality of the search.

Knowledge Graphs: The Model

In order to provide relevant search, the search architecture must be able to handle highly heterogeneous data in terms of sources, schema, volume and speed of generation. This data includes aspects such as textual descriptions and product features, marketing campaigns and business goals. Moreover, these have to be accessed as a single data source, so they must be normalized and stored using a unified schema structure that satisfies all the informational and navigational requirements of a relevant search.
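Here is a minimal sketch of that normalization step. It assumes two hypothetical feeds: a supplier feed with prices in cents and a catalog feed with price strings. The field names, the `Product` class, and the merge policy (later non-empty values win) are illustrative assumptions, not the article's pipeline. A real system would more likely assign explicit source priorities.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Product:
    """Unified schema shared by all sources."""
    id: str
    title: Optional[str] = None
    price: Optional[float] = None
    stock: Optional[int] = None
    sources: List[str] = field(default_factory=list)


def from_supplier(rec: dict) -> Product:
    # Hypothetical supplier feed: {"sku", "price_cents", "qty"}
    return Product(id=rec["sku"], price=rec["price_cents"] / 100,
                   stock=rec["qty"], sources=["supplier"])


def from_catalog(rec: dict) -> Product:
    # Hypothetical catalog feed: {"productId", "title", "price": "12.90 EUR"}
    amount, _currency = rec["price"].split()
    return Product(id=rec["productId"], title=rec["title"],
                   price=float(amount), sources=["catalog"])


def merge(products: List[Product]) -> Dict[str, Product]:
    """Merge normalized records by id; later non-None fields override earlier ones."""
    merged: Dict[str, Product] = {}
    for p in products:
        current = merged.setdefault(p.id, Product(id=p.id))
        for name in ("title", "price", "stock"):
            value = getattr(p, name)
            if value is not None:
                setattr(current, name, value)
        current.sources.extend(p.sources)
    return merged


unified = merge([
    from_supplier({"sku": "A1", "price_cents": 1250, "qty": 7}),
    from_catalog({"productId": "A1", "title": "USB-C cable", "price": "12.90 EUR"}),
])
print(unified["A1"])
```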

The graph data model, considered as both the storage and the query model, provides the right support for all the components of a relevant search. Graphs are the right representational option for the following reasons:

  1. Information Extraction attempts to make the text’s semantic structure explicit by analyzing its contents and identifying mentions of semantically defined entities and relationships within the text. These relationships can then be recorded in a database to search for a particular relationship or to infer additional information from the explicitly stated facts. Once “basic” data structures like tokens, events, relationships, and references are extracted from the text provided, related information can be extended by introducing new sources of knowledge like ontologies (ConceptNet 5, WordNet, DBpedia, domain-specific ontology) or further processed/extended using services like AlchemyAPI.
  2. Recommendation Engines build models for users and items/products based on dynamic (such as user previous sessions) or static (such as description) data, which represent relationships of interests. Hence, a graph is a very effective structure to store and query these relationships, even allowing them to be merged with other sources of knowledge such as user and item profiles.
  3. Context information is a multi-dimensional representation of a status or an event. The types and number of dimensions can change greatly and a graph allows for the required high degree of flexibility.
  4. A graph can be used to define a rule engine that enforces whichever business goals are defined for the search.
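Point 4 can be illustrated with a small sketch. It assumes that business rules are stored as plain property maps, so they could live as rule nodes in the graph rather than in code, and that each matching rule multiplies a product's relevance score. The rule names, fields, and boost values are hypothetical.

```python
import operator
from typing import Any, Dict, List

# Supported comparison operators for rule conditions
OPS = {"<": operator.lt, ">": operator.gt, "==": operator.eq}

# Hypothetical rule "nodes": plain property maps that could be stored
# as graph nodes (e.g. a Rule node with field/op/value/boost properties).
RULES: List[Dict[str, Any]] = [
    {"name": "clear expiring inventory", "field": "days_to_expiry", "op": "<", "value": 30, "boost": 1.5},
    {"name": "active promotion", "field": "on_promotion", "op": "==", "value": True, "boost": 1.2},
    {"name": "out of stock", "field": "stock", "op": "==", "value": 0, "boost": 0.1},
]


def apply_rules(product: Dict[str, Any], score: float,
                rules: List[Dict[str, Any]] = RULES) -> float:
    """Multiply the score by the boost of every rule the product matches.
    Rules whose field is missing from the product are skipped."""
    for rule in rules:
        field = rule["field"]
        if field in product and OPS[rule["op"]](product[field], rule["value"]):
            score *= rule["boost"]
    return score


print(apply_rules({"days_to_expiry": 10, "on_promotion": True, "stock": 5}, 1.0))
```

Keeping rules as data means marketing or business teams could change boosts without a code deployment. That is one reason a graph-backed rule store is attractive.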

Lately, the use of graphs for representing complex knowledge and storing it in an easy-to-query model has become prominent in information management, and the term “knowledge graph” is becoming increasingly popular. Sometimes described as “encyclopedias for machines” [4], knowledge graphs have become a crucial resource for advanced search, machine learning, and data mining applications. Nowadays, knowledge graph construction is one of the hottest topics in artificial intelligence (AI).

A knowledge graph, from a data model perspective, is a multi-relational graph composed of entities as nodes and relationships as edges with different types. An instance of an edge is a triple (e1, r, e2), which describes the directed relationship r between the two entities e1 and e2.
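The triple definition maps directly onto a small data structure. This Python sketch stores edges as (e1, r, e2) triples and shows that several relationship types can connect the same pair of entities, which is what makes the graph multi-relational. The entity names and relationship types are hypothetical; only `IS_USEFUL_FOR` appears in the article.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple

# A triple (e1, r, e2) is one edge: the directed relationship r from e1 to e2
Triple = Tuple[str, str, str]


class KnowledgeGraph:
    """Minimal in-memory multi-relational graph stored as a set of triples."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        # Index (e1, r) -> {e2} so outgoing edges of one type are cheap to read
        self._out: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, e1: str, r: str, e2: str) -> None:
        self.triples.add((e1, r, e2))
        self._out[(e1, r)].add(e2)

    def objects(self, e1: str, r: str) -> Set[str]:
        """Entities reachable from e1 through one edge of type r."""
        return set(self._out.get((e1, r), ()))

    def relations_between(self, e1: str, e2: str) -> Set[str]:
        """All relationship types from e1 to e2 (several may coexist)."""
        return {r for (s, r, o) in self.triples if s == e1 and o == e2}


kg = KnowledgeGraph()
kg.add("phone-123", "IS_USEFUL_FOR", "photography")  # hypothetical entities
kg.add("phone-123", "HAS_TAG", "camera")             # hypothetical relationship type
kg.add("phone-123", "HAS_TAG", "photography")
print(kg.relations_between("phone-123", "photography"))
```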

According to this definition, we designed the following logical schema for this specific use case:

[Figure: logical schema of the eCommerce knowledge graph]

This schema merges multiple knowledge graphs into one big knowledge graph that can be easily navigated. Many of the relationships in the schema above are explicitly loaded using data sources like prices, product descriptions, and some relationships between products (like IS_USEFUL_FOR), while others are inferred automatically by machine learning tools.

This is the logical model, which can be extended to a more generic and versatile design with various types of relationships. Consider, for example, this representation of product attributes:

[Figure: generic representation of product attributes]

The specific relationships that describe a product feature, for instance HAS_SIZE or HAS_COLOUR, are replaced with a more general and dynamic schema:

CREATE (p:ProductData)-[:HAS_ATTRIBUTE]->(a:Attribute),
       (a)-[:HAS_KEY]->(k:Key {ref: "Size"}),
       (a)-[:HAS_VALUE]->(v:Value {data: "128 GB"})
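The benefit of the generic schema is that a new product feature needs no new relationship type. Below is a minimal Python sketch of the conversion into HAS_ATTRIBUTE / HAS_KEY / HAS_VALUE triples. It assumes features arrive as a key/value map. The intermediate attribute ids and the helper names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def to_generic_attributes(product_id: str, features: Dict[str, str]) -> List[Triple]:
    """Turn {"Size": "128 GB", ...} into HAS_ATTRIBUTE / HAS_KEY / HAS_VALUE
    triples instead of one relationship type (HAS_SIZE, HAS_COLOUR, ...) per feature."""
    triples: List[Triple] = []
    for i, (key, value) in enumerate(features.items()):
        attr = f"{product_id}/attr{i}"  # hypothetical id for the Attribute node
        triples.append((product_id, "HAS_ATTRIBUTE", attr))
        triples.append((attr, "HAS_KEY", key))
        triples.append((attr, "HAS_VALUE", value))
    return triples


def attribute_value(triples: List[Triple], product_id: str, key: str) -> Optional[str]:
    """Follow product -> Attribute -> Key/Value to read one feature back."""
    attrs = [o for s, r, o in triples if s == product_id and r == "HAS_ATTRIBUTE"]
    for attr in attrs:
        if (attr, "HAS_KEY", key) in triples:
            for s, r, o in triples:
                if s == attr and r == "HAS_VALUE":
                    return o
    return None


triples = to_generic_attributes("p1", {"Size": "128 GB", "Colour": "Black"})
print(attribute_value(triples, "p1", "Size"))
```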

Part of the model is built using NLP by processing the information available in product details. In this case, the GraphAware NLP framework, described in a previous blog post, is used to extract knowledge from text.
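The GraphAware NLP framework does real linguistic analysis. The sketch below is not its API. It is a deliberately naive stand-in that only shows the shape of the output: one tag edge per distinct token. It assumes tags are lowercase alphanumeric tokens minus a hypothetical stop-word list, and the `HAS_TAG` relationship name is also hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical, deliberately tiny stop-word list
STOP_WORDS = {"a", "an", "and", "the", "with", "for", "of", "in", "to", "is"}


def extract_tags(text: str) -> Counter:
    """Lowercase the text, split it on non-alphanumeric characters,
    drop stop words, and count how often each remaining token occurs."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def tag_triples(product_id: str, description: str) -> List[Tuple[str, str, str]]:
    """One (product, HAS_TAG, tag) edge per distinct tag, sorted for stable output."""
    return [(product_id, "HAS_TAG", tag) for tag in sorted(extract_tags(description))]


print(tag_triples("phone-123", "A smartphone with 128 GB of memory and a great camera"))
```

The counts returned by `extract_tags` could be stored as a weight on each tag relationship, which later similarity steps can use.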

After a first round, in which text is processed and organized into tags as described in the schema, the information is extended using ConceptNet 5 to add new knowledge such as synonyms, specification, generalization, localization, and other useful relationships. Further processing makes it possible to compute similarities between products, cluster them, and automatically assign multiple “keywords” to describe each cluster.
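The similarity and clustering step could look roughly like the sketch below. The article does not specify which similarity or clustering method is used, so several assumptions apply. A hypothetical `SYNONYMS` map stands in for ConceptNet 5 lookups. Jaccard overlap of synonym-expanded tag sets serves as the similarity measure. Single-linkage grouping with a fixed threshold forms the clusters. The most frequent expanded tags become the cluster keywords.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set

# Hypothetical synonym map standing in for ConceptNet 5 lookups
SYNONYMS: Dict[str, Set[str]] = {
    "phone": {"smartphone", "mobile"},
    "smartphone": {"phone"},
    "notebook": {"laptop"},
}


def expand(tags: Set[str]) -> Set[str]:
    """Add known synonyms to a product's tag set."""
    expanded = set(tags)
    for tag in tags:
        expanded |= SYNONYMS.get(tag, set())
    return expanded


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two tag sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(products: Dict[str, Set[str]], threshold: float = 0.3) -> List[Set[str]]:
    """Single-linkage grouping: any pair with similarity >= threshold is merged."""
    expanded = {p: expand(tags) for p, tags in products.items()}
    parent = {p: p for p in products}

    def find(p: str) -> str:
        while parent[p] != p:
            p = parent[p]
        return p

    for a, b in combinations(products, 2):
        if jaccard(expanded[a], expanded[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for p in products:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())


def cluster_keywords(members: Set[str], products: Dict[str, Set[str]], top: int = 2) -> List[str]:
    """Most frequent expanded tags in a cluster; ties broken alphabetically."""
    counts = Counter(tag for p in members for tag in expand(products[p]))
    return sorted(counts, key=lambda t: (-counts[t], t))[:top]


products = {
    "p1": {"phone", "camera", "black"},
    "p2": {"smartphone", "camera"},
    "p3": {"notebook", "keyboard"},
    "p4": {"laptop", "keyboard"},
}
for members in cluster(products):
    print(sorted(members), cluster_keywords(members, products))
```

Without the synonym expansion, "p1" and "p2" would share only "camera". Adding ConceptNet-style knowledge raises their overlap, which is the effect the enrichment step is after.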

The knowledge graph is the heart of the infrastructure not only because it is central to aiding the search but also because it is a living system growing and learning day by day, following the user needs and the evolving business requirements.

This article will be continued soon in Part 2.

References

[1] D. Turnbull, J. Berryman – Relevant Search, Manning

[2] A. L. Farris, G. S. Ingersoll, and T. S. Morton – Taming Text, Manning

[3] Google Knowledge Graph – https://www.google.com/intl/es419/insidesearch/features/search/knowledge.html

[4] L. Del Corro – Knowledge graphs: Encyclopaedias for machines – https://www.ambiverse.com/knowledge-graphs-encyclopaedias-for-machines/

[5] E. Gabrilovich, N. Usunier – Constructing and Mining Web-Scale Knowledge Graphs, SIGIR ’16 Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval


Published at DZone with permission of Alessandro Negro. See the original article here.
