Knowledge Graph Insights Give Investors the Edge
Refinitiv Labs has developed a global infrastructure database which uses knowledge graph insights derived from large volumes of mostly unstructured data.
- Global Infrastructure API, the latest proof of concept by Refinitiv Labs, is a global infrastructure database, which links fundamental data and provides an API entry point for queries.
- The prototype leverages knowledge graph insights to interlink different Refinitiv datasets, including bonds, syndicated loans, project finance, Middle East and North Africa (MENA) infrastructure projects, and Belt and Road Initiative (BRI) data.
- Visit the Refinitiv Labs project portfolio to find out how developers, data scientists, and subject-matter experts collaborated to build this customer-focused proof of concept - and many more.
As the ripple effects of the coronavirus are felt across the global economy, affecting manufacturing, supply chains, and the movement of people and goods, capital projects, infrastructure owners, and investors are faced with significant challenges.
These challenges are likely to increase in the months ahead, with infrastructure investment expected to become a key tool for macroeconomic stabilization.
Because the construction industry's supply chains and workforce are interconnected and global, the pandemic has disrupted both, which in turn has affected the cost and schedule of infrastructure projects.
There are additional risks associated with the intense competition for viable assets; indeed, many consider that this same competition has driven infrastructure asset valuations to a peak.
The infrastructure investor community needs to assess these risks to mitigate understandable uncertainty and to promote a more responsible and sustainable recovery in these difficult times.
Investors look to Refinitiv data for timely, accurate market intelligence, and examine every dimension before striking a deal, including, for example, a company's track record on previous projects and the relationships between the parties involved.
"Our clients need to identify the companies with strong operating track records in order for them to show investors that a particular investment is suitable," said Swarnima Sircar, Data Scientist and Associate at Refinitiv Labs.
Risk Assessment Using Linked Data
The Refinitiv Labs data scientists use a range of datasets in their workflow. They evaluate, rank, and compare companies and projects, and establish the level of risk involved in different sectors and geographies by looking for particular vulnerabilities.
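As a loose illustration of that kind of ranking workflow, the sketch below scores and orders a handful of made-up projects by a simple composite risk measure. The field names, projects, and weights are hypothetical, not Refinitiv's actual model:

```python
# Illustrative sketch: ranking infrastructure projects by a simple
# composite risk score. All fields and weights are invented.
projects = [
    {"name": "Port Expansion A", "sector": "transport", "region": "MENA",
     "cost_overrun_pct": 12.0, "schedule_delay_months": 6},
    {"name": "Solar Farm B", "sector": "energy", "region": "APAC",
     "cost_overrun_pct": 3.5, "schedule_delay_months": 1},
    {"name": "Rail Link C", "sector": "transport", "region": "APAC",
     "cost_overrun_pct": 20.0, "schedule_delay_months": 14},
]

def risk_score(p):
    """Weighted combination of overrun and delay (illustrative weights)."""
    return 0.6 * p["cost_overrun_pct"] + 0.4 * p["schedule_delay_months"]

# Riskiest first: the same sort could be restricted by sector or region.
ranked = sorted(projects, key=risk_score, reverse=True)
for p in ranked:
    print(f'{p["name"]}: {risk_score(p):.1f}')
```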
With a significant volume of data, much of it unstructured, identifying indirect connections between entities can be inefficient, complex, expensive, and sometimes unreliable.
To assist with more effective data analysis, Refinitiv Labs' latest prototype, Global Infrastructure API, integrates five valuable Refinitiv datasets: bonds, syndicated loans, project finance, Middle East and North Africa (MENA) infrastructure projects, and the Belt and Road Initiative (BRI).
The result is a global infrastructure database, which links fundamental data and provides an API entry point for queries.
Using Knowledge Graph Insights to Interlink Data Sources
The Refinitiv Labs team worked in a series of short sprints over approximately five months.
Two developers set up the infrastructure, and two data scientists, experienced in natural language processing (NLP) and knowledge graph insights, worked on creating the pipeline and flexible data schema for transforming and ingesting relevant data sources into graphs.
Their goal was to unify and interlink the data sources automatically, and in the process, account for graph bitemporality, as it is an important requirement in databases used in financial services.
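Bitemporality means each fact carries two timelines: when it was true in the world (valid time) and when it was recorded in the database (transaction time), so that past states of knowledge can be reconstructed, which matters for audit in financial services. A minimal sketch with an invented schema and data:

```python
# Minimal sketch of bitemporal fact storage: each edge carries both a
# valid-time and a transaction-time. Schema and data are illustrative.
from datetime import date

facts = []

def assert_fact(subj, pred, obj, valid_from, recorded_on):
    facts.append({"s": subj, "p": pred, "o": obj,
                  "valid_from": valid_from, "recorded": recorded_on})

# A loan's lead arranger changed over time; both versions are kept.
assert_fact("loan:123", "arranger", "Bank A",
            valid_from=date(2018, 1, 1), recorded_on=date(2018, 1, 5))
assert_fact("loan:123", "arranger", "Bank B",
            valid_from=date(2019, 6, 1), recorded_on=date(2019, 6, 3))

def as_of(subj, pred, valid_date, known_date):
    """What did we believe on known_date about the value at valid_date?"""
    candidates = [f for f in facts
                  if f["s"] == subj and f["p"] == pred
                  and f["valid_from"] <= valid_date
                  and f["recorded"] <= known_date]
    return max(candidates, key=lambda f: f["valid_from"])["o"] if candidates else None

print(as_of("loan:123", "arranger", date(2020, 1, 1), date(2020, 1, 1)))    # Bank B
print(as_of("loan:123", "arranger", date(2020, 1, 1), date(2018, 12, 31)))  # Bank A
```

The second query answers "as known at the end of 2018", before the change was recorded, which is exactly the replay capability a bitemporal graph provides.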
The team received continuous feedback from stakeholders, from selected customers to colleagues within Refinitiv with in-depth knowledge about project finance, global infrastructure, BRI or the MENA region.
Ben Chu, Senior Data Scientist at Refinitiv Labs, explained: "At the beginning, we were apprehensive about using graphs and unsure how far we should explore because of their complexity. This was not due to unfamiliarity with the technology or to how we wanted to implement it on our end; rather, we had concerns because graphs can be hard to interpret, even with built-in visualization.
"We concluded from our comparisons in the linking exercise that a graph approach is a natural fit for linking disparate datasets (compared with the rigid approach of joining tables); it resulted in much better coverage, which made the effort well worth it.
"We were therefore careful not to make the graph the end product, but to use it only as an intermediate layer for linking. We put extra effort into building an API with an indexed search as the end layer, where users can query easily."
"We first tried to link the datasets without graphs using existing fields explicitly, but found that they didn't map neatly or intelligently and that updating the database in the future would need us to rewrite the entire data model," added Sircar.
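The contrast Sircar describes can be sketched roughly as follows: a strict equality join on name fields misses records that a graph-style approach, which resolves name variants to one canonical entity node, links correctly. All names and identifiers below are invented:

```python
# Sketch contrasting rigid key joins with graph-style linking. Records
# in two datasets name the same company differently; a resolved entity
# node bridges them. All names and IDs are made up for illustration.
bonds = [{"issuer": "Acme Infra Ltd", "bond_id": "B1"}]
loans = [{"borrower": "ACME INFRA LIMITED", "loan_id": "L1"}]

# A strict equality join on the name fields finds nothing:
joined = [(b, l) for b in bonds for l in loans if b["issuer"] == l["borrower"]]
assert joined == []

# Graph approach: resolve both name variants to one canonical entity,
# then attach records as edges hanging off that node.
alias_to_entity = {"acme infra ltd": "E1", "acme infra limited": "E1"}

def canon(name):
    return alias_to_entity.get(name.lower())

edges = []
for b in bonds:
    edges.append((canon(b["issuer"]), "issued_bond", b["bond_id"]))
for l in loans:
    edges.append((canon(l["borrower"]), "borrowed", l["loan_id"]))

# Both records are now discoverable together via entity E1.
linked = [e for e in edges if e[0] == "E1"]
print(linked)
```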
Tackling Large Volumes of Unstructured Data
One of the biggest challenges the team encountered was the scale of the task, which comprised a sizeable number of tables. To parse through them and prioritize the linkage required close collaboration with subject-matter experts who could pinpoint the tables most valuable to end-users.
"We found that a lot of valuable unstructured data was already present, either in the form of analyst notes or comments at the time of entry. We wanted to use that information, but needed to parse it systematically, given that we were working with hundreds of tables," explained Sircar.
"We first used named entity recognition to identify entities, then matched them to identifiers and pulled all the associated metadata for each identified entity into our knowledge graph. This dramatically improved (by over 550 percent) the degree to which we were able to surface previously hidden connections across projects and companies."
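A toy version of that pipeline might look like the following. A production system would use a trained NER model and real identifier matching; here a simple gazetteer lookup stands in for both, and all the entities and metadata are hypothetical:

```python
# Toy sketch: recognize entity mentions in an analyst note, match them
# to identifiers, and attach their metadata to the graph. A gazetteer
# stands in for a real NER model; all data is hypothetical.
gazetteer = {
    "acme infra": {"id": "ENT-001", "country": "SG", "sector": "transport"},
    "globex energy": {"id": "ENT-002", "country": "AE", "sector": "energy"},
}

note = "Acme Infra signed a joint venture with Globex Energy in 2019."

def extract_entities(text):
    """Return (mention, metadata) pairs found in the text."""
    lowered = text.lower()
    return [(m, meta) for m, meta in gazetteer.items() if m in lowered]

graph_edges = []
for mention, meta in extract_entities(note):
    # Pull the matched entity's metadata into the knowledge graph.
    graph_edges.append((meta["id"], "mentioned_in", "note:42"))
    graph_edges.append((meta["id"], "sector", meta["sector"]))

print(graph_edges)
```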
The team loaded the data as an RDF graph into the Neptune graph database and indexed it with Elasticsearch. Luke Luo, a full-stack developer at Refinitiv Labs, then wrapped an API around it and deployed it on Amazon Elastic Compute Cloud (EC2). Today, data across all five datasets can be easily searched and queried with Refinitiv Labs' Global Infrastructure API.
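The same shape can be mimicked in miniature: a triple store for the graph and a keyword index over entity labels for search, standing in for Neptune and Elasticsearch respectively. Users search by text first, then traverse links from the hit. The triples below are invented:

```python
# In-memory stand-in for the described stack: triples as the graph
# store, plus an inverted index over label text as the search layer.
# Neptune and Elasticsearch play these roles in the real deployment.
triples = [
    ("proj:1", "label", "Jakarta Light Rail"),
    ("proj:1", "financed_by", "loan:9"),
    ("loan:9", "label", "Syndicated loan 9"),
    ("loan:9", "arranger", "ent:7"),
    ("ent:7", "label", "Acme Infra"),
]

# Build a simple inverted index over label tokens (search stand-in).
index = {}
for s, p, o in triples:
    if p == "label":
        for token in o.lower().split():
            index.setdefault(token, set()).add(s)

def search(keyword):
    return sorted(index.get(keyword.lower(), set()))

def neighbours(node):
    return [(p, o) for s, p, o in triples if s == node]

hits = search("jakarta")     # text search first...
print(hits)
print(neighbours(hits[0]))   # ...then graph traversal from the hit
```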
Extensibility was a priority in product design, so clients can take the output from the API and integrate it into their own datasets.
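For instance, a client might join JSON output from the API to their own holdings by entity identifier. The response shape and field names below are assumptions for illustration, not the documented API contract:

```python
# Sketch of the extensibility point: enrich a client's own records
# with (hypothetical) Global Infrastructure API output, keyed on an
# entity identifier. The response shape is an assumption.
import json

api_response = json.loads("""
[{"entity_id": "ENT-001", "name": "Acme Infra", "active_projects": 4},
 {"entity_id": "ENT-002", "name": "Globex Energy", "active_projects": 1}]
""")

holdings = [
    {"entity_id": "ENT-001", "position_usd": 2_500_000},
    {"entity_id": "ENT-003", "position_usd": 900_000},
]

api_by_id = {row["entity_id"]: row for row in api_response}
enriched = [
    {**h, "active_projects": api_by_id.get(h["entity_id"], {}).get("active_projects")}
    for h in holdings
]
print(enriched)  # unmatched holdings keep a None placeholder
```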
Collaboration in a Time of Crisis
"We were working on this project during a difficult time for everyone because of the global pandemic. We worked separately, but made it a habit to have a daily catch-up to check in on each other's progress," said Chu.
"At Refinitiv Labs, we work in multi-disciplinary teams, and from the start, the project saw collaboration between engineers, data scientists, and content specialists. We used design thinking and put our prototype in front of stakeholders at every stage."
Besides internal subject matter experts, the team also spoke to a range of customers for input, so they could understand the expectations for the API.
"We validated our assumptions with customer insights in a set of short iterations," said Sircar.
Measuring Environmental Risks
Inspired by the initial feedback and results, the team is working to extend the prototype with more sophisticated graph analytics that return quantitative risk scores.
Environmental risks have emerged as an additional area of key interest. The Labs team is considering using satellite imagery to track vegetation and classify land use around infrastructure projects.

Written in collaboration with Refinitiv Labs, Singapore.
Published at DZone with permission of Jo Stichbury, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.