Data Engineering for Risk Analytics

Risk Data Object, a new model for risk analytics.

What is Risk Analytics?

Risk is everywhere: it can hit people, assets, reputations, and more. But how does one understand risk?

The photos below show the floods in Thailand in 2011 (left) and in New York in 2012 (right). The Thailand floods affected Western Digital, which at the time supplied a critical proportion of hardware components, and Hurricane Sandy impacted many businesses and individuals in New York and the northeastern US. Providing protection against events like these requires great risk analytics, because understanding a risk is critical to creating protection against it. Protection comes not just from better building codes but also from financial recovery, and that financial protection is the job of the financial services and insurance industries. Given the growing set of risks we have to watch for, improving our understanding of and practices in risk analytics is one of the most interesting problems in big data these days.

[Figure: floods in Thailand, 2011 (left) and New York, 2012 (right)]

How Does Risk Analytics Work?

There are many types of risk, including risk to people and risk to assets. When it comes to impact, however, few other categories outdo natural and man-made disasters in terms of damages.

The initial steps of risk analytics start with understanding exposure, that is, the risks a given asset, individual, etc. is exposed to. Understanding exposure means detailing the events that could cause damage and the losses that could result from those events. Formulas get more complicated from here. There is a busy highway of data surrounding this field. Data engineers, data scientists, and others involved in risk analytics work to predict, model, select, and price risk in order to calculate how to provide effective protection.
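As a rough illustration of how events and losses combine, the sketch below computes a simple average annual loss for one asset by summing event rate times loss over a list of events. The `Event` class, its field names, and every number here are illustrative assumptions, not values from any real model; production catastrophe models are far more elaborate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A hypothetical catastrophe event for one exposed asset."""
    name: str
    annual_rate: float    # assumed expected occurrences per year
    damage_ratio: float   # assumed fraction of insured value lost if it occurs


def average_annual_loss(insured_value: float, events: list[Event]) -> float:
    """Sum rate * loss over all events: a simple expected-loss estimate."""
    return sum(e.annual_rate * e.damage_ratio * insured_value for e in events)


# Made-up events for a building insured for 1,000,000.
events = [
    Event("flood", annual_rate=0.02, damage_ratio=0.40),
    Event("hurricane", annual_rate=0.01, damage_ratio=0.25),
]
aal = average_annual_loss(1_000_000, events)
print(f"Average annual loss: {aal:,.2f}")
```

Pricing protection would build on a number like this, adding uncertainty, correlation between events, and policy terms, none of which this sketch attempts.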

Data Engineering for Risk Analytics

Let’s take a look at property-focused risks. Risk analytics, as I said, starts with an understanding of the exposure of the property to risk. What’s at risk can be a commercial or a residential building. What kinds of events could pose a risk, and what losses could result from those events, depends on many variables.

In today's enterprise, if you want to work with exposure data, you have to work with multiple siloed systems that have their own data formats and representations. These systems don’t speak the same language. For a user to get a complete picture, they need to go across these systems and constantly translate and transform data between them. As a data engineer, how do you provide a unified view of data across all systems? How can you enable a risk analyst to understand all kinds of perils from a hurricane to hail to a storm surge and roll this all up so you can guarantee the coverage on these losses?
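One common way to approach the unified-view question is to translate each system's records into a single shared record type and do the roll-up on that. The sketch below assumes two made-up source formats ("system A" and "system B"); the `Exposure` class, the field names, the peril codes, and the sample values are all hypothetical and are not taken from EDM, CEDE, OED, or any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exposure:
    """Unified exposure record (hypothetical schema)."""
    location_id: str
    occupancy: str              # "commercial" or "residential"
    insured_value_usd: float
    perils: tuple[str, ...]


def from_system_a(row: dict) -> Exposure:
    # System A (assumed): flat keys, value in thousands of USD, perils as "HU;HL;SS".
    codes = {"HU": "hurricane", "HL": "hail", "SS": "storm_surge"}
    return Exposure(
        location_id=str(row["LOCID"]),
        occupancy="commercial" if row["OCC"] == "COM" else "residential",
        insured_value_usd=float(row["TIV_K"]) * 1_000,
        perils=tuple(codes[c] for c in row["PERILS"].split(";")),
    )


def from_system_b(doc: dict) -> Exposure:
    # System B (assumed): nested document, value already in USD, perils as a list.
    return Exposure(
        location_id=str(doc["site"]["id"]),
        occupancy=doc["site"]["use"].lower(),
        insured_value_usd=float(doc["coverage"]["value"]),
        perils=tuple(p.lower() for p in doc["coverage"]["perils"]),
    )


def insured_value_by_peril(exposures: list[Exposure]) -> dict[str, float]:
    """Roll up insured value per peril across all source systems."""
    totals: dict[str, float] = {}
    for exp in exposures:
        for peril in exp.perils:
            totals[peril] = totals.get(peril, 0.0) + exp.insured_value_usd
    return totals


unified = [
    from_system_a({"LOCID": 17, "OCC": "COM", "TIV_K": 2500, "PERILS": "HU;SS"}),
    from_system_b({"site": {"id": "B-9", "use": "Residential"},
                   "coverage": {"value": 400000, "perils": ["Hurricane", "Hail"]}}),
]
totals = insured_value_by_peril(unified)
print(totals)
```

Every new source system needs its own translator in this design, which is exactly the ongoing cost a shared exchange standard is meant to remove.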

There are a number of standards that the industry uses to integrate, transfer, and exchange this type of information. The most popular of these formats is the EDM (Exposure Data Model). However, EDM and its less popular counterparts, CEDE (Catastrophe Exposure Database Exchange) and OED (Open Exposure Data), have not aged well and have not kept up with the industry's needs.

  • These older standards are property-centric; risk analytics requires understanding new risks, such as cyberattacks, liability risks, and supply chain risk.

  • These older standards are proprietary: they are designed for a single system and do not take into account the needs of other systems that require new verbs in their vocabulary. For example, they can’t support new predictive risk models.

  • These standards lack a container that keeps together everything needed for high-fidelity data portability: exposure data formats usually omit the losses, the reference data, and the settings used to produce the loss information, all of which are needed to preserve data integrity.

  • These standards aren’t extensible. Versioning and dependencies on specific product formats (such as database formats tied to version X of SQL Server) constantly make data portability harder.

This creates a huge data engineering challenge. If you can’t exchange information with high fidelity, forget getting reliable insights. As anyone dealing with data will say: garbage in, garbage out!
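To make the containment and versioning gaps listed above more concrete, here is a minimal sketch of a self-describing package that carries exposures, losses, reference data, and settings together, with an explicit schema version and a checksum. This is not the RDO format or any existing standard; the function names, the field names, and the sample values are assumptions made purely for illustration.

```python
import hashlib
import json


def build_package(exposures, losses, reference_data, settings, schema_version="1.0"):
    """Bundle everything needed to reproduce the losses into one document."""
    payload = {
        "schema_version": schema_version,  # explicit version, not tied to a database product
        "exposures": exposures,
        "losses": losses,
        "reference_data": reference_data,
        "settings": settings,
    }
    # Canonical JSON (sorted keys) so the same content always hashes the same way.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {"payload": payload, "sha256": digest}


def verify_package(package) -> bool:
    """Recompute the checksum to detect changes made after packaging."""
    canonical = json.dumps(package["payload"], sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest() == package["sha256"]


# Made-up sample content; every value here is hypothetical.
package = build_package(
    exposures=[{"location_id": "17", "insured_value_usd": 2500000}],
    losses=[{"location_id": "17", "peril": "hurricane", "expected_loss_usd": 6250}],
    reference_data={"currency": "USD"},
    settings={"model": "example-model", "model_version": "0.1"},
)
print(verify_package(package))
```

Because the losses travel with the settings and reference data that produced them, a receiving system can check both that nothing was altered and which assumptions the numbers depend on.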

There is great news for any data engineer dealing with risk analytics: a new open standard, several years in the works, is designed to remove the shortcomings of EDM and similar formats. It is RDO, the Risk Data Object. RDO is designed to simplify data engineering, in particular integrating data between systems that deal with exposure and loss data. RMS is not inventing and validating this standard in isolation: a steering committee of thought leaders from influential companies is working on validating the RDO open standard.

RDO will allow us to work on risk analytics much more effectively. This is the way we can better understand the type of protection we need to create against climate change and other natural or man-made disasters. You can find details on the RDO here. If you are interested in RDO, have feedback, or would like to help us define this standard, you can contact the RDO steering committee at rdo@rms.com.
