The interest in machine learning, and the associated appetite to drive business outcomes from such investments, continues to build. I’ve been talking to many insurance organisations over the past 18 months about machine learning, and four consistent areas tend to arise as they grapple with its application and value.
As 2017 gets well underway, I thought it prudent to share and gather opinions and experiences from the insurance industry. I’ve also summarized these points of view as part of Louise Matthews’ ‘Five Minutes with….’ video series.
First and foremost, machine learning WILL change the way insurers do business. The insurance industry is founded on forecasting future events and estimating the value/impact of those events and has used established predictive modeling practices – especially in claims loss prediction and pricing – for some time now. With big data and new data sources such as sensors/telematics, external data sources (Data.gov), digital (interactions), social and Web (sentiment), the opportunity to apply machine learning techniques has never been greater across new areas of insurance operations.
Machine learning has now become an essential tool for insurers, and it is used extensively across the core value chain to understand risk, claims and customer experience. Specifically, it enables insurance companies to achieve higher predictive accuracy because it can fit more flexible/complex models. Unlike traditional statistical methods, machine learning takes advantage of the power of data analytics and can process seemingly unrelated datasets, whether structured, semi-structured or unstructured.
By way of an example, predictive models based upon machine learning now take into consideration:
- Structured data: type of loss, amount of loss, physician ID, etc.
- Text: Notes, diaries, medical bills, accident reports, depositions, social data, invoices, etc.
- Spatial, graph: accident location, work location, relationship of parties (physician, claimant, repair facilities), etc.
- Time series: sequence of events/actions, claim date, accident date, duration between events/action, etc.
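To make the four data types above a little more concrete, here is a minimal sketch of how one claim might be turned into model features. It is an illustration only: the claim record, the field names, the keyword list and the `build_features` helper are all hypothetical, and a real model would learn its text signals from data rather than from a fixed keyword list.

```python
import math
from datetime import date

# Hypothetical claim record; every field name and value is invented for illustration.
claim = {
    "loss_type": "auto_collision",                 # structured
    "loss_amount": 12500.0,                        # structured
    "notes": "Claimant reports neck pain. Prior claim mentioned. Attorney retained.",  # text
    "accident_location": (51.5074, -0.1278),       # spatial: (latitude, longitude)
    "work_location": (51.5155, -0.0922),           # spatial: (latitude, longitude)
    "accident_date": date(2017, 1, 9),             # time series
    "claim_date": date(2017, 2, 20),               # time series
}

# Illustrative keywords only (an assumption, not a recommended list).
KEYWORDS = ("attorney", "prior claim", "pain")


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def build_features(claim):
    """Combine structured, text, spatial and time-based signals into one row."""
    notes = claim["notes"].lower()
    return {
        "loss_type": claim["loss_type"],
        "loss_amount": claim["loss_amount"],
        "keyword_hits": sum(notes.count(k) for k in KEYWORDS),
        "accident_to_work_km": haversine_km(
            claim["accident_location"], claim["work_location"]
        ),
        "days_to_report": (claim["claim_date"] - claim["accident_date"]).days,
    }


features = build_features(claim)
print(features)
```

The point of the sketch is simply that each data type ends up as one or more columns in the same feature row, which is what allows a single model to consider them together.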
Now more than ever, insurers have the ability to evaluate large volumes of underwriting and claims notes and diaries (unstructured data), in addition to more standard documentation.
Pricing risk, estimating losses and monitoring fraud are critical areas that machine learning can support. Insurers have introduced machine learning algorithms primarily to handle risk similarity analytics, risk appetite and premium leakage. However, it is also widely used to help predict the frequency/severity of claims and to manage expenses, subrogation (general insurance), litigation and fraud.
One of the most impactful machine learning use cases is the ability to learn from audits of closed claims, as for the very first time leakage becomes controllable by the insurer. Claim audits are traditionally a manual process by nature. However, machine learning techniques provide an uplift in the ability to learn from them by applying enhanced scoring and process methods throughout the claims lifecycle.
Those claim handling algorithms can also be used to monitor and detect fraud. However, one limiting factor may be the number of claims fraud cases an insurance company has on record, as fraud datasets are fundamental to both traditional and machine learning models.
I’m often asked if machine learning can deliver a tangible decline in fraud rates and I do believe it can have an impact on earlier identification, or ‘counter-fraud’ techniques. The key element is to reduce the false positives and to apply machine learning algorithms to help determine which claims are potentially fraudulent vs. those that are legitimate.
Insurance companies applying this technique are reducing fraud in two respects: identifying fraud earlier, and allocating investigators’ time to potentially fraudulent claims rather than spending it on valid ones. This also increases customer satisfaction, as valid claims are paid faster.
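The trade-off between catching fraud and generating false positives can be shown with a small worked example. The sketch below assumes a model has already produced a fraud score between 0 and 1 for each claim. The scores, outcomes and thresholds are invented for illustration, and `confusion_at` and `precision_recall` are hypothetical helper names.

```python
# Hypothetical (score, outcome) pairs; outcome 1 = confirmed fraud, 0 = valid claim.
scored_claims = [
    (0.95, 1), (0.90, 1), (0.80, 0), (0.70, 1), (0.60, 0),
    (0.40, 0), (0.30, 1), (0.20, 0), (0.10, 0), (0.05, 0),
]


def confusion_at(threshold, scored):
    """Count outcomes when every claim scoring >= threshold is flagged."""
    tp = sum(1 for s, y in scored if s >= threshold and y == 1)  # fraud, flagged
    fp = sum(1 for s, y in scored if s >= threshold and y == 0)  # valid, flagged
    fn = sum(1 for s, y in scored if s < threshold and y == 1)   # fraud, missed
    tn = sum(1 for s, y in scored if s < threshold and y == 0)   # valid, passed
    return tp, fp, fn, tn


def precision_recall(threshold, scored):
    """Precision: share of flagged claims that are fraud. Recall: share of fraud flagged."""
    tp, fp, fn, _ = confusion_at(threshold, scored)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


for threshold in (0.5, 0.85):
    tp, fp, fn, tn = confusion_at(threshold, scored_claims)
    precision, recall = precision_recall(threshold, scored_claims)
    print(f"threshold={threshold}: flagged={tp + fp}, false positives={fp}, "
          f"precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, raising the threshold removes the false positives but also misses more of the fraudulent claims, which is the balance an insurer has to strike when deciding how much investigator time to commit.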
Nothing evidences the impact of any technology more than how it is applied in the real world, and we are seeing such applications in insurance fraud. Using machine learning, insurers can load claims data (whether structured, unstructured or semi-structured) into a huge repository, often called a “data lake”. This method differs from traditional predictive models, which only leverage structured data. Claims notes, diaries and documents are key in discovering fraud and developing fraud models. In the case of fraud detection, the procedure would consist of:
- Learning Phase: where you learn from “training data”, that is, claims known to be fraudulent and claims known to be valid. It consists of pre-processing (normalization, dimension reduction, image processing if you are using photos, aerial images, etc.), learning (supervised, unsupervised, minimization, etc.) and error analysis (precision, recall, overfitting, test/cross validation, etc.).
- Prediction Phase: here the model from the learning phase is applied to new data and deployed to detect and flag fraud.
- Continuous Learning Phase: it is key to continuously recalibrate your models with new data and behaviors.
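The three phases above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production approach: it uses a small logistic regression trained by gradient descent as a stand-in for whichever model an insurer actually chooses, the two features are assumed to be already normalized to the 0 to 1 range, and all data, function names and parameter values are hypothetical.

```python
import math


def predict_proba(weights, x):
    """Estimated fraud probability; weights[0] is the bias term."""
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, weights=None, epochs=2000, lr=0.5):
    """Fit logistic regression by batch gradient descent (optionally warm-started)."""
    w = list(weights) if weights is not None else [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = predict_proba(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(rows) for wj, g in zip(w, grad)]
    return w


# Learning phase: invented, pre-normalized features
# [loss amount, days to report], label 1 = fraudulent, 0 = valid.
train_rows = [[0.1, 0.1], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],
              [0.8, 0.9], [0.9, 0.7], [0.7, 0.8], [0.9, 0.9]]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
model = train(train_rows, train_labels)

# Error analysis on a small held-out set (also invented).
holdout_rows, holdout_labels = [[0.75, 0.85], [0.2, 0.2]], [1, 0]
correct = sum((predict_proba(model, x) >= 0.5) == bool(y)
              for x, y in zip(holdout_rows, holdout_labels))
print(f"hold-out accuracy: {correct / len(holdout_rows):.2f}")

# Prediction phase: score new claims and flag those above a chosen threshold.
new_claims = {"N1": [0.85, 0.85], "N2": [0.15, 0.15]}
flagged = [cid for cid, x in new_claims.items() if predict_proba(model, x) >= 0.5]
print("flagged for review:", flagged)

# Continuous learning phase: once investigators confirm outcomes, add the
# newly labelled claims and recalibrate, starting from the current weights.
train_rows += [[0.85, 0.85], [0.15, 0.15]]
train_labels += [1, 0]
model = train(train_rows, train_labels, weights=model, epochs=500)
```

Warm-starting from the existing weights is one simple design choice for recalibration. Retraining from scratch on a rolling window of recent claims is an equally reasonable alternative when fraud behaviour shifts quickly.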
In addition to machine learning, the usage of Graph Analytics is also rapidly becoming popular because of its ability to visualise fraud patterns.
The usage of Graph Analytics with Apache Spark/GraphX is a newer method being leveraged, as it enables the usage of neural networks and social networks, which is key in claims fraud analysis. This method is becoming quite popular compared with traditional claims scoring or business rules, as those methods (considered a “flagging model”) may result in too many false positives.
A Graph Analytics technique can help you understand the data relationships and is also used for investigating individual claims fraud cases. This method allows insurance companies to more quickly visualize fraud patterns vs. traditional scoring models.
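As a small illustration of the idea, the sketch below links claims that share a party (for example the same physician or repair facility) and groups them into connected clusters. It deliberately uses plain Python rather than Apache Spark/GraphX so that it runs anywhere. The claims, party IDs and the `build_graph` and `claim_clusters` helpers are hypothetical, and a large cluster is only a prompt for investigation, not evidence of fraud.

```python
from collections import defaultdict, deque

# Hypothetical claims and the parties involved in each (all IDs invented).
claims = {
    "C1": {"physician": "PH-104", "repair_shop": "RS-7"},
    "C2": {"physician": "PH-104", "repair_shop": "RS-2"},
    "C3": {"physician": "PH-311", "repair_shop": "RS-2"},
    "C4": {"physician": "PH-520", "repair_shop": "RS-9"},
    "C5": {"physician": "PH-777", "repair_shop": "RS-5"},
}


def build_graph(claims):
    """Undirected graph with an edge between each claim and each of its parties."""
    graph = defaultdict(set)
    for claim_id, parties in claims.items():
        claim_node = ("claim", claim_id)
        for role, party_id in parties.items():
            party_node = ("party", role, party_id)
            graph[claim_node].add(party_node)
            graph[party_node].add(claim_node)
    return graph


def claim_clusters(claims):
    """Group claims that are connected through any chain of shared parties."""
    graph = build_graph(claims)
    seen, clusters = set(), []
    for claim_id in claims:
        start = ("claim", claim_id)
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:  # breadth-first search over one connected component
            node = queue.popleft()
            if node[0] == "claim":
                members.append(node[1])
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(sorted(members))
    return clusters


clusters = claim_clusters(claims)
# Clusters of several linked claims are candidates for a closer look.
suspicious = [c for c in clusters if len(c) >= 3]
print("clusters:", clusters)
print("worth investigating:", suspicious)
```

Here C1 and C2 share a physician and C2 and C3 share a repair facility, so all three land in one cluster even though C1 and C3 have no party in common. That indirect link is the kind of relationship a claim-by-claim scoring model would not surface.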