
Creating Scoring Engines With PFA


These examples illustrate the ease with which it’s possible to get started using Portable Format for Analytics (PFA) for creating model scoring engines.


Portable Format for Analytics (PFA) provides a standardized way of representing analytical models, delivering much-needed model portability: the ability to train a model on one data platform, serialize the model as PFA, and then score the model on any other data platform that supports the PFA standard. Alpine has developed PFA scoring engines for deployment across a wide variety of environments, including two standalone engines:

  1. RESTful engine
  2. Kafka engine

And two integrated engines:

  1. Storm engine
  2. Spark Streaming engine

I thought it would be interesting to highlight how easy it is to write your own basic PFA scoring engines, thanks to the open-source PFA implementations developed by the Open Data Group and made available under the Apache license.

Getting Started With PFA

Probably the simplest way to create a PFA scoring engine is to use Python and Titus, the open-source Python PFA engine. Using Titus, a PFA engine can be created in just a couple of lines, as shown below.

import json
import sys
from titus.genpy import PFAEngine
# Leverage the PFA doc specified on the command-line
pfa_model = sys.argv[1]
engine, = PFAEngine.fromJson(json.load(open(pfa_model)))

...where the location of the PFA document (in this case assumed to be a JSON-formatted PFA file) is supplied as a command line argument. Once the engine is created, it’s then just a few more lines of code to use the engine to start scoring.

# Invoke any initialization functions
# Score example input
input = {"Sepal_length": "1.0", "Sepal_width": "1.0", "Petal_length": "1.0", "Petal_width": "1.0"}
results = engine.action(input)
print(results)

In this example, the resulting output is:

[ec2-user@ip-10-0-0-169 blogs]$ python demo_pfa.py example.pfa

{u'INFO': {u'Iris-virginica': 2.2905557894937507e-15, u'Iris-setosa': 0.9999999999122311, u'Iris-versicolor': 8.77666387876342e-11}, u'PRED': u'Iris-setosa', u'CONF': 0.9999999999122311}

example.pfa is a logistic regression model trained on the well-known Iris data set, and we are making a prediction based on a single sample of data. As illustrated above, the PFA engine expects the input data to be presented as a dictionary, with the input features labelled as in the PFA document's input specification; the output consists of a prediction, confidence information, and the likelihood that the sample belongs to each of the other species of Iris.

Once this basic engine is created, it’s then just a few more lines of Python to create a RESTful scoring engine — or attach the engine to a Kafka stream as discussed below.

RESTful PFA Scoring Engine

This basic code can then be easily used to create a RESTful scoring interface as follows:

#(r"/demo/score/([a-zA-Z0-9_]+)", scoreModel),
class scoreModel(tornado.web.RequestHandler):
    # Score model
    def post(self, id):
        engine, = PFAEngine.fromJson(json.load(open('models/%s.pfa' % (id))))
        dd = tornado.escape.json_decode(self.request.body)
        # Score the decoded payload and return the result as the response
        self.write(tornado.escape.json_encode(engine.action(dd)))

In the above excerpt, I’m using the Tornado web server, and the score endpoint uses Titus to score the data in the incoming payload. The model to be used is specified in the URL (it is assumed that the model has already been uploaded) and is loaded into an engine; the JSON payload of the incoming message is then decoded and passed to the engine for scoring, and the result generated by the PFA engine is returned as the response. (In this example, a new PFA engine is created for each incoming message, which, while inefficient, is fine for illustrative purposes.)
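For completeness, here is a sketch of how a client might call such an endpoint. The host, port, and model name below are illustrative assumptions, not taken from the article:

```python
import json

# Build the same Iris sample used earlier as a JSON request body
payload = {"Sepal_length": "1.0", "Sepal_width": "1.0",
           "Petal_length": "1.0", "Petal_width": "1.0"}
body = json.dumps(payload)

# With the requests library, the call might look like this
# (assuming the Tornado server listens on localhost:8888 and a
# model named "example" has already been uploaded):
# import requests
# r = requests.post("http://localhost:8888/demo/score/example", data=body)
# print(r.json())
```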

Streaming Scoring Engine

Similar to the RESTful scoring engine, the PFA engine for scoring Kafka streams can be created in a few lines of Python:

import json
import sys
from kafka import KafkaConsumer
from kafka import KafkaProducer
from titus.genpy import PFAEngine

pfa_model = sys.argv[1]
engine, = PFAEngine.fromJson(json.load(open(pfa_model)))
kafka_topic_to_score = sys.argv[2]
kafka_topic_to_emit = sys.argv[3]
# Address of the Kafka broker, e.g. "localhost:9092"
server_address = sys.argv[4]
consumer = KafkaConsumer(kafka_topic_to_score, bootstrap_servers=server_address)
producer = KafkaProducer(bootstrap_servers=server_address)
# Consume messages
for msg in consumer:
    # Score next message
    score = engine.action(json.loads(msg.value))
    # Emit prediction as JSON bytes
    producer.send(kafka_topic_to_emit, json.dumps(score).encode("utf-8"))

In the above code, the PFA engine is created in an identical fashion to the previous examples, and the kafka-python library is used to create a Kafka client, connect to the specified Kafka broker, and create a message consumer and a message producer. Incoming messages are read by the consumer, scored using the PFA engine, and written by the producer back to Kafka.

PySpark Scoring Engine

Finally, it’s straightforward to leverage the Titus PFA engine from PySpark via a Spark mapPartitions integration. In this approach, a scoreModel function iterates row by row over the data in each partition, scores each row, and returns a list of predictions.
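The article's original code for this step isn't reproduced here; the following is a minimal sketch of what such a mapPartitions integration might look like. Names such as `make_score_partition` are illustrative assumptions, not from the original:

```python
import json

def make_score_partition(pfa_doc):
    """Return a function suitable for rdd.mapPartitions().

    The parsed PFA document (a plain dict) is captured in the closure
    so that each partition builds its own engine -- the engine object
    itself is not serializable, but its JSON document is.
    """
    def score_partition(rows):
        # Import on the Spark worker, where the partition is scored
        from titus.genpy import PFAEngine
        engine, = PFAEngine.fromJson(pfa_doc)
        # Iterate row by row over the partition, scoring each row
        return [engine.action(row) for row in rows]
    return score_partition

# Usage (assuming sc is a SparkContext and input_rdd holds feature dicts):
# with open("example.pfa") as f:
#     pfa_doc = json.load(f)
# predictions = input_rdd.mapPartitions(make_score_partition(pfa_doc)).collect()
```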

While the above examples obviously need significant hardening and error handling before real use, hopefully they illustrate the ease with which it’s possible to get started using PFA for model scoring!


