A Graphical View of API Performance Based on Call Location
In this post, we use some Python and data viz to show that the farther the calling location is from the API's host, the slower the API's response.
The performance of an API depends on two things: the processing time between the API receiving a request and delivering its response, and the time it takes the request and response data packets to traverse the Internet distance between the calling system and the system that hosts the API. curl, "a command line tool and library for transferring data with URLs," breaks the timing of an API call down into components. In an earlier post, I outlined what the curl timings mean.
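As a rough sketch of how those components fit together: curl's `--write-out` (`-w`) option can print `time_*` variables, each measured in seconds from the start of the transfer, so the duration of each phase is the difference between consecutive marks. The function name `phase_durations_ms`, the phase labels, and the sample numbers are illustrative assumptions, not curl output or API Science data. The connect and handshake phases grow with network round-trip time (distance), while the wait for the first byte mixes server processing with roughly one more round trip.

```python
"""Split curl's cumulative --write-out timings into per-phase durations.

Example capture (URL is a placeholder):
  curl -s -o /dev/null -w '%{time_namelookup} %{time_connect} %{time_appconnect} %{time_pretransfer} %{time_starttransfer} %{time_total}\n' URL
"""


def phase_durations_ms(t):
    """t maps curl write-out variable names to cumulative seconds.

    Returns a dict of phase name -> duration in milliseconds.
    """
    # time_appconnect is 0 when no TLS handshake happened (plain HTTP),
    # so fall back to time_connect as the end of the handshake phase.
    tls_done = t["time_appconnect"] or t["time_connect"]
    phases = {
        "dns_lookup": t["time_namelookup"],
        "tcp_connect": t["time_connect"] - t["time_namelookup"],
        "tls_handshake": tls_done - t["time_connect"],
        "pre_transfer": t["time_pretransfer"] - tls_done,
        # server processing plus roughly one network round trip
        "wait_first_byte": t["time_starttransfer"] - t["time_pretransfer"],
        "content_transfer": t["time_total"] - t["time_starttransfer"],
    }
    return {name: round(sec * 1000.0, 3) for name, sec in phases.items()}


if __name__ == "__main__":
    # Hypothetical sample values, not a real measurement.
    sample = {
        "time_namelookup": 0.004,
        "time_connect": 0.030,
        "time_appconnect": 0.090,
        "time_pretransfer": 0.091,
        "time_starttransfer": 0.250,
        "time_total": 0.260,
    }
    for name, ms in phase_durations_ms(sample).items():
        print(f"{name:>16}: {ms:8.3f} ms")
```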
Your customers want to see a response to their actions as quickly as possible. In this post, I use the API Science API, curl, and a few simple scripts to graphically illustrate how the global calling location affects overall API performance.
Assume that one component of your app is a call to the World Bank Countries API. The data center for this API is located in Washington, DC, USA. To test the effect of calling location when this API is accessed, I created four API monitors that call the World Bank API from these locations:
- Washington, DC, USA
- Oregon, USA
- Ireland
- Tokyo, Japan
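To put a rough number on "Internet distance," one could compute the great-circle distance from Washington, DC to each monitor location and the time light in optical fiber (about 200 km per millisecond, roughly two thirds of the speed of light in a vacuum) needs to cover it there and back. This is only a propagation floor: real routes are longer than great-circle paths and add router hops, so measured times will be higher. The coordinates are assumed city-level stand-ins; the post doesn't say where in Oregon or Ireland the monitors run, so Portland and Dublin are hypothetical choices.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
FIBER_KM_PER_MS = 200.0   # light in fiber covers roughly 200 km per millisecond

# Assumed (lat, lon) stand-ins for each monitor location; hypothetical values.
LOCATIONS = {
    "Wash DC": (38.907, -77.037),
    "Oregon": (45.515, -122.679),  # Portland, as a stand-in
    "Ireland": (53.350, -6.260),   # Dublin, as a stand-in
    "Tokyo": (35.690, 139.692),
}


def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def min_round_trip_ms(distance_km):
    """Propagation-only lower bound for one request/response round trip."""
    return 2 * distance_km / FIBER_KM_PER_MS


if __name__ == "__main__":
    api_host = LOCATIONS["Wash DC"]  # the World Bank API is served from Washington, DC
    for name, coords in LOCATIONS.items():
        km = great_circle_km(api_host, coords)
        print(f"{name:>8}: {km:8.0f} km, >= {min_round_trip_ms(km):6.1f} ms per round trip")
```

A new HTTPS connection spends several round trips (TCP handshake, TLS handshake, then the request itself) before the first response byte arrives, so this per-round-trip floor is paid more than once per fresh call.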
Next, I created four Linux shell scripts that download the past week's performance data from the API Science Performance Report API. Here's DC_weekly_perf.csh, the script that downloads the past week's data for the monitor that calls the World Bank Countries API from Washington, DC:
```shell
curl 'https://api.apiscience.com/v1/monitors/1572020/performance.json?preset=lastWeek&resolution=hour' -H 'Authorization: Bearer MY_AUTH_CODE'
```
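If you'd rather fetch the data from Python than from a shell script, here is a minimal standard-library sketch of the same request. It assumes the endpoint and query parameters behave exactly as in the curl command above; `build_request` and `download_weekly_perf` are hypothetical helper names, and MY_AUTH_CODE remains a placeholder for your own token.

```python
import urllib.parse
import urllib.request

# Endpoint taken from the curl command; the monitor ID is a path parameter.
PERF_URL = "https://api.apiscience.com/v1/monitors/{monitor_id}/performance.json"


def build_request(monitor_id, token, preset="lastWeek", resolution="hour"):
    """Build the same GET request that the curl command sends."""
    query = urllib.parse.urlencode({"preset": preset, "resolution": resolution})
    url = PERF_URL.format(monitor_id=monitor_id) + "?" + query
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def download_weekly_perf(monitor_id, token, out_path, timeout=30):
    """Fetch one monitor's weekly report and save the raw JSON to out_path."""
    with urllib.request.urlopen(build_request(monitor_id, token), timeout=timeout) as resp:
        body = resp.read()
    with open(out_path, "wb") as f:
        f.write(body)
```

For the Washington, DC monitor this would be called as `download_weekly_perf(1572020, "MY_AUTH_CODE", "DC_perf.json")`; the IDs of the other three monitors aren't given in this post.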
The downloaded JSON data for each monitor is stored in a text file (for example, DC_perf.json). The call to the API Science API returns the past week's performance data binned by hour, so each JSON file contains 168 (7 × 24) data entries. A Python script (listed below) processes the JSON files once they have been retrieved.
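The plotting script simply assumes every file holds the same number of hourly results. A small guard like the following could catch a short or truncated download before plotting. It is a sketch that assumes `meta.numberOfResults` should always equal the length of the `data` list; `check_weekly_perf` and `load_weekly_perf` are hypothetical names.

```python
import json

HOURS_PER_WEEK = 7 * 24  # lastWeek preset binned by hour -> 168 entries


def check_weekly_perf(perf, expected=HOURS_PER_WEEK):
    """Raise ValueError unless a parsed report holds a full week of hourly bins."""
    n = perf["meta"]["numberOfResults"]
    if n != len(perf["data"]):
        raise ValueError(f"meta says {n} results but data has {len(perf['data'])}")
    if n != expected:
        raise ValueError(f"expected {expected} hourly results, got {n}")
    return perf


def load_weekly_perf(path):
    """Load one downloaded report (e.g. DC_perf.json) and validate it."""
    with open(path) as f:
        return check_weekly_perf(json.load(f))
```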
Our objective is to create a graphical view of the performance timings for calls to the World Bank Countries API from the four locations. So, from each JSON file, we must extract the averageTotal value for each hour. We want to plot the data for all four calling locations on a single graph, so we can easily compare the World Bank API's performance by calling location.
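The extraction step can also be written once and reused for any number of locations. This sketch (the helper name `hourly_totals` is hypothetical) takes a mapping from plot label to parsed report and returns the labels plus a 2-D NumPy array with one row per location. It truncates to the shortest series instead of assuming equal lengths, and stores NaN wherever averageTotal is missing or null, which is an assumption about how an empty hour might appear in the data.

```python
import numpy as np


def hourly_totals(perfs):
    """perfs: dict mapping plot label -> parsed performance JSON.

    Returns (labels, totals) where totals[k, i] is the averageTotal
    for location labels[k] in hour i.
    """
    labels = list(perfs)
    # use the shortest series so every row has the same length
    n = min(len(perfs[label]["data"]) for label in labels)
    totals = np.full((len(labels), n), np.nan)
    for k, label in enumerate(labels):
        for i, entry in enumerate(perfs[label]["data"][:n]):
            value = entry.get("averageTotal")
            if value is not None:  # leave NaN where no value was reported
                totals[k, i] = value
    return labels, totals
```

With the four parsed files, `labels, totals = hourly_totals({"Wash DC": DC_perf, "Oregon": OR_perf, "Ireland": IR_perf, "Tokyo": JP_perf})` followed by `plt.plot(row, label=label)` for each pair in `zip(labels, totals)` would replace the four hand-written extraction and plot lines; matplotlib leaves a gap wherever a value is NaN.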
Here is the Python script:
```python
# gen_loc_report - 15 February 2019
# generate a report based on JSON data showing
# performance with respect to API call location
import json

import numpy as np
import matplotlib
# force matplotlib not to use an X Window backend
matplotlib.use('Agg')
import matplotlib.pyplot as plt

# get the results from each call location
with open('DC_perf.json') as f:
    DC_perf = json.load(f)
with open('OR_perf.json') as f:
    OR_perf = json.load(f)
with open('IR_perf.json') as f:
    IR_perf = json.load(f)
with open('JP_perf.json') as f:
    JP_perf = json.load(f)

print('number of results:',
      DC_perf['meta']['numberOfResults'],
      OR_perf['meta']['numberOfResults'],
      IR_perf['meta']['numberOfResults'],
      JP_perf['meta']['numberOfResults'])

# for simplicity, assume number of results is
# identical across all the JSON files
n_results = DC_perf['meta']['numberOfResults']
# one row per calling location, one column per hour
hourly_perf_total = np.zeros((4, n_results), dtype=float)

# extract the total performance data for each location
for i in range(n_results):
    hourly_perf_total[0][i] = DC_perf['data'][i]['averageTotal']
    hourly_perf_total[1][i] = OR_perf['data'][i]['averageTotal']
    hourly_perf_total[2][i] = IR_perf['data'][i]['averageTotal']
    hourly_perf_total[3][i] = JP_perf['data'][i]['averageTotal']

# plot the total performance data for each location
plt.plot(hourly_perf_total[0], label='Wash DC')
plt.plot(hourly_perf_total[1], label='Oregon')
plt.plot(hourly_perf_total[2], label='Ireland')
plt.plot(hourly_perf_total[3], label='Tokyo')
# one x-axis tick per day (24 hourly bins)
plt.xticks(np.arange(0, n_results + 1, 24.0))
plt.ylabel('Average Total Milliseconds')
plt.xlabel('Hours Since ' + DC_perf['meta']['endPeriod'])
title = 'World Bank Countries API Past Week Performance'
plt.title(title)
plt.legend(loc='best')
# log y axis
plt.semilogy()
plt.grid(True)
#plt.show()
plt.savefig('/home/kevin/APIScience/custom_reports/World_Bank_past_week.png')
```
And here is the resultant graph:
The milliseconds scale (Y-axis) is logarithmic. This plot provides a clear view of the effect of "Internet distance" on the performance of calls to the World Bank Countries API, which is served from Washington, DC, USA. Calls to the API from Washington, DC are generally answered in under 100 ms. Calls from Ireland consistently take significantly longer, and calls from Oregon and Tokyo take longer still.
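The graph makes the differences visible; to put numbers on them, a small summary like the following could report each location's median hourly averageTotal, its ratio to the Washington, DC baseline, and the share of hours under 100 ms. It is a sketch that expects a 2-D array with one row per location, like hourly_perf_total in the script; `summarize` is a hypothetical name, and the 100 ms threshold simply mirrors the figure quoted above. The median is used rather than the mean so that occasional slow spikes, which the log-scale plot shows clearly, don't dominate the comparison.

```python
import numpy as np


def summarize(labels, totals, baseline, threshold_ms=100.0):
    """Return {label: (median_ms, ratio_to_baseline, share_under_threshold)}."""
    medians = {}
    shares = {}
    for label, row in zip(labels, np.asarray(totals, dtype=float)):
        valid = row[~np.isnan(row)]  # ignore hours with no data
        medians[label] = float(np.median(valid))
        shares[label] = float(np.mean(valid < threshold_ms))
    base = medians[baseline]
    return {label: (medians[label], medians[label] / base, shares[label])
            for label in labels}
```

Applied to the script's data, this would be `summarize(['Wash DC', 'Oregon', 'Ireland', 'Tokyo'], hourly_perf_total, 'Wash DC')`.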
There is more to investigate. For example, what is the primary cause of these fairly consistent timing differences? The API Science Performance API contains additional timing data, which we'll investigate in a future post.
Published at DZone with permission of Kevin Farnham, DZone MVB.