Polyglot COVID-19 Dashboard
Covid-19 data visualization with Ruby, Python, Rails, Chartkick, and PostgreSQL.
As a developer, you see countless articles on big data, containers, complex algorithms, caching, and so on. In reality, many of us still have to solve simple problems, especially freelance programmers and those working with small companies.
A common use case is data that arrives in spreadsheets or CSV files and has to be visualized in a simple dashboard. You and the customer agree to build it as a web application. There are plenty of ways to implement the solution, from PHP to the Java-based Metabase. Since I have experience with Ruby on Rails (RoR, or just Rails) and it has extensive, easy-to-use libraries, it's my go-to choice for building a web application quickly.
I used two data sources. The country-level data are from the GitHub COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. For my country, India, the state-level data are from covid19india.org.
There are two main differences in the input data. CSSE publishes the global data as one CSV file per day, holding cumulative counts up to that date. The covid19india data comes as daily incremental counts in JSON. The data format I wanted for both was: place, confirmed, deaths, and recovered.
For the global data, place is the country name, and for India, it is the state name. In order to process the input data, I wrote two scripts in Python and three in Ruby; their functionality is described below.
gdc.py: This Python program takes the CSSE daily CSV file as the input. These files are named by date, for example, 11-10-2020.csv.
Up until 21st March 2020, the input files had the place, confirmed, deaths, and recovered values in the second, fourth, fifth, and sixth fields. Files from 22nd March onward have them in the fourth, eighth, ninth, and tenth fields. Therefore, depending on the file date, the program selects the correct indices to extract the values.
Some country names contain a comma. These are mapped to a single word so that each line can be split cleanly with the comma as separator. For example, "Gambia, The" in the input is changed to Gambia in the output. The data for the USA and some other countries are not aggregated: they are spread over several lines. So, as each line is processed, the first time a country is encountered, the program creates a key in a dictionary called country_data and stores the values as an array. When a later line has the same country, the program adds its values to the array stored under that country's key.
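The index selection can be sketched as follows. This is a minimal illustration, not the actual gdc.py: the function and constant names are hypothetical, and it assumes the file name follows the MM-DD-YYYY.csv pattern.

```python
from datetime import date, datetime

# Cutover date taken from the text: files from 22 March 2020 use the newer layout.
LAYOUT_CHANGE = date(2020, 3, 22)

# Zero-based column positions for (place, confirmed, deaths, recovered).
OLD_INDICES = (1, 3, 4, 5)  # second, fourth, fifth, sixth fields
NEW_INDICES = (3, 7, 8, 9)  # fourth, eighth, ninth, tenth fields


def field_indices(filename):
    """Return the column positions to read for a file named MM-DD-YYYY.csv."""
    # Assumption: the name is a bare file name with no directory part.
    file_date = datetime.strptime(filename[: -len(".csv")], "%m-%d-%Y").date()
    return OLD_INDICES if file_date < LAYOUT_CHANGE else NEW_INDICES
```

Keeping both layouts as named tuples means the rest of the program can stay the same for every file and only the lookup changes.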
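A rough sketch of the aggregation step is shown here. It is not the original gdc.py: it swaps the renaming approach for Python's csv module, which already handles quoted names such as "Gambia, The", and it assumes a header row and integer counts. The sample rows are made up for illustration.

```python
import csv
import io


def aggregate_by_country(lines, indices):
    """Sum confirmed, deaths, and recovered per country across all rows."""
    place_i, *count_is = indices
    country_data = {}
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row (assumed to be present)
    for row in reader:
        country = row[place_i]
        # Blank cells are treated as zero (an assumption about the input).
        counts = [int(row[i] or 0) for i in count_is]
        if country not in country_data:
            # First time this country is seen: create its key.
            country_data[country] = counts
        else:
            # Later rows for the same country are added element by element.
            country_data[country] = [
                total + count
                for total, count in zip(country_data[country], counts)
            ]
    return country_data


# Made-up sample in the older six-column layout.
sample = io.StringIO(
    "Province,Country,Updated,Confirmed,Deaths,Recovered\n"
    "New York,US,2020-03-21,10,1,0\n"
    "Washington,US,2020-03-21,5,2,1\n"
    ',"Gambia, The",2020-03-21,1,0,0\n'
)
totals = aggregate_by_country(sample, (1, 3, 4, 5))
```

With this sample, the two US rows collapse into one entry, while the quoted country name survives as a single field.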
gdd.py: Its input is the file created from running gdc.py. It opens two files: the input file, which has data for one day, and the data file of the previous day. For each country, it takes the difference between the two days in the counts of confirmed, deaths, recovered, and outputs the difference, which is the global daily delta.
One interesting Python idiom I used is to calculate the difference between two arrays as a one-liner:
[a - b for a, b in zip(array1, array2)]
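Applied to whole files, the same idiom gives the daily delta per country. The sketch below assumes each day's data has already been read into a dictionary of country to [confirmed, deaths, recovered]; the function name and the zero default for new countries are assumptions, not details from gdd.py.

```python
def daily_delta(today, yesterday):
    """Subtract yesterday's cumulative counts from today's, per country."""
    delta = {}
    for country, counts in today.items():
        # A country missing from yesterday's file is assumed to start from zero.
        previous = yesterday.get(country, [0] * len(counts))
        delta[country] = [t - y for t, y in zip(counts, previous)]
    return delta
```

For example, `daily_delta({"India": [110, 4, 10]}, {"India": [100, 2, 9]})` yields the day's new confirmed, deaths, and recovered counts for India.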
idd.rb: This Ruby program calls https://api.covid19india.org/states_daily.json which sends back a JSON object with a key called states_daily. The value is an array of hashes. Each hash element is the state-level data of one category for one state. The category is indicated by the value of the "status" key. Given below is the data of "Confirmed" category for 14-Mar-2020.
idc.rb: Input to this program is the CSV file created by running idd.rb. It opens two files: the input file, which has data for one day, and the cumulative data file of the previous day in the india_daily_cumulative folder. For each state, it adds the values of the two days in the counts of confirmed, deaths, recovered, and outputs the total. That is, it gives the cumulative count up until that date.
The program uses a hash called output_data. If a state appears for the first time, which is detected by its absence from the previous day's file, the program creates the key in the hash with the current day's values. Otherwise, it adds the current day's values to the previous day's values and stores the resulting key-value pair in the hash.
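The logic is the mirror image of the delta calculation. Here is a minimal sketch in Python rather than Ruby, assuming both days are already loaded as dictionaries of state to [confirmed, deaths, recovered]; the function name is hypothetical.

```python
def running_total(today, previous_total):
    """Add today's incremental counts to the previous cumulative counts."""
    output_data = {}
    for state, counts in today.items():
        if state not in previous_total:
            # First appearance: today's values are the cumulative values.
            output_data[state] = list(counts)
        else:
            output_data[state] = [
                prev + cur for prev, cur in zip(previous_total[state], counts)
            ]
    return output_data
```

Like the description above, this only walks the states present in the current day's input; a state that is missing today would need separate handling.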
The commands for running the four programs are:
daily_all.rb: This program creates four master data files, collating file content in each directory under datasets. For example, the content of all files in the folder global_daily_cumulative is collated in global_daily_cumulative.csv. The other three files it generates are: global_daily_delta.csv, india_daily_cumulative.csv and india_daily_delta.csv.
For the India datasets, it translates the two-character state/union territory code to the full name. For example, ar is translated to Arunachal Pradesh and wb to West Bengal.
Format of data in the four generated files is:
date, place, confirmed, deaths, recovered.
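As an illustration, one master-file row could be assembled like this. The sketch is in Python rather than Ruby, the lookup table holds only the two codes mentioned above, and the date format and sample counts are made up.

```python
import csv
import io

# Only the two codes mentioned in the text; a real table would cover every
# state and union territory.
STATE_NAMES = {"ar": "Arunachal Pradesh", "wb": "West Bengal"}


def master_row(day, code, counts):
    """Build one row in the order: date, place, confirmed, deaths, recovered."""
    # Unknown codes pass through unchanged (an assumption, not from the article).
    place = STATE_NAMES.get(code, code)
    return [day, place, *counts]


buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(master_row("2020-03-14", "wb", [3, 0, 1]))  # made-up values
```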
In PostgreSQL, I created a database called covid19 with four tables: global_data_cumulative, global_data_delta, india_data_cumulative, india_data_delta. The four CSV files generated by daily_all.rb are loaded into these tables using the psql utility as follows:
In the front-end layer, I've used Bootstrap for the widgets and Chartkick for the graphs. There is only one screen with four buttons: GDC, GDD, IDC, IDD; all have tooltips to show the full form of their title. There are two date pickers created with flatpickr. In my previous DZone article, I'd described how to build a form in Rails with flatpickr and multiple submit buttons.
To select a country or an Indian state, I have provided drop-downs. Values for these are fetched at startup time with a custom initializer.
The initializer also fetches the minimum and maximum dates for the countries and Indian states.
All four buttons in the form submit to the display_router action. It first checks for the start date being earlier than the end date. If not, an error string is made and returned to the index page. Otherwise, depending on which button is clicked, it redirects to the appropriate action.
These redirected actions clamp the start and end dates to the minimum and maximum dates of the respective tables. That is, if the start date is earlier than the minimum date, the start date is set to the minimum date. Similarly, if the end date is later than the maximum date, the end date is set to the maximum date. They then fetch data from the correct table and send them to the views.
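The validation and clamping rules can be sketched together in a few lines. This is a Python illustration of the logic only, not the Rails controller code; the function name and the wording of the error string are hypothetical.

```python
from datetime import date


def clamp_dates(start, end, min_date, max_date):
    """Return ((start, end), None) on success or (None, error) on failure."""
    # The router's check: the start date must be earlier than the end date.
    if not start < end:
        return None, "Start date must be earlier than end date."
    # Pull each bound inside the range of dates available in the table.
    return (max(start, min_date), min(end, max_date)), None
```

For example, a request for 1 January to 31 December 2020 against a table covering 22 January to 10 November 2020 comes back as 22 January to 10 November.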
Installing Chartkick and making it operational is pretty simple. Add gem "chartkick" to the Gemfile. At the command line in the project folder, run
$ yarn add chartkick chart.js
Next, in application.js you require chartkick and chart.js
Chartkick accepts data as a hash or an array. I used the array format. Data are passed to the views as an array of two-element arrays, the first element being the date and the second element the count.
The views first check for an error message and if it is not empty, the error message is displayed. Otherwise, the graph is rendered. Code for India daily cumulative line chart and daily delta column chart is given below:
On the cloud server, the Covid Dashboard is running along with the Bootstrap Flatpickr application described in my previous article. The NGINX sites-enabled configuration to run these two Rails applications on the same server is:
The only difference between my local code and that running on the server is the path to the action in the form. When running on my laptop, I run the two applications separately; the form submits to /display_router as follows:
On the cloud server, the folder root is added to the action path.