
Analyzing Tourism Data With Kotlin [Code Snippet]


Explore some of Kotlin's functional processing and data analytics capabilities with the example of tourism data from India.


Here's a brief example of some of the functional processing capabilities of Kotlin.

To get some raw data, I headed to http://data.gov.in. The original dataset can be found here. The modified CSV with the continents added is here.

The CSV has the following column structure:

  • Column 1: Country of tourist
  • Column 2: Continent
  • Columns 3 onwards: Number of tourists visiting India in each year from 2001 to 2015, one column per year
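As a minimal sketch of how this layout maps onto code, the snippet below parses one row and looks up a year by its offset from 2001. The names `parseRow`, `visitorsIn`, and `FIRST_YEAR` are hypothetical helpers introduced for illustration, and the sketch assumes plain comma-separated cells with no quoted fields.

```kotlin
data class CountryData(val name: String, val visitors: List<Int>)

// Assumption taken from the column description: the first yearly column is 2001.
const val FIRST_YEAR = 2001

// Hypothetical helper: turns one CSV row into a (continent, CountryData) pair.
// Assumes no quoted fields and no commas embedded inside a cell.
fun parseRow(row: String): Pair<String, CountryData> {
    val cells = row.split(",")
    // cells[0] = country, cells[1] = continent, the rest are yearly counts
    return cells[1] to CountryData(cells[0], cells.drop(2).map { it.trim().toInt() })
}

// Index 0 holds 2001, so 2015 ends up at index 14.
fun CountryData.visitorsIn(year: Int): Int = visitors[year - FIRST_YEAR]
```

With a full 15-column row, `visitorsIn(2015)` reads the same cell as `visitors[14]`.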

The problem statement is contrived so that it demonstrates some interesting coding aspects. In any case, it is as follows.

Read the file. Group all the data by continents. For each continent, compute the following:

  • The total number of tourists from all countries in that continent in 2015.
  • Percentage growth for the total number of tourists for that continent from 2001 to 2015.
  • Country from which the maximum number of tourists visited India in 2015.

Display the results in descending order of each continent's percentage growth.
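The percentage-growth item in the list above comes down to one line of integer arithmetic. Here is a minimal sketch, where `pctGrowth` is a hypothetical helper name. Because both operands are `Int`, the division truncates toward zero, and a starting total of zero would throw an `ArithmeticException`.

```kotlin
// Hypothetical helper: percentage change from `start` to `end`,
// using integer arithmetic (the result is truncated, not rounded).
fun pctGrowth(start: Int, end: Int): Int = (end - start) * 100 / start
```

For example, `pctGrowth(100, 250)` is 150, while `pctGrowth(3, 4)` is 33 rather than 33.33. If fractional growth matters, converting to `Double` before dividing is one option.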

The resultant Kotlin code is as follows:

import java.io.File

data class CountryData(val name: String, val visitors: List<Int>)
data class Result(val tourists2015: Int, val pctGrowth: Int, val maxCountry: String)

fun main(args: Array<String>) {
//  Open file
  File("tourists-to-india.csv")
      // Read all lines
    .readLines(Charsets.US_ASCII)
      // Drop the first line (column headers)
    .drop(1)
      // Drop the last line (file totals)
    .dropLast(1)
      // For each remaining row in the file
    .map { row ->
      // split into cells using a comma as the delimiter
      row.split(",")
          // for the array of cells in each row
          .let { array ->
            // create a pair. The first value is the continent name (array[1])
            // The second value is the CountryData, i.e. the
            //    list of tourists from that country each year starting 2001
            array[1] to CountryData(array[0], array.drop(2).map { it.toInt() })
          }
    }
      // collate all country data for each continent into a list of countrydata
      // for that continent
    .groupBy({it.first}, { it.second })
      // for each continent
    .map { (continent, countriesData) ->
      // compute tourists from across the continent in 2001
      val tourists2001 = countriesData.sumOf { it.visitors[0] }
      // compute tourists from across the continent in 2015
      val tourists2015 = countriesData.sumOf { it.visitors[14] }
      // compute percentage growth
      val pctGrowth = (tourists2015 - tourists2001) * 100 / tourists2001
      // now we want to find out which country in that continent sent
      // the maximum number of tourists
      val maxCountry = countriesData
          // sort data based on the element at index 14 in the list, i.e. visitors for 2015
          .sortedByDescending { it.visitors[14] }
          // take the first country data and specifically its name attribute
          .first().name
      // construct a pair of continent to result
      continent to Result(tourists2015, pctGrowth, maxCountry)
    }
      // sort the continent result pairs based on the percentage growth
    .sortedByDescending { it.second.pctGrowth }
      // display the results
    .forEach { println(it) }
}
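The pipeline above reads straight from a file and prints, which makes it awkward to try out on a few rows. As a sketch only, the same steps can be wrapped in a function that takes rows already in memory. The name `summarize` is hypothetical, the sketch uses the first and last yearly columns instead of fixed indices 0 and 14, and it assumes Kotlin 1.4 or later for `sumOf` and `maxByOrNull`.

```kotlin
data class CountryData(val name: String, val visitors: List<Int>)
data class Result(val touristsLatest: Int, val pctGrowth: Int, val maxCountry: String)

// Hypothetical helper: the same group-and-aggregate steps, over in-memory rows.
// Assumes each row is "country,continent,count,count,..." with the header
// and totals rows already removed.
fun summarize(rows: List<String>): List<Pair<String, Result>> =
    rows
        .map { row ->
            val cells = row.split(",")
            cells[1] to CountryData(cells[0], cells.drop(2).map { it.trim().toInt() })
        }
        // continent -> list of CountryData for that continent
        .groupBy({ it.first }, { it.second })
        .map { (continent, countries) ->
            val first = countries.sumOf { it.visitors.first() }
            val latest = countries.sumOf { it.visitors.last() }
            // groupBy never yields an empty group, so !! is safe here
            val top = countries.maxByOrNull { it.visitors.last() }!!.name
            continent to Result(latest, (latest - first) * 100 / first, top)
        }
        .sortedByDescending { it.second.pctGrowth }
```

Calling it with a handful of made-up rows such as `"Alpha,Asia,100,300"` returns a list that can be inspected directly or printed with `forEach { println(it) }`.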


Topics:
data analytics, kotlin, big data

Published at DZone with permission of

