7 Data Science Project Ideas for Aspiring Data Scientists

A beginner-friendly list of data science projects for May 2020.


Due to popular demand, I decided to create a list of data science projects for those who are beginning their journey as data scientists. There’s a mix of visualization, exploratory data analysis, and predictive modeling projects. I hope you enjoy this article, and I wish you the best of luck in your endeavors!


Rainfall in India

Project type: Visualization
Link to dataset here.


This dataset contains monthly rainfall data for 36 subdivisions of India. Here are some visualization ideas you can try for yourself:

  • You can create bar charts or pie charts to compare the amount of rainfall by region
  • You can create a line graph to compare rainfall by region over time
  • You can create an animated choropleth map to show where it rains over time! If you want to learn how to build a choropleth visualization, check out my tutorial here.
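
Before drawing the line graph, the monthly figures usually need to be rolled up into one value per region per year. Here is a minimal sketch of that step using only the Python standard library. The rows and the field names (`subdivision`, `year`, `month`, `rainfall_mm`) are invented for illustration and may not match the real dataset's columns.

```python
from collections import defaultdict

# Invented sample rows; field names are assumptions, not the real schema.
rows = [
    {"subdivision": "Region A", "year": 2014, "month": "JUN", "rainfall_mm": 450.0},
    {"subdivision": "Region A", "year": 2014, "month": "JUL", "rainfall_mm": 700.0},
    {"subdivision": "Region A", "year": 2015, "month": "JUN", "rainfall_mm": 400.0},
    {"subdivision": "Region A", "year": 2015, "month": "JUL", "rainfall_mm": 600.0},
    {"subdivision": "Region B", "year": 2014, "month": "JUN", "rainfall_mm": 50.0},
    {"subdivision": "Region B", "year": 2014, "month": "JUL", "rainfall_mm": 150.0},
    {"subdivision": "Region B", "year": 2015, "month": "JUN", "rainfall_mm": 60.0},
    {"subdivision": "Region B", "year": 2015, "month": "JUL", "rainfall_mm": 180.0},
]


def annual_totals(rows):
    """Sum monthly rainfall into one total per (subdivision, year)."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["subdivision"], row["year"])] += row["rainfall_mm"]
    return dict(totals)


def series_by_region(totals):
    """Group totals into {subdivision: [(year, total), ...]}, sorted by year."""
    series = defaultdict(list)
    for (region, year), total in sorted(totals.items()):
        series[region].append((year, total))
    return dict(series)


for region, points in series_by_region(annual_totals(rows)).items():
    print(region, points)
```

Each list of `(year, total)` pairs can be passed to a plotting library as one line on the chart.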

Global Suicide Rates

Project type: Exploratory Data Analysis
Link to dataset here.


This is a consolidated dataset with details on suicide rates, human development index (HDI) numbers, GDP, and demographics by country and year. The dataset was compiled to see whether any indicators are correlated with increased suicide rates.

Explore the data and see which countries and continents have the highest suicide rates. What trends do you notice? Are suicide rates increasing or decreasing overall? What is the ratio of male to female suicides? See if you can find any variables that are correlated with suicide rates.
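
One common starting point for the correlation question is the Pearson correlation coefficient, which measures how strongly two numeric variables move together on a scale from -1 to 1. The sketch below computes it by hand so that each step is visible. The numbers are invented, and `gdp_per_capita` and `suicides_per_100k` are hypothetical names rather than the dataset's actual columns.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to each variance).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented sample values, one entry per country-year.
gdp_per_capita = [5000, 12000, 25000, 40000, 52000]
suicides_per_100k = [9.5, 11.0, 12.5, 10.0, 13.0]

r = pearson(gdp_per_capita, suicides_per_100k)
print(f"Pearson r = {r:.2f}")
```

A value near 0 suggests little linear relationship. A strong correlation on its own does not show that one variable causes the other.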

Summer Olympic Medals

Project type: Exploratory Data Analysis
Link to dataset here.


On a less morbid note, here’s a dataset that contains all of the medal winners in the Summer Olympics from 1976 Montreal to 2008 Beijing. Explore the data and see which countries have won the most medals overall. Are there countries that are performing better over time? What about worse over time?
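
Both questions come down to counting rows: once per country for the overall tally, and once per country per Games to see change over time. Here is a small sketch with `collections.Counter`. The records and the country codes `AAA` and `BBB` are placeholders invented for this example, not real results.

```python
from collections import Counter, defaultdict

# Invented records: (games_year, country_code, medal).
medals = [
    (1976, "AAA", "Gold"),
    (1976, "AAA", "Silver"),
    (1976, "BBB", "Bronze"),
    (2008, "AAA", "Bronze"),
    (2008, "BBB", "Gold"),
    (2008, "BBB", "Gold"),
    (2008, "BBB", "Silver"),
]

# Overall medal count per country.
overall = Counter(country for _, country, _ in medals)

# Medal count per country per Games: {country: Counter({year: count})}.
per_games = defaultdict(Counter)
for year, country, _ in medals:
    per_games[country][year] += 1


def change(country, first_year, last_year):
    """Difference in medal count between two Games (positive means improvement)."""
    return per_games[country][last_year] - per_games[country][first_year]


print(overall.most_common())
print("BBB change 1976 to 2008:", change("BBB", 1976, 2008))
```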

World Happiness Report

Project type: Exploratory Data Analysis
Link to dataset here.


The happiness score is a quantifiable measurement of the average ‘happiness’ of a country. This is based on six factors: economic production, social support, life expectancy, freedom, absence of corruption, and generosity.

This dataset contains 155 countries and their associated happiness scores and six factors from 2015 to 2019. Are we becoming happier or less happy globally each year? Which continent is the happiest? The least happy? Which of the six factors has the biggest impact on happiness? Which has the least?

Pollution in the United States

Project type: Visualization
Link to dataset here.


This dataset contains information on the four major pollutants (Nitrogen Dioxide, Sulphur Dioxide, Carbon Monoxide, and Ozone) for every day from 2000 to 2016 in the United States.

Here are some visualization ideas:

  • Which states are the biggest polluters? The least?
  • How have US pollution levels changed over time? Are they higher or lower than they were 10 years ago?
  • See if you can create a choropleth map to show geographically the level of pollution over time!

Nutrition Facts for McDonald’s Menu

Project type: Exploratory Data Analysis
Link to dataset here.


This dataset provides a nutrition analysis of every menu item on the US McDonald’s menu, including breakfast, beef burgers, chicken and fish sandwiches, fries, salads, soda, coffee and tea, milkshakes, and desserts.

How many calories does the average McDonald’s value meal contain? Is it really healthier to order grilled chicken instead of crispy? What is the healthiest combination of items that you would have to eat to get your daily nutritional requirements?
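
The grilled versus crispy question can be approached by filtering menu items on a keyword and comparing average calories. The sketch below assumes hypothetical `item` and `calories` fields, and every calorie figure in it is invented for the example rather than taken from the dataset.

```python
from statistics import mean

# Invented menu rows; names and calorie values are placeholders.
menu = [
    {"item": "Crispy Chicken Sandwich", "calories": 510},
    {"item": "Crispy Chicken Salad", "calories": 450},
    {"item": "Grilled Chicken Sandwich", "calories": 350},
    {"item": "Grilled Chicken Salad", "calories": 290},
    {"item": "Fries", "calories": 340},
]


def mean_calories(menu, keyword):
    """Average calories of items whose name contains the keyword (case-insensitive)."""
    matches = [row["calories"] for row in menu if keyword.lower() in row["item"].lower()]
    if not matches:
        raise ValueError(f"no menu items match {keyword!r}")
    return mean(matches)


crispy = mean_calories(menu, "crispy")
grilled = mean_calories(menu, "grilled")
print(f"crispy: {crispy}, grilled: {grilled}, difference: {crispy - grilled}")
```

Calories are only one measure of "healthier". The same function could be pointed at other nutrition columns if the dataset provides them.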

Red Wine Quality

Project type: Predictive Modeling
Link to dataset here.


This dataset contains data on various wines, their composition, and their quality. This can be a regression or classification problem depending on how you frame it. See if you can predict the quality of a red wine given 11 inputs (fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulfates, and alcohol).
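
To make the two framings concrete, the sketch below builds both targets from the same quality column and scores a trivial baseline for each, which gives any real model a number to beat. The quality scores are invented, the cutoff of 7 for a "good" wine is an assumption, and the baselines (predict the mean, predict the majority class) are illustrative choices rather than part of the dataset.

```python
from collections import Counter
from statistics import mean

# Invented quality scores, one per wine.
quality = [5, 6, 7, 5, 6, 8, 5, 6]

# Assumed cutoff: wines scoring 7 or higher are labeled "good" (1), others 0.
GOOD_THRESHOLD = 7

# Regression framing: predict the numeric score.
# Baseline: always predict the mean score; measure mean absolute error (MAE).
baseline_score = mean(quality)
mae = mean(abs(q - baseline_score) for q in quality)

# Classification framing: predict the good/not-good label.
# Baseline: always predict the most common label; measure accuracy.
labels = [int(q >= GOOD_THRESHOLD) for q in quality]
majority_label = Counter(labels).most_common(1)[0][0]
accuracy = mean(int(label == majority_label) for label in labels)

print(f"regression baseline MAE: {mae}")
print(f"classification baseline accuracy: {accuracy}")
```

A model trained on the 11 inputs is only useful if it achieves a lower error than the regression baseline or a higher accuracy than the classification baseline.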

Thanks for Reading!

Published at DZone with permission of Terence Shin. See the original article here.
