Where to Start With a New Data Problem

Procrastination is a problem for everyone, even data scientists. So, one of our own gives some tips to quickly dive into a Python-based project.


So I get a data file (CSV, text, etc.), and my usual first step is to stare at the file in my downloads folder for a few minutes. Maybe then change the name. Then go make some coffee. Then come back and read the name of the file again. Maybe change it back.

I’ll open some IDE and make a new Python file.  Save it.  Stare at that. Import some libraries… that name sucks, I should change it.

The news is on, I should probably watch.

My point is that it’s hard to start. And the best way to start is just to start. Here’s a good list of things to put in your .py file to at least start getting a handle on what you’re dealing with.

Import Libraries

You might not need them all, but you can always remove them later:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics

Import Your Data

customers = pd.read_csv("Ecommerce Customers.csv")
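
Once the file is loaded, a few one-liners tell you what you're dealing with. Here is a minimal sketch, assuming a small made-up table stands in for the real file. The column names and the `first_look` helper are hypothetical and not taken from the original dataset:

```python
import io
import pandas as pd

# Hypothetical stand-in for a real CSV file; swap in pd.read_csv("your_file.csv").
raw = io.StringIO(
    "email,session_minutes,yearly_spend\n"
    "a@example.com,12.5,410.0\n"
    "b@example.com,,522.3\n"
    "c@example.com,9.8,388.1\n"
)
df = pd.read_csv(raw)


def first_look(frame):
    """Collect the basics: size, column types, and missing values per column."""
    return {
        "shape": frame.shape,
        "dtypes": frame.dtypes.astype(str).to_dict(),
        "missing": frame.isna().sum().to_dict(),
    }


summary = first_look(df)
print(df.head())      # first rows, to eyeball the values
print(df.describe())  # summary statistics for the numeric columns
print(summary)
```

The missing-value counts are usually the first thing worth reading, since they decide how much cleaning comes before any modeling.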

Get Some Visualizations Going

Basic stuff.


Grab Some Coefficients

See if anything stands out. Here lm is a fitted LinearRegression model and X is the DataFrame of features it was trained on:

coeff_df = pd.DataFrame(lm.coef_, X.columns, columns=['Coefficient'])
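
The coefficient table only works once a model has been fitted. Here is a minimal end-to-end sketch, assuming synthetic data with a known relationship so the output can be sanity-checked. The feature names, the split ratio, and the random seeds are all illustrative choices, not values from the original project:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics

# Synthetic features; the column names are hypothetical.
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "session_minutes": rng.normal(12, 2, 200),
    "app_minutes": rng.normal(30, 5, 200),
})
# Known relationship (3 and -2) plus a little noise.
y = 3.0 * X["session_minutes"] - 2.0 * X["app_minutes"] + rng.normal(0, 0.1, 200)

# Hold out 30% of the rows to check the fit on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

lm = LinearRegression()
lm.fit(X_train, y_train)

# One row per feature, showing the fitted coefficient.
coeff_df = pd.DataFrame(lm.coef_, X.columns, columns=["Coefficient"])
print(coeff_df)

mae = metrics.mean_absolute_error(y_test, lm.predict(X_test))
print("MAE on held-out rows:", mae)
```

Because the data was generated with coefficients of 3 and -2, the fitted values should land close to those numbers. On real data, a coefficient is the expected change in the target per one-unit change in that feature, holding the others fixed.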

Print Some Nice Plots

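snsData = sns.load_dataset('tips')
# numeric_only=True skips the categorical columns, which newer pandas versions refuse to correlate
sns.heatmap(snsData.corr(numeric_only=True), annot=True)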

This won't answer any of your main questions, but it will at least get the process going and the juices flowing.



Published at DZone with permission of Matt Hughes. See the original article here.
