Getting Started With NumPy

Here's how to get started with NumPy, a third-party library for numerical computing, optimized for working with single- and multi-dimensional arrays.

David Suarez · Jan. 25, 22 · Big Data Zone · Tutorial

NumPy is a third-party library for numerical computing, optimized for working with single- and multi-dimensional arrays. Its primary type is the array type called ndarray. This library contains many routines for statistical analysis.

Creating, Getting Info, Selecting, and Util Functions

The Wine Quality dataset, published in 2009 by Cortez et al. and available from the UCI Machine Learning Repository, is a well-known dataset of wine quality information. It includes physicochemical properties and a quality score for red and white wines.

Before we start, we at Apiumhub have prepared a small example dataset. The examples below refer to it as wines_df, a two-dimensional NumPy array.
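The examples that follow use wines_df without showing how it is loaded. As a minimal sketch, assuming the red wine file from UCI is semicolon-separated with one header row of quoted column names (the column names and the local path `winequality-red.csv` below are assumptions, not taken from the article), `np.genfromtxt` can read it into a float array. The two sample rows reuse values from the outputs shown later in this article so the snippet runs without the file:

```python
import io
import numpy as np

# Two sample rows in the semicolon-separated layout assumed for the UCI file.
# The header names are an assumption; the values come from this article's outputs.
sample_csv = io.StringIO(
    '"fixed acidity";"volatile acidity";"citric acid";"residual sugar";'
    '"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";'
    '"pH";"sulphates";"alcohol";"quality"\n'
    "7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5\n"
    "7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5\n"
)

# skip_header=1 drops the column-name row so every remaining value parses as a float.
wines_sample = np.genfromtxt(sample_csv, delimiter=";", skip_header=1)

print(wines_sample.shape)  # (2, 12)
print(wines_sample.dtype)  # float64

# With the real file on disk (hypothetical local path), the call would be:
# wines_df = np.genfromtxt("winequality-red.csv", delimiter=";", skip_header=1)
```

If the file is laid out as assumed, the full red wine data should load with the (1599, 12) shape used in the rest of the article.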

Creating

NumPy offers several ways to create arrays. We are going to look at the most common ones and those most useful for data processing.

Unidimensional array from a list:

Python
 
import numpy as np
values = [1, 2, 3]  # avoid naming this `list`, which would shadow the built-in
uni_numpy_array = np.array(values)

array([1, 2, 3])


Multidimensional array from a list:

Python
 
nested_values = [[1, 2, 3], [4, 5, 6]]  # one inner list per row
multi_numpy_array = np.array(nested_values)

array([[1, 2, 3],
       [4, 5, 6]])


Multidimensional array where all values are zeros:

Python
 
zeros_array = np.zeros((3, 4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])


Multidimensional array filled with random values drawn uniformly from [0, 1):

Python
 
random_array = np.random.rand(3, 4)

array([[0.98195491, 0.34964712, 0.13426036, 0.55065786],
       [0.4180283 , 0.36018953, 0.44374156, 0.4366695 ],
       [0.69893273, 0.01089244, 0.4297768 , 0.6985924 ]])
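The values above change on every run. As a hedged sketch of one way to make them reproducible, NumPy's `default_rng` generator can be seeded; the seed value 42 is an arbitrary choice for illustration:

```python
import numpy as np

# A seeded generator yields the same sequence on every run.
rng = np.random.default_rng(seed=42)  # the seed value is arbitrary
random_array = rng.random((3, 4))     # uniform floats in [0.0, 1.0)

# A second generator with the same seed reproduces the same array.
same_again = np.random.default_rng(seed=42).random((3, 4))

print(random_array.shape)                        # (3, 4)
print(np.array_equal(random_array, same_again))  # True
```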


Getting Info

Several attributes and functions help us extract information from the data. We are going to go through them one by one, with an example of how each works and why it is useful.

Getting Array Dimensions

For this, we are going to use the `shape` attribute (not a function call), which returns a tuple with the number of rows and the number of columns: (rows, columns).

Python
 
wines_df.shape

(1599, 12)


Getting Data Types

NumPy has several different data types, which mostly map to Python data types such as float and str. The most important NumPy data types are:

1. float – numeric floating-point data.

2. int – integer data.

3. string – character data.

4. object – Python objects.

Here we use the `dtype` attribute, which returns the data type of the array's elements.

Python
 
wines_df.dtype

dtype('float64')
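To connect the four categories listed above to what NumPy actually reports, here is a small sketch on made-up arrays. The variable names and values are illustrative only, and the exact integer width NumPy picks depends on the platform, so the example checks the category rather than a specific size:

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1.0, 2.5])
texts = np.array(["red", "white"])
mixed = np.array([1, "red", None], dtype=object)  # arbitrary Python objects

print(np.issubdtype(ints.dtype, np.integer))  # True
print(floats.dtype)                           # float64
print(texts.dtype.kind)                       # U (Unicode string)
print(mixed.dtype)                            # object

# astype returns a converted copy; float -> int truncates toward zero.
as_int = floats.astype(int)
print(as_int)  # [1 2]
```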


Selecting

Use the syntax array[i, j] to retrieve the element at row index i and column index j, where array is any two-dimensional NumPy array. Indices start at zero.

To retrieve multiple elements, use the syntax array[(row_values), (column_values)], where row_values and column_values are tuples of the same size. NumPy pairs them up position by position.
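The two selection forms described above can be sketched on a small made-up array (the name `grid` and its values are illustrative only):

```python
import numpy as np

grid = np.array([[10, 11, 12],
                 [20, 21, 22],
                 [30, 31, 32]])

# Single element: row index 1, column index 2.
print(grid[1, 2])  # 22

# Multiple elements: the tuples are paired position by position,
# so this picks grid[0, 1] and grid[2, 0].
rows = (0, 2)
cols = (1, 0)
picked = grid[rows, cols]
print(picked)  # [11 30]
```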

Now we are going to show different examples of how to select elements within an array:

Get the first row:

Python
 
first_row = wines_df[:1]

array([[ 7.4   ,  0.7   ,  0.    ,  1.9   ,  0.076 , 11.    , 34.    ,
         0.9978,  3.51  ,  0.56  ,  9.4   ,  5.    ]])


Select the second element from the third row (the 1:2 slice returns it as a one-element array):

Python
 
second_third = wines_df[2, 1:2]

array([0.76])


Select the first three items from the fourth column:

Python
 
first_three_items = wines_df[:3, 3]

array([1.9, 2.6, 2.3])


Select the entire fourth column:

Python
 
fourth_column = wines_df[:, 3]

array([1.9, 2.6, 2.3, ..., 2.3, 2. , 3.6])


Util Functions

NumPy provides a very large number of mathematical functions, so we are going to summarize, with several examples, the ones that we as data scientists are most likely to use.

Sum up the whole 12th column (index 11):

Python
 
twelfth_column_sum = wines_df[:, 11].sum()

9012.0


Sum up all the columns:

Python
 
all_columns_sum = wines_df.sum(axis=0)

array([13303.1    ,   843.985  ,   433.29   ,  4059.55   ,   139.859  ,
       25384.     , 74302.     ,  1593.79794,  5294.47   ,  1052.38   ,
       16666.35   ,  9012.     ])
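The `axis` argument decides which dimension is collapsed. A minimal sketch on a made-up 2x3 array (the name `table` is illustrative only) shows the difference between summing down the columns, across the rows, and over everything:

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

column_sums = table.sum(axis=0)  # collapse the rows: one total per column
row_sums = table.sum(axis=1)     # collapse the columns: one total per row
grand_total = table.sum()        # no axis: sum every element

print(column_sums)  # [5 7 9]
print(row_sums)     # [ 6 15]
print(grand_total)  # 21
```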


Mean of the first row:

Python
 
first_row_mean = wines_df[:1].mean()

6.211983333333333


Return a boolean array in which each position is True if the corresponding value in the 12th column (index 11) is greater than five, and False otherwise:

Python
 
bool_array = wines_df[:,11] > 5

array([False, False, False, ...,  True, False,  True])
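A boolean array like this is usually used as a mask to keep only the matching rows. Here is a hedged sketch on a tiny made-up array; the values are invented for illustration and the last column stands in for the quality score:

```python
import numpy as np

# Invented sample: the last column plays the role of the quality score.
sample = np.array([[7.0, 0.5, 5.0],
                   [8.0, 0.9, 5.0],
                   [9.0, 0.3, 6.0]])

mask = sample[:, 2] > 5   # one boolean per row
good_rows = sample[mask]  # keeps only the rows where the mask is True

print(mask)             # [False False  True]
print(good_rows.shape)  # (1, 3)
print(int(mask.sum()))  # 1, since True counts as 1 when summed
```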


Get the transpose of the wines matrix:

Python
 
transposed = np.transpose(wines_df)
transposed.shape

(12, 1599)


Flatten the wines array into one dimension:

Python
 
flatten = wines_df.ravel()
flatten.shape

(19188,)
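One detail worth knowing: `ravel()` returns a view of the original data when it can, while the similar `flatten()` method always returns a copy. A minimal sketch on a made-up array (names are illustrative only) shows the practical consequence:

```python
import numpy as np

matrix = np.array([[1, 2],
                   [3, 4]])

raveled = matrix.ravel()      # a view here, because matrix is contiguous
flattened = matrix.flatten()  # always an independent copy

raveled[0] = 99               # writes through to matrix
flattened[1] = -1             # does not touch matrix

print(matrix[0, 0])  # 99
print(matrix[0, 1])  # 2
```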


Turn the second row of wines (its 12 values) into a two-dimensional array with three rows and four columns:

Python
 
wines_df[1:2].reshape((3,4))

array([[ 7.8   ,  0.88  ,  0.    ,  2.6   ],
       [ 0.098 , 25.    , 67.    ,  0.9968],
       [ 3.2   ,  0.68  ,  9.8   ,  5.    ]])
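Reshaping only works when the new shape holds exactly the same number of elements. As a small sketch with 12 made-up values standing in for one wine row, this shows a valid reshape, the `-1` shorthand that lets NumPy infer one dimension, and the error raised on a mismatch:

```python
import numpy as np

row = np.arange(12)              # 12 made-up values standing in for one wine row
as_grid = row.reshape((3, 4))    # 3 * 4 == 12, so the sizes match
inferred = row.reshape((3, -1))  # -1 lets NumPy work out the 4

print(as_grid.shape)   # (3, 4)
print(inferred.shape)  # (3, 4)

reshape_failed = False
try:
    row.reshape((5, 3))          # 5 * 3 != 12, so this raises ValueError
except ValueError as error:
    reshape_failed = True
    print("reshape failed:", error)
```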


Published at DZone with permission of David Suarez. See the original article here.
