Getting Started With NumPy
Here's how to get started with NumPy, a third-party library for numerical computing, optimized for working with single- and multi-dimensional arrays.
NumPy is a third-party library for numerical computing, optimized for working with single- and multi-dimensional arrays. Its primary type is the array type called ndarray. This library contains many routines for statistical analysis.
Creating, Getting Info, Selecting, and Util Functions
The Wine Quality Dataset, compiled by Cortez et al. in 2009 and available from the UCI Machine Learning Repository, is a well-known dataset that contains wine quality information. It includes the physicochemical properties of red and white wines along with a quality score.
Before we start, at Apiumhub we have prepared a small example dataset:
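As a minimal sketch of how such a dataset might end up in an array named wines_df, the example below assumes semicolon-delimited text with one header line and reads it with np.genfromtxt. The in-memory text is a hypothetical stand-in for the real file: it holds only the first two rows quoted later in this article, and the header names are placeholders.

```python
import io

import numpy as np

# Hypothetical stand-in for the dataset file: a placeholder header line plus
# the first two rows quoted later in this article. The ';' delimiter is an
# assumption about the file format.
raw_text = io.StringIO(
    "c1;c2;c3;c4;c5;c6;c7;c8;c9;c10;c11;c12\n"
    "7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5\n"
    "7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5\n"
)

# skip_header=1 drops the header line; the remaining fields are parsed as floats.
wines_df = np.genfromtxt(raw_text, delimiter=";", skip_header=1)

print(wines_df.shape)
print(wines_df.dtype)
```

With a real file, the StringIO object would be replaced by the path to that file.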
In NumPy you can create arrays in several ways. We are going to look at the most common ones and those that are most useful for data processing.
One-dimensional array from a list:
    import numpy as np

    values = [1, 2, 3]  # not named `list`, which would shadow the built-in
    uni_numpy_array = np.array(values)
    # array([1, 2, 3])
Multidimensional array from a list:
    nested_values = [[1, 2, 3], [4, 5, 6]]
    multi_numpy_array = np.array(nested_values)
    # array([[1, 2, 3],
    #        [4, 5, 6]])
Multidimensional array where all values are zeros:
    zeros_array = np.zeros((3, 4))
    # array([[0., 0., 0., 0.],
    #        [0., 0., 0., 0.],
    #        [0., 0., 0., 0.]])
Multidimensional array where all values are random:
    random_array = np.random.rand(3, 4)
    # array([[0.98195491, 0.34964712, 0.13426036, 0.55065786],
    #        [0.4180283 , 0.36018953, 0.44374156, 0.4366695 ],
    #        [0.69893273, 0.01089244, 0.4297768 , 0.6985924 ]])
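The random values above change on every run. When reproducible numbers are needed, one option is a seeded generator. The sketch below uses np.random.default_rng; the seed 42 is an arbitrary choice made for illustration.

```python
import numpy as np

# A seeded generator yields the same sequence every time it is recreated.
rng = np.random.default_rng(42)    # 42 is an arbitrary seed
random_array = rng.random((3, 4))  # uniform floats in [0, 1)

# Recreating the generator with the same seed reproduces the same array.
same_array = np.random.default_rng(42).random((3, 4))
print(np.array_equal(random_array, same_array))  # True
```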
Several functions and attributes help us extract information from the data. We will go through them one by one, with examples of how each works and why it is useful.
Getting Array Dimensions
For this, we use the `shape` attribute (it is not a function, so it takes no parentheses), which for a two-dimensional array returns a tuple with the number of rows and the number of columns: (rows, columns).
    wines_df.shape
    # (1599, 12)
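As a small illustration on a toy array (not the wine data), the sketch below shows shape next to two related attributes, ndim and size. The variable names are chosen for this example only.

```python
import numpy as np

toy = np.zeros((3, 4))

print(toy.shape)  # (3, 4): three rows, four columns
print(toy.ndim)   # 2: number of dimensions
print(toy.size)   # 12: total number of elements

# The shape tuple can be unpacked directly.
rows, columns = toy.shape
```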
Getting Data Types
NumPy has several data types, most of which map to Python data types such as float and str. The most important NumPy data types are:
1. float – numeric floating-point data.
2. int – integer data.
3. string – character data.
4. object – Python objects.
In this case, we will use the `dtype` attribute, which returns the data type of the array's elements.
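A short sketch of dtype on toy arrays (the names and values are illustrative, not taken from the wine data). It also shows two common ways to control the type: the dtype argument at creation time and the astype method.

```python
import numpy as np

floats = np.array([1.5, 2.0, 3.25])
print(floats.dtype)  # float64

# The dtype can be set explicitly when the array is created...
as_float32 = np.array([1, 2, 3], dtype=np.float32)
print(as_float32.dtype)  # float32

# ...or converted afterwards with astype, which returns a new array.
as_int = floats.astype(np.int64)
print(as_int)  # [1 2 3] (fractional parts are truncated)
```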
Use the syntax array[i, j] to retrieve the element at row index i and column index j of a two-dimensional array (indices start at 0).
To retrieve multiple elements, use the syntax array[(row_values), (column_values)], where row_values and column_values are tuples of the same size. Element k of the result is taken from row row_values[k] and column column_values[k].
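Both forms of indexing can be sketched on a small toy array, where the values make the row and column of each element easy to read off. The array and its name are assumptions made for this example.

```python
import numpy as np

grid = np.array([[10, 11, 12],
                 [20, 21, 22],
                 [30, 31, 32]])

# Single element: row index 1, column index 2.
print(grid[1, 2])  # 22

# Multiple elements: positions (0, 0), (1, 2) and (2, 1).
row_values = (0, 1, 2)
column_values = (0, 2, 1)
print(grid[row_values, column_values])  # [10 22 31]
```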
Now we are going to show different examples of how to select elements within an array:
Get the first row:
    first_row = wines_df[:1]
    # array([[ 7.4 , 0.7 , 0. , 1.9 , 0.076 , 11. , 34. , 0.9978, 3.51 , 0.56 , 9.4 , 5. ]])
Select the second element from the third row:
    second_third = wines_df[2, 1:2]
    # array([0.76])
Select the first three items from the fourth column:
    first_three_items = wines_df[:3, 3]
    # array([1.9, 2.6, 2.3])
Select the entire fourth column:
    fourth_column = wines_df[:, 3]
    # array([1.9, 2.6, 2.3, ..., 2.3, 2. , 3.6])
NumPy offers a very large number of mathematical functions, so we will summarize, through several examples, the ones that data scientists are most likely to use.
Sum up the whole 12th column (index 11):
    twelfth_column_sum = wines_df[:, 11].sum()
    # 9012.0
Sum up each of the columns:
    all_columns_sum = wines_df.sum(axis=0)
    # array([13303.1 , 843.985 , 433.29 , 4059.55 , 139.859 , 25384. , 74302. , 1593.79794, 5294.47 , 1052.38 , 16666.35 , 9012. ])
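The effect of the axis argument is easier to see on a toy array. In this sketch (the array is an assumption for illustration), axis=0 collapses the rows and yields one sum per column, while axis=1 yields one sum per row.

```python
import numpy as np

toy = np.array([[1, 2, 3],
                [4, 5, 6]])

print(toy.sum())        # 21: all elements
print(toy.sum(axis=0))  # [5 7 9]: one sum per column
print(toy.sum(axis=1))  # [ 6 15]: one sum per row
```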
Mean of the first row:
    first_row_mean = wines_df[:1].mean()
    # 6.211983333333333
Return a Boolean array whose entries are True where the value in the 12th column (index 11) is greater than five and False otherwise:
    bool_array = wines_df[:, 11] > 5
    # array([False, False, False, ..., True, False, True])
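A Boolean array like this is often used as a mask to keep only the matching rows. The sketch below shows the idea on a toy two-column array whose values are made up for the example.

```python
import numpy as np

# Toy data: the second column plays the role of a score.
toy = np.array([[7.4, 5.0],
                [7.8, 6.0],
                [11.2, 7.0]])

mask = toy[:, 1] > 5  # False, True, True
selected = toy[mask]  # keeps only the rows where the mask is True

print(selected.shape)  # (2, 2)
print(mask.sum())      # 2: True counts as 1, so this counts the matches
```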
Get the transpose of the wines matrix:
    transposed = np.transpose(wines_df)
    transposed.shape
    # (12, 1599)
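For a two-dimensional array, the .T attribute gives the same result as np.transpose. A minimal sketch on a toy array (the name and contents are illustrative):

```python
import numpy as np

toy = np.arange(6).reshape((2, 3))  # [[0 1 2], [3 4 5]]

print(toy.T.shape)                               # (3, 2)
print(np.array_equal(toy.T, np.transpose(toy)))  # True
```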
Get the flattened array of wines:
    flattened = wines_df.ravel()
    flattened.shape
    # (19188,)
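NumPy also has a flatten method. As a sketch of the difference on a toy array: ravel returns a view of the original data when it can, so writing to the result may change the original, whereas flatten always returns a copy.

```python
import numpy as np

toy = np.zeros((2, 3))

flat_view = toy.ravel()    # a view here, because toy is contiguous in memory
flat_copy = toy.flatten()  # always an independent copy

flat_view[0] = 1.0
print(toy[0, 0])  # 1.0: the write went through to toy

flat_copy[1] = 2.0
print(toy[0, 1])  # 0.0: toy is unaffected by changes to the copy
```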
Turn the second row of wines (index 1) into a two-dimensional array with three rows and four columns:
    wines_df[1:2].reshape((3, 4))
    # array([[ 7.8 , 0.88 , 0. , 2.6 ],
    #        [ 0.098 , 25. , 67. , 0.9968],
    #        [ 3.2 , 0.68 , 9.8 , 5. ]])
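This works because the row holds 12 values and 3 x 4 = 12. The sketch below, on a toy array of 12 integers, shows that one dimension can be given as -1 so that NumPy infers it, and that a shape with a different total number of elements raises a ValueError.

```python
import numpy as np

row = np.arange(12)  # 12 values: 0, 1, ..., 11

explicit = row.reshape((3, 4))
inferred = row.reshape((3, -1))  # -1 lets NumPy work out the 4

print(explicit.shape)  # (3, 4)
print(inferred.shape)  # (3, 4)

# 5 x 3 = 15 does not match the 12 available elements.
try:
    row.reshape((5, 3))
    reshape_failed = False
except ValueError as error:
    reshape_failed = True
    print("reshape failed:", error)
```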
Published at DZone with permission of David Suarez. See the original article here.