Data Analysis: Introduction to NumPy

Keshav Likhar
Feb 13, 2022

Data analysis is the process of cleaning, transforming, and modeling data to discover information that supports business decision-making. Its purpose is to extract useful information from data and to make decisions based on that analysis. NumPy plays a significant role in this work.

So, what is NumPy? NumPy stands for Numerical Python. It is a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays.

Installation

To install NumPy, you can run “pip install numpy”. That said, I recommend the Anaconda Distribution, which includes Python, NumPy, and other commonly used packages for scientific computing and data science. You can work in a Python IDE (Integrated Development Environment) such as Jupyter Notebook or Spyder (both come with Anaconda by default).

Getting Started

Once the installation is done, you need to import the NumPy library in your Python IDE. Importing a library makes its features available in our program. To import NumPy, write:

import numpy as np

And that’s all it takes to import the NumPy library. You might be wondering what ‘as np’ does: it creates an alias, so you can write ‘np.command’ instead of ‘numpy.command’ every time you use NumPy!

NumPy Arrays

NumPy arrays are the cardinal reason we use this library. Arrays are basically collections of values, and they can have different numbers of dimensions. In NumPy, the number of dimensions of an array is called its rank (available as the ndim attribute). A 1-dimensional array is called a vector and a 2-dimensional array is called a matrix. The main data structure in NumPy is the ndarray, which is shorthand for N-dimensional array.
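To make the terminology concrete, here is a minimal sketch that builds a vector and a matrix and checks their number of dimensions. The variable names (vector, matrix) are just for illustration.

```python
import numpy as np

# Illustrative names; any list or list of lists works the same way.
vector = np.array([1, 2, 3])           # 1-dimensional array
matrix = np.array([[1, 2], [3, 4]])    # 2-dimensional array

print(isinstance(vector, np.ndarray))  # True, both are ndarray objects
print(vector.ndim)                     # 1
print(matrix.ndim)                     # 2
```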

Creating NumPy Array

We can create a NumPy array by converting a list or a list of lists. This is not the only way: we can also build NumPy arrays with numerous built-in functions like arange, linspace, zeros, and ones, and with the random module to create arrays of random numbers. Right now, we will start off by creating a list:

my_list = [1,2,3,4,5,6,7,8,9]

Converting the list into a NumPy array:

numpy_array = np.array(my_list)
numpy_array               #1-D array

output: array([1, 2, 3, 4, 5, 6, 7, 8, 9])

Now, with a list of lists:

list2 = [[1,2,3],[4,5,6],[7,8,9]]
numpy_twod = np.array(list2)
numpy_twod

The output will be a two-dimensional NumPy array:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
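The built-in creation functions mentioned earlier can be sketched as follows. The argument values and variable names are arbitrary choices for illustration, and np.random.rand is only one of several ways to get random arrays.

```python
import numpy as np

evens = np.arange(0, 10, 2)     # start, stop (exclusive), step -> [0 2 4 6 8]
points = np.linspace(0, 1, 5)   # 5 evenly spaced values from 0 to 1, both included
zeros = np.zeros((2, 3))        # 2x3 array filled with 0.0
ones = np.ones(4)               # 1-D array of four 1.0 values
rand = np.random.rand(3)        # 3 random floats drawn uniformly from [0, 1)

print(evens)
print(points)
print(zeros.shape, ones.shape, rand.shape)  # (2, 3) (4,) (3,)
```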

Attributes and Methods

You can use various attributes and methods on NumPy arrays. One example is reshape, which returns an array containing the same data with a new shape. Here, we have converted the 1-D numpy_array into a 3x3 two-dimensional array:

num_arr = numpy_array.reshape(3,3)
num_arr

output will be:

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
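One thing to keep in mind is that reshape only works when the new shape holds exactly the same number of elements. The sketch below assumes a 9-element array like the one above; the names flat, grid, and inferred are illustrative.

```python
import numpy as np

flat = np.arange(1, 10)          # 9 elements: 1..9
grid = flat.reshape(3, 3)        # 3 * 3 == 9, so this works
inferred = flat.reshape(3, -1)   # -1 lets NumPy infer the missing dimension

try:
    flat.reshape(2, 5)           # 2 * 5 != 9
    reshape_failed = False
except ValueError:
    reshape_failed = True        # NumPy raises ValueError on a size mismatch

print(grid.shape, inferred.shape, reshape_failed)  # (3, 3) (3, 3) True
```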

Just like that, you can use several other methods and attributes. In the examples below, arr is a NumPy array.

Methods:

arr.min(), arr.max()

Return the minimum and maximum elements of the array.

arr.argmin(), arr.argmax()

Return the index of the minimum and maximum elements of the array.

arr.astype(dtype)

Converts the array to the given data type.

arr.tolist()

Converts the NumPy array into a Python list.

arr.sort()

Sorts the array in place, in ascending order.

Attributes:

arr.size

Gives the number of elements in the array.

arr.shape

Gives the shape of the array, that is, its length along each dimension.

arr.dtype

Gives the data type of the array's elements.
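Here is a small sketch that exercises the methods and attributes listed above on a three-element array. The values are arbitrary, and arr.dtype is left out of the printed results because the default integer type depends on the platform.

```python
import numpy as np

arr = np.array([30, 10, 20])       # illustrative values

print(arr.min(), arr.max())        # 10 30
print(arr.argmin(), arr.argmax())  # 1 0  (positions of the min and max)
print(arr.astype(float))           # [30. 10. 20.]
print(arr.tolist())                # [30, 10, 20]
print(arr.size, arr.shape)         # 3 (3,)

arr.sort()                         # sorts in place and returns None
print(arr)                         # [10 20 30]
```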

Indexing and Slicing

Bracket Indexing and Selection

The simplest way to pick one or several elements of an array is bracket notation.

Let's get an element by its index with array[index]:

l1 = [1,2,3,4,5,6,7,8,9,10]
arr = np.array(l1)
arr[4]
Output: 5         #returns the element at index 4 (the fifth element)
-------------------------------------
arr[1:5]
Output: array([2, 3, 4, 5])   #Get values in a range

Broadcasting

NumPy arrays differ from normal Python lists in their ability to broadcast: a single value assigned to a slice is applied to every element of that slice.

In 1D array:

arr[0:5]=100
arr
Output: array([100, 100, 100, 100, 100, 6, 7, 8, 9, 10])
---------------------------------------
new_arr = arr[3:7]
new_arr
Output: array([100, 100, 6, 7]) #a slice of the original array
---------------------------------------
new_arr[:] = 50  #Select all elements in the slice
new_arr
Output: array([50, 50, 50, 50])
---------------------------------------
#Now, if you check 'arr', the changes also occurred in the original array
arr
Output: array([100, 100, 100, 50, 50, 50, 50, 8, 9, 10])
#Data is not copied to make a new array; the slice is a view of the original array! This avoids memory problems!
---------------------------------------
#To get an independent array, use the copy method
arr_copy = arr.copy()
arr_copy
Output: array([100, 100, 100, 50, 50, 50, 50, 8, 9, 10])
---------------------------------------
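If you want to check whether two arrays share the same data, np.shares_memory is one way to do it. The sketch below is a minimal illustration of the view-versus-copy behavior described above, with a plain Python list shown for contrast; the names view, independent, and py_list are illustrative.

```python
import numpy as np

arr = np.arange(1, 11)
view = arr[3:7]                # basic slicing returns a view
independent = arr[3:7].copy()  # .copy() allocates separate data

print(np.shares_memory(arr, view))         # True
print(np.shares_memory(arr, independent))  # False

independent[:] = 0             # leaves arr untouched
view[:] = 50                   # writes through to arr
print(arr)                     # [ 1  2  3 50 50 50 50  8  9 10]

py_list = list(range(1, 11))
list_slice = py_list[3:7]      # slicing a Python list makes a copy
list_slice[0] = 0
print(py_list[3])              # 4, the original list is unchanged
```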

In 2D array:

arr[row][col] and arr[row,col] are two equivalent ways to return a single element of a 2-D NumPy array. We can also use slices in the form arr[ : , : ] to select ranges of rows and columns.

arr[row_index] returns an entire row. There is more to explore in this area.
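The 2-D indexing forms above can be tried out with a small sketch like this one. The array name arr_2d and its values are chosen only for illustration.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

print(arr_2d[1][2])    # 6, row 1 then column 2
print(arr_2d[1, 2])    # 6, same element with the comma form
print(arr_2d[0])       # [1 2 3], the whole first row
print(arr_2d[:, 1])    # [2 5 8], the whole second column
print(arr_2d[:2, 1:])  # rows 0-1 and columns 1-2: [[2, 3], [5, 6]]
```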

Selection

Selecting elements based on conditional operators:

arr = np.arange(1,11) # created an array using arange
arr
output: array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
--------------------------------
arr > 6
output: array([False, False, False, False, False, False,  True,  True,  True,  True])
--------------------------------
bool_arr = arr>4
bool_arr
output: array([False, False, False, False,  True,  True,  True,  True,  True,  True])
--------------------------------
arr[bool_arr]
output: array([ 5,  6,  7,  8,  9, 10])
--------------------------------
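You can also put the condition directly inside the brackets and combine several conditions. This is a minimal sketch, assuming the same 1-to-10 array; note that combined conditions need the & operator and parentheses around each comparison.

```python
import numpy as np

arr = np.arange(1, 11)

print(arr[arr > 4])                # [ 5  6  7  8  9 10]
print(arr[(arr > 4) & (arr < 9)])  # [5 6 7 8]
print(arr[arr % 2 == 0])           # [ 2  4  6  8 10]
```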

Fantastic!! You did a lot, that’s great work.

Now, let's move forward to the next part!!

NumPy Operations

Arithmetic

You can easily perform array-with-array arithmetic or scalar-with-array arithmetic. Let’s try some examples:

arr = np.arange(0,10) # create a new array
arr
output: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
------------------------------
arr + arr #Addition
output: array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
-----------------------------
arr - arr #Subtraction
output: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
-----------------------------
arr * arr #Multiplication
output: array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
-----------------------------

You can also divide and use exponents. There are also functions for arithmetic operations:

np.add(arr, 10): adds 10 to each element in the array

np.add(arr1,arr2): adds the two arrays element by element. The same is true of np.subtract(), np.divide(), np.power() and np.multiply().
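Here is a short sketch of these functions next to the equivalent operators, including division and exponents. The arrays arr1 and arr2 hold arbitrary illustrative values.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([10, 20, 30])

print(np.add(arr1, 10))         # [11 12 13]
print(np.add(arr1, arr2))       # [11 22 33]
print(np.subtract(arr2, arr1))  # [ 9 18 27]
print(np.multiply(arr1, arr2))  # [10 40 90]
print(np.divide(arr2, arr1))    # [10. 10. 10.]
print(np.power(arr1, 2))        # [1 4 9]

# The operators give the same results as the functions.
print(np.array_equal(arr2 / arr1, np.divide(arr2, arr1)))  # True
print(np.array_equal(arr1 ** 2, np.power(arr1, 2)))        # True
```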

NumPy also provides element-wise mathematical functions, such as:

np.sqrt(arr): returns the square root of each element in the array

np.sin(arr): returns the sine of each element in the array

np.log(arr): returns the natural log of each element in the array

np.abs(arr): returns the absolute value of each element in the array

np.array_equal(arr1,arr2): returns True if the arrays have the same elements and shape.
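A quick sketch of these functions on small inputs is shown below. The input values are picked so the results are easy to check by hand.

```python
import numpy as np

arr = np.array([1.0, 4.0, 9.0])

print(np.sqrt(arr))                     # [1. 2. 3.]
print(np.abs(np.array([-1, 2, -3])))    # [1 2 3]
print(np.log(np.array([1.0])))          # [0.]
print(np.sin(np.array([0.0])))          # [0.]
print(np.array_equal(arr, arr.copy()))  # True
print(np.array_equal(arr, arr[:2]))     # False, the shapes differ
```

Keep in mind that np.log of 0 returns -inf with a RuntimeWarning, so an array that starts at 0 will show that value in its first position.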

And it does not stop here: there are myriad built-in methods and functions that you can explore, implement, and learn to use according to the needs and problems of your datasets. So, that’s all it takes to get started with NumPy.

Awesome! You should be proud!!

Thank you for reading!! This was an overview of the NumPy library, its basic operations, and their implementation. I hope you understood it and will explore more about this topic in the future. If you liked this blog, you can like and comment. If you have any problem or question about this topic or article, you can comment or send me your query by email.