The Basics of Linear Algebra for Data Scientists

Oretes Academy · Feb 2, 2021 · 7 min read

In short, we can say that linear algebra is the ‘math of vectors and matrices’. We make use of vectors and matrices because they are convenient mathematical ways of representing large amounts of information.

A matrix is an array of numbers, symbols or expressions, made up of rows and columns. A matrix is characterized by the number of rows, m, and the number of columns, n, it has. In general, a matrix of order ‘m x n’ (read: “m by n”) has m rows and n columns. Below, we display an example 2 x 3 matrix A:

A = [1 2 3]
    [4 5 6]

We can refer to individual elements of the matrix through their corresponding row and column. For example, A[1, 2] = 2, since the number 2 is in the first row and second column.

A matrix with only a single column is called a vector. For example, every column of the matrix A above is a vector. Let us take the first column of matrix A as the vector v:

v = [1]
    [4]

In a vector, we can also refer to individual elements. Here, we only have to make use of a single index. For example, v[2] = 4, since 4 is the second element of the vector v.
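
To make this concrete, here is a minimal NumPy sketch of the matrix and vector above. Note that NumPy indexing is 0-based, while the mathematical notation used here is 1-based:

import numpy as np

# the example 2 x 3 matrix A and its first column v
A = np.array([[1, 2, 3],
              [4, 5, 6]])
v = A[:, 0]

# A[1, 2] in the 1-based math notation is A[0, 1] in NumPy
print(A[0, 1])  # 2
# v[2] in the 1-based math notation is v[1] in NumPy
print(v[1])     # 4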

Our ability to analyze and solve particular problems within the field of linear algebra will be greatly enhanced when we can perform algebraic operations with matrices. The most important basic tools for performing these operations are listed below.

If A and B are m x n matrices, then the sum A+B is the m x n matrix whose columns are the sums of the corresponding columns in A and B. The sum A+B is defined only when A and B are the same size.

Of course, subtraction of the matrices, A-B, works in the same way, where the columns in B are subtracted from the columns in A.
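
For instance, with two small example matrices of our own (not from the text above), addition and subtraction in NumPy are element-wise:

import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# defined only because A and B have the same size
print(A + B)  # [[ 6  8] [10 12]]
print(A - B)  # [[-4 -4] [-4 -4]]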

If r is a scalar, then the scalar multiple of the matrix A is r*A, which is the matrix whose columns are r times the corresponding columns in A.

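
A quick sketch in NumPy (with an illustrative matrix of our own):

import numpy as np

A = np.array([[1, 2], [3, 4]])
r = 3

# every entry (and hence every column) of A is multiplied by r
print(r * A)  # [[ 3  6] [ 9 12]]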

If the matrix A is of size m x n (thus, it has n columns), and u is a vector of size n, then the product of A and u, denoted by Au, is the linear combination of the columns of A using the corresponding entries in u as weights.

Note: The product Au is defined only if the number of columns of the matrix A equals the number of entries in the vector u!

Properties: If A is an m x n matrix, u and v are vectors of size n and r is a scalar, then:

  • A(u + v) = Au + Av;
  • A(ru) = r(Au).
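
As a small illustration (the matrix and vector here are made up for the example), the product Au can be computed directly, or built by hand as a linear combination of the columns of A:

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])    # 2 x 3 matrix
u = np.array([2, 0, 1])      # vector of size 3

# built-in matrix-vector product
print(A @ u)  # [ 5 14]

# the same result as a linear combination of the columns of A,
# using the entries of u as weights
print(u[0] * A[:, 0] + u[1] * A[:, 1] + u[2] * A[:, 2])  # [ 5 14]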

If A is an m x n matrix and B = [b1, b2, …, bp] is an n x p matrix, where bi is the i-th column of the matrix B, then the matrix product AB is the m x p matrix whose columns are Ab1, Ab2, …, Abp. So, essentially, we perform the same procedure as in matrix-vector multiplication above, where each column of the matrix B acts as a vector.

Since each column of B is a vector, we can write:

AB = A[b1, b2, …, bp] = [Ab1, Ab2, …, Abp]

Note: The number of columns in A must match the number of rows in B in order to perform matrix multiplication.

Properties: Let A be an m x n matrix, let B and C have sizes such that the sums and products below are defined, and let r be a scalar. Then:

  • A(BC) = (AB)C;
  • A(B + C) = AB + AC;
  • (B + C)A = BA + CA;
  • r(AB) = (rA)B = A(rB);
  • IA = A = AI, where I is an identity matrix of the appropriate size.
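
For example (again with small made-up matrices), NumPy’s @ operator performs matrix multiplication, and each column of AB is indeed A times the corresponding column of B:

import numpy as np

A = np.array([[1, 2],
              [3, 4]])        # 2 x 2
B = np.array([[1, 0, 2],
              [0, 1, 1]])     # 2 x 3

AB = A @ B                    # 2 x 3 product
print(AB)

# the i-th column of AB equals A times the i-th column of B
print(np.allclose(AB[:, 0], A @ B[:, 0]))  # True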

If A is an n x n matrix and k is a positive integer, then A^k (A to the power k) is the product of k copies of A:

A^k = A A ⋯ A (k factors)
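
In NumPy, this can be done by repeated multiplication or with np.linalg.matrix_power (a small illustrative example):

import numpy as np

A = np.array([[1, 1],
              [0, 1]])

# A^3 as the product of three copies of A
print(A @ A @ A)                     # [[1 3] [0 1]]
print(np.linalg.matrix_power(A, 3))  # same result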

Suppose we have a matrix A of size m x n, then the transpose of A (denoted by A^T) is the n x m matrix whose columns are formed from the corresponding rows of A.

Properties: Let A and B be matrices whose sizes are appropriate for the following sums and products. Then:

  • (A^T)^T = A;
  • (A + B)^T = A^T + B^T;
  • (rA)^T = r A^T for any scalar r;
  • (AB)^T = B^T A^T.
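
A quick NumPy check of the transpose and the product rule (example matrices of our own):

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])    # 2 x 3
B = np.array([[1, 0],
              [2, 1],
              [0, 3]])       # 3 x 2

print(A.T)  # 3 x 2: the rows of A become the columns of A^T

# the transpose of a product reverses the order of the factors
print(np.allclose((A @ B).T, B.T @ A.T))  # True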

Matrix algebra provides tools for manipulating matrices and creating various useful formulas in ways similar to doing ordinary algebra with real numbers. For example, the (multiplicative) inverse of a real number, say 3, is 3^-1, or 1/3. This inverse satisfies the following equations:

3^-1 · 3 = 1 and 3 · 3^-1 = 1

This concept can be generalized to square matrices. An n x n matrix A is said to be invertible if there is an n x n matrix C such that:

CA = I and AC = I

where I is the n x n identity matrix. An identity matrix is a square matrix with 1’s on the diagonal and 0’s elsewhere. Below, the 5 x 5 identity matrix is shown:

I = [1 0 0 0 0]
    [0 1 0 0 0]
    [0 0 1 0 0]
    [0 0 0 1 0]
    [0 0 0 0 1]

Going back to the invertibility principle above, we call the matrix C an inverse of A. In fact, C is uniquely determined by A, because if B were another inverse of A, then B = BI = B(AC) = (BA)C = IC = C. This unique inverse is denoted by A^-1, so that:

A^-1 A = I and A A^-1 = I

Properties:

  • If A is invertible, then A^-1 is invertible and (A^-1)^-1 = A;
  • If A and B are invertible n x n matrices, then AB is invertible and (AB)^-1 = B^-1 A^-1;
  • If A is invertible, then A^T is invertible and (A^T)^-1 = (A^-1)^T.
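
In NumPy, the inverse is computed with np.linalg.inv; a minimal sketch with an invertible example matrix of our own:

import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])

A_inv = np.linalg.inv(A)

# A times its inverse gives the identity matrix (up to rounding)
print(np.allclose(A @ A_inv, np.eye(2)))  # True
print(np.allclose(A_inv @ A, np.eye(2)))  # True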

An orthogonal matrix is a square matrix whose columns and rows are orthogonal unit vectors. That is, an orthogonal matrix is an invertible matrix, let us call it Q, for which:

Q^T Q = Q Q^T = I

This leads to the equivalent characterization: a matrix Q is orthogonal if its transpose is equal to its inverse:

Q^T = Q^-1
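
A classic example of an orthogonal matrix is a 2 x 2 rotation matrix; a small NumPy check:

import numpy as np

theta = np.pi / 4
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# the columns of Q are orthogonal unit vectors, so Q^T Q = I
print(np.allclose(Q.T @ Q, np.eye(2)))     # True
# equivalently, the transpose equals the inverse
print(np.allclose(Q.T, np.linalg.inv(Q)))  # True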

To show the relevance of linear algebra in the field of data science, we briefly go through two relevant applications.

The singular value decomposition (SVD) is a very important concept within the field of data science. Some important applications of the SVD are image compression and dimensionality reduction. Let us focus on the latter application here. Dimensionality reduction is the transformation of data from a high-dimensional space into a lower-dimensional space, in such a way that the most important information of the original data is retained. This is desirable since analyzing the data can become computationally intractable once its dimension is too high.

The SVD decomposes a matrix into a product of three individual matrices, as shown below:

M = U Σ V^T

where, assuming the matrix M is an m x n matrix of rank r:

  • U is an m x m orthogonal matrix of left singular vectors;
  • Σ is an m x n matrix whose first r diagonal entries are the singular values of M, with zeros elsewhere;
  • V is an n x n orthogonal matrix of right singular vectors.

The singular values can be used to understand the amount of variance that is explained by each of the singular vectors. The more variance a singular vector captures, the more information it accounts for. In this way, we can limit the number of singular vectors we keep based on the amount of variance we wish to capture.

It is possible to calculate the SVD by hand, but this quickly becomes an intensive process as the matrices grow larger. In practice, one is dealing with huge amounts of data. Luckily, we can easily implement the SVD in Python by making use of NumPy. To keep the example simple, we define a 3 x 3 matrix M:

import numpy as np
from numpy.linalg import svd

# define the matrix as a NumPy array
M = np.array([[4, 1, 5], [2, -3, 2], [1, 2, 3]])

# compute the singular value decomposition
U, Sigma, VT = svd(M)

print("Left Singular Vectors:")
print(U)
print("Singular Values:")
print(np.diag(Sigma))
print("Right Singular Vectors:")
print(VT)

Output:

Left Singular Vectors:
[[-0.84705289 0.08910901 -0.52398567]
[-0.32885778 -0.8623538 0.38496556]
[-0.41755714 0.49840295 0.75976347]]
Singular Values:
[[7.62729138 0. 0. ]
[0. 3.78075422 0. ]
[0. 0. 0.72823326]]
Right Singular Vectors:
[[-0.58519913 -0.09119802 -0.80574494]
[-0.23007807 0.97149302 0.0571437 ]
[-0.77756419 -0.21882468 0.58949953]]

So, in this small example, the singular values (usually denoted as σ’s) are σ1 = 7.627, σ2 = 3.781 and σ3 = 0.728. Thus, when only using the first two singular vectors, we explain (σ1² + σ2²) / (σ1² + σ2² + σ3²) ≈ 99.3% of the variance!
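
This ratio can also be computed directly from the Sigma array returned by svd in the snippet above:

# explained variance per singular vector, using Sigma from above
explained = Sigma**2 / np.sum(Sigma**2)
print(explained.cumsum())  # [0.797 0.993 1.   ] -> first two capture ~99.3%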

Like the singular value decomposition, principal component analysis (PCA) is a technique to reduce dimensionality. The objective of PCA is to create new, uncorrelated variables, called the principal components, that maximize the captured variance. So, the idea is to reduce the dimensionality of the data set while preserving as much ‘variability’ (that is, information) as possible. This problem reduces to solving an eigenvalue-eigenvector problem.

Since eigenvalues and eigenvectors are beyond the scope of this article, we will not dive into the mathematical explanation of PCA.
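
Still, to give a flavor of how PCA looks in code, here is a minimal NumPy sketch (with randomly generated data of our own, glossing over the eigen-theory): the principal components are obtained by diagonalizing the covariance matrix of the centered data. In practice, one would typically use a library implementation such as scikit-learn’s PCA.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # 100 samples, 3 features

# center the data and compute its covariance matrix
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# the principal components are the eigenvectors of the covariance
# matrix, ordered by decreasing eigenvalue (captured variance)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order]

# project onto the first two principal components
X_reduced = Xc @ components[:, :2]
print(X_reduced.shape)  # (100, 2)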

Math is everywhere within the field of data science. To be a successful data scientist, you definitely do not need to know all the ins and outs of the math behind every concept. However, to make better decisions when dealing with data and algorithms, you need a solid understanding of the math and statistics behind them. This article focused on the basics of linear algebra, a very important discipline of math to understand as a data scientist.

Looking to start a career as a Data Scientist?

The Oretes Academy Data Scientist program with TCS iON Certification allows students to gain real-world data science experience with projects designed by industry experts. Build your portfolio and master the skills necessary to become a successful Data Scientist.

