Re-Understanding Z Score

Hongtao Hao / 2022-08-20

Ask anyone who has attended Stats101 and s/he will tell you that they understand Z-score. But, really? Could you answer the following questions without thinking?

  1. What is the mean of z-scores?
  2. What is the standard deviation of z-scores?
  3. What is the sum of squared z-scores?
  4. Is the z-score distribution the same as the original distribution of sample values?
  5. What do z-scores above 0 mean?

If you cannot answer them without thinking, then you don’t really understand z-scores.

Next, let’s gain a deeper understanding of z-scores by looking at the above questions.

The above questions are from this post .

What is z-score? #

Z-score is also called the standard score. In a one-dimensional array, i.e., a vector, the z-score of a number within this array indicates the distance between this number and the expected value of this array, i.e., the mean, measured by the standard deviation of this array.

z=xiμσ

Where xi is a number in an array, μ is the mean of this array and σ is the standard deviation.

Before we can calcuate z-scores,we need to calculate the mean and the standard deviation:

import math 

def my_mean(array):
    return sum(array)/len(array)

def my_std(array):
    mn = my_mean(array)
    my_sum = 0
    for i in array:
        my_sum += (i - mn)**2
    return math.sqrt(my_sum/len(array))
# an example, 
a = [1,4,6,8,10]
my_mean(a)
5.8
my_std(a)
3.1240998703626617
def zscores(array):
    mn = my_mean(array)
    std = my_std(array)
    return [(i-mn)/std for i in array]
z_scores = zscores(a)
z_scores
[-1.5364425591947517,
 -0.5761659596980319,
 0.06401843996644804,
 0.7042028396309279,
 1.3443872392954077]

The sum and the mean of z-scores #

The sum, and the mean of z-scores are always zero. Why?

i=1nzi=i=1nxiμσ=i=1n(xiμ)σ

We have:

i=1n(xiμ)=i=1nxinμ

Because

μ=i=1nxin

We have

i=1n(xiμ)=0

So the sum of z-scores is zero. When the sum is zero, the mean is of course zero as well.

The standard deviation of z-scores #

Let’s calculate the standard devitaiton of z-scores.

σz=i=1n(ziE(z))2n

Because E(z)=0, we have:

σz=i=1n(zi)2n

This, in fact, leads to our third question:

The sum of squared z-scores #

i=1n(zi)2=i=1n(xiμ)2σ2=i=1n(xiμ)2σ2

Because

σ=i=1n(xiμ)2n

So we have

σ2=i=1n(xiμ)2n

Therefore

i=1n(zi)2=n

That is to say, the sum of squared z-scores is the number of items in an array.

Then, go back to the standard deviation of z-scores, we can know that

σz=i=1n(zi)2n=nn=1

The distribution of z-scores #

The distribution of z-scores is the same as the original array. I think of it this way: the z-scores are the result of original numbers moving leftward horizontally by μ and squeezed vertically by σ. Because they all move together, their relative positions stay the same. And that’s why their distributions are also the same.

The meaning of the sign of z-scores #

From the formula of z-scores, we know that if a z-score is above zero, it means the corresponding number in the array is bigger than the mean of the array. If a z-scores is below zero, it means the corresponding number in the array is smaller than the mean.

#ML

Last modified on 2025-04-26