Ask anyone who has taken Stats 101 and they will tell you that they understand z-scores. But do they, really? Could you answer the following questions without thinking?
- What is the mean of z-scores?
- What is the standard deviation of z-scores?
- What is the sum of squared z-scores?
- Is the z-score distribution the same as the original distribution of sample values?
- What do z-scores above 0 mean?
If you cannot answer them without thinking, then you don’t really understand z-scores.
Next, let’s gain a deeper understanding of z-scores by looking at the above questions.
The above questions are from this post.
What is a z-score? #
The z-score is also called the standard score. In a one-dimensional array, i.e., a vector, the z-score of a number indicates the distance between that number and the mean (the expected value) of the array, measured in units of the array's standard deviation:

$$z_i = \frac{x_i - \mu}{\sigma}$$

where $x_i$ is a number in the array, $\mu$ is the mean of the array, and $\sigma$ is its standard deviation.
Before we can calculate z-scores, we need to calculate the mean and the standard deviation:
import math

def my_mean(array):
    return sum(array) / len(array)

def my_std(array):
    # population standard deviation: divide by n, not n - 1
    mn = my_mean(array)
    my_sum = 0
    for i in array:
        my_sum += (i - mn) ** 2
    return math.sqrt(my_sum / len(array))

# an example
a = [1, 4, 6, 8, 10]
my_mean(a)
# 5.8
my_std(a)
# 3.1240998703626617

def zscores(array):
    mn = my_mean(array)
    std = my_std(array)
    return [(i - mn) / std for i in array]

z_scores = zscores(a)
z_scores
# [-1.5364425591947517,
#  -0.5761659596980319,
#  0.06401843996644804,
#  0.7042028396309279,
#  1.3443872392954077]
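As a sanity check, the hand-written functions can be compared against Python's standard `statistics` module. This is only a sketch: it assumes Python 3.8+ (for `statistics.fmean`), and the variable names `pop_std` and `sample_std` are mine, not from the original code. The point it illustrates is that `my_std` divides by n, so it corresponds to the population standard deviation (`pstdev`) and not to the sample standard deviation (`stdev`), which divides by n - 1.

```python
import math
import statistics

a = [1, 4, 6, 8, 10]

mean = statistics.fmean(a)

# Population standard deviation: divides by n, like my_std.
pop_std = statistics.pstdev(a)

# Sample standard deviation: divides by n - 1, so it is slightly larger.
sample_std = statistics.stdev(a)

# z-scores computed with the population standard deviation
z_scores = [(x - mean) / pop_std for x in a]

print(mean, pop_std, sample_std)
print(z_scores)
```

Which of the two standard deviations is used matters for the properties derived in the rest of this post, so it is worth knowing which one a given library defaults to.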
The sum and the mean of z-scores #
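The sum and the mean of z-scores are always zero. Why?
For an array of $n$ numbers $x_1, \dots, x_n$ with mean $\mu$ and standard deviation $\sigma$, we have:

$$\sum_{i=1}^{n} z_i = \sum_{i=1}^{n} \frac{x_i - \mu}{\sigma} = \frac{1}{\sigma}\left(\sum_{i=1}^{n} x_i - n\mu\right)$$

Because

$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \text{i.e.,} \quad \sum_{i=1}^{n} x_i = n\mu$$

we have

$$\sum_{i=1}^{n} z_i = \frac{1}{\sigma}\left(n\mu - n\mu\right) = 0$$

So the sum of z-scores is zero. When the sum is zero, the mean is of course zero as well.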
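The result can be checked numerically. The sketch below redefines a compact `zscores` helper so that it runs on its own; the tolerance `1e-12` is an arbitrary choice of mine. Floating-point arithmetic can leave a tiny residue rather than an exact zero, which is why the comparison uses `math.isclose` instead of `==`.

```python
import math

def zscores(array):
    """z-scores using the population standard deviation (divide by n)."""
    n = len(array)
    mean = sum(array) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in array) / n)
    return [(x - mean) / std for x in array]

z = zscores([1, 4, 6, 8, 10])
z_sum = sum(z)
z_mean = z_sum / len(z)

# Compare against zero with an absolute tolerance, not with ==.
print(math.isclose(z_sum, 0.0, abs_tol=1e-12))
print(math.isclose(z_mean, 0.0, abs_tol=1e-12))
```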
The standard deviation of z-scores #
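Let’s calculate the standard deviation of z-scores.
Because the mean of the z-scores is zero, $\bar{z} = 0$, we have:

$$\sigma_z = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (z_i - \bar{z})^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} z_i^2}$$

This, in fact, leads to our third question: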
The sum of squared z-scores #
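Because

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$$

we have

$$\sum_{i=1}^{n} (x_i - \mu)^2 = n\sigma^2$$

Therefore

$$\sum_{i=1}^{n} z_i^2 = \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{\sigma^2} = \frac{n\sigma^2}{\sigma^2} = n$$

That is to say, the sum of squared z-scores is the number of items in the array. Note that this relies on the standard deviation dividing by $n$ rather than $n - 1$.
Going back to the standard deviation of z-scores, we get

$$\sigma_z = \sqrt{\frac{1}{n}\sum_{i=1}^{n} z_i^2} = \sqrt{\frac{n}{n}} = 1$$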
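Both results can be confirmed on the example array. This is a minimal, self-contained sketch; `sum_sq` and `z_std` are names introduced here for illustration.

```python
import math

def zscores(array):
    """z-scores using the population standard deviation (divide by n)."""
    n = len(array)
    mean = sum(array) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in array) / n)
    return [(x - mean) / std for x in array]

a = [1, 4, 6, 8, 10]
z = zscores(a)

# Sum of squared z-scores: should equal len(a) up to rounding.
sum_sq = sum(v ** 2 for v in z)

# Population standard deviation of the z-scores: should be 1 up to rounding.
z_mean = sum(z) / len(z)
z_std = math.sqrt(sum((v - z_mean) ** 2 for v in z) / len(z))

print(math.isclose(sum_sq, len(a)))
print(math.isclose(z_std, 1.0))
```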
The distribution of z-scores #
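The z-scores have the same distributional shape as the original array, although the location and scale change: the mean becomes 0 and the standard deviation becomes 1. I think of it this way: the z-scores are the original numbers shifted along the number line by $\mu$ and then rescaled by $\sigma$. Because every number goes through the same shift and the same rescaling, their relative positions stay the same, and that is why the shape of the distribution is preserved. In particular, standardizing skewed data does not make it normally distributed.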
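One way to see that the shape is preserved is to compare a shape statistic, such as skewness, before and after standardizing, and to check that the ordering of the items is unchanged. The sketch below uses a hypothetical right-skewed array (not from the original post) and a simple population skewness, defined here as the mean of the cubed z-scores.

```python
import math

def zscores(array):
    """z-scores using the population standard deviation (divide by n)."""
    n = len(array)
    mean = sum(array) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in array) / n)
    return [(x - mean) / std for x in array]

def skewness(array):
    """Population skewness: the mean of the cubed z-scores."""
    return sum(v ** 3 for v in zscores(array)) / len(array)

# Hypothetical right-skewed data, chosen only for illustration.
a = [1, 4, 6, 8, 50]
z = zscores(a)

skew_a = skewness(a)
skew_z = skewness(z)

# Indices that would sort each array; equal lists mean the ordering is unchanged.
order_a = sorted(range(len(a)), key=a.__getitem__)
order_z = sorted(range(len(z)), key=z.__getitem__)
same_order = order_a == order_z

print(skew_a, skew_z)
print(same_order)
```

The skewness stays positive after standardizing, so the z-scores are just as right-skewed as the original data.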
The meaning of the sign of z-scores #
From the formula of z-scores, we know that if a z-score is above zero, the corresponding number in the array is bigger than the mean of the array. If a z-score is below zero, the corresponding number is smaller than the mean. A z-score of exactly zero means the number equals the mean.
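Applied to the example array, whose mean is 5.8, the sign rule can be sketched as follows. The lists `above` and `below` are names introduced here for illustration.

```python
import math

def zscores(array):
    """z-scores using the population standard deviation (divide by n)."""
    n = len(array)
    mean = sum(array) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in array) / n)
    return [(x - mean) / std for x in array]

a = [1, 4, 6, 8, 10]
z = zscores(a)

# Split the original numbers by the sign of their z-score.
above = [x for x, v in zip(a, z) if v > 0]
below = [x for x, v in zip(a, z) if v < 0]

print(above)  # numbers larger than the mean
print(below)  # numbers smaller than the mean
```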
#ML
Last modified on 2025-04-26