Ask anyone who has taken Stats101 and they will tell you that they understand z-scores. But do they, really? Could you answer the following questions without thinking?

- What is the mean of z-scores?
- What is the standard deviation of z-scores?
- What is the sum of squared z-scores?
- Is the z-score distribution the same as the original distribution of sample values?
- What do z-scores above 0 mean?

If you cannot answer them without thinking, then you don’t really understand z-scores.

Next, let’s gain a deeper understanding of z-scores by looking at the above questions.

The above questions are from this post.

## What is a z-score?

A z-score is also called a standard score. In a one-dimensional array, i.e., a vector, the z-score of a number indicates the signed distance between that number and the mean of the array, measured in units of the array's standard deviation.

`$$z = \frac{x_i - \mu}{\sigma}$$`

Where `$x_i$` is a number in the array, `$\mu$` is the mean of the array, and `$\sigma$` is its standard deviation.

Before we can calculate z-scores, we need to calculate the mean and the standard deviation:

```
import math

def my_mean(array):
    # arithmetic mean
    return sum(array) / len(array)

def my_std(array):
    # population standard deviation (divides by n, not n - 1)
    mn = my_mean(array)
    my_sum = 0
    for i in array:
        my_sum += (i - mn) ** 2
    return math.sqrt(my_sum / len(array))
```

```
# an example,
a = [1,4,6,8,10]
my_mean(a)
```

```
5.8
```

```
my_std(a)
```

```
3.1240998703626617
```

```
def zscores(array):
    # standardize each value: subtract the mean, divide by the standard deviation
    mn = my_mean(array)
    std = my_std(array)
    return [(i - mn) / std for i in array]
```

```
z_scores = zscores(a)
z_scores
```

```
[-1.5364425591947517,
-0.5761659596980319,
0.06401843996644804,
0.7042028396309279,
1.3443872392954077]
```
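As a sanity check, the same numbers can be reproduced with Python's standard `statistics` module. This is only a sketch for cross-checking the hand-written functions; the variable names `mu` and `sigma` are mine. Note that it uses `pstdev` (population standard deviation, dividing by `$n$`), which matches the definition of `$\sigma$` in this post. `statistics.stdev` divides by `$n - 1$` instead and would give slightly different z-scores.

```python
import math
import statistics

a = [1, 4, 6, 8, 10]

# Population mean and standard deviation (divide by n, not n - 1),
# matching the definition of sigma used in this post.
mu = statistics.fmean(a)
sigma = statistics.pstdev(a)

# Standardize every value in the array
z_scores = [(x - mu) / sigma for x in a]

print(mu)
print(sigma)
print(z_scores)
```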

## The sum and the mean of z-scores

The sum and the mean of z-scores are always zero. Why?

`$$\sum_{i=1}^n z_i = \sum_{i=1}^n \frac{x_i - \mu}{\sigma} = \frac{\sum_{i=1}^n (x_i - \mu)}{\sigma}$$`

We have:

`$$\sum_{i=1}^n (x_i - \mu) = \sum_{i=1}^n x_i - n\cdot \mu$$`

Because

`$$\mu = \frac{\sum_{i=1}^n x_i}{n}$$`

We have

`$$\sum_{i=1}^n (x_i - \mu) = 0$$`

So the sum of z-scores is zero. When the sum is zero, the mean is of course zero as well.
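The derivation can also be checked numerically. Below is a minimal sketch; `standardize` is a hypothetical helper name, a compact restatement of the z-score formula using the `statistics` module. Because of floating-point rounding, the sum may come out as a tiny nonzero number rather than exactly zero, so the check uses a tolerance.

```python
import math
import statistics

def standardize(array):
    # z-score of each value, using the population standard deviation
    mu = statistics.fmean(array)
    sigma = statistics.pstdev(array)
    return [(x - mu) / sigma for x in array]

z = standardize([1, 4, 6, 8, 10])
z_sum = sum(z)
z_mean = z_sum / len(z)

# Compare with a tolerance instead of testing for exact equality with 0
print(math.isclose(z_sum, 0.0, abs_tol=1e-12))
print(math.isclose(z_mean, 0.0, abs_tol=1e-12))
```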

## The standard deviation of z-scores

Let’s calculate the standard deviation of z-scores.

`$$\sigma_z = \sqrt{\frac{\sum_{i=1}^n (z_i - E(z))^2}{n}}$$`

Because `$E(z) = 0$`, we have:

`$$\sigma_z = \sqrt{\frac{\sum_{i=1}^n (z_i)^2}{n}}$$`

This, in fact, leads to our third question:

## The sum of squared z-scores

`$$\sum_{i=1}^n (z_i)^2 = \sum_{i=1}^n \frac{(x_i - \mu)^2}{\sigma^2} = \frac{\sum_{i=1}^n (x_i - \mu)^2}{\sigma^2}$$`

Because

`$$\sigma = \sqrt{\frac{\sum_{i=1}^n{(x_i - \mu)^2}}{n}}$$`

So we have

`$$\sigma^2 = \frac{\sum_{i=1}^n{(x_i - \mu)^2}}{n}$$`

Therefore the numerator `$\sum_{i=1}^n (x_i - \mu)^2$` equals `$n\sigma^2$`, and

`$$\sum_{i=1}^n (z_i)^2 = n$$`

That is to say, **the sum of squared z-scores is the number of items in the array.**

Going back to the standard deviation of z-scores, we get

`$$\sigma_z = \sqrt{\frac{\sum_{i=1}^n (z_i)^2}{n}} = \sqrt{\frac{n}{n}} = 1$$`
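Both results are easy to confirm on the example array. This is a sketch under the same assumptions as the rest of the post (population standard deviation, dividing by `$n$`); `standardize`, `sum_sq`, and `sigma_z` are illustrative names of my own.

```python
import math
import statistics

def standardize(array):
    # z-score of each value, using the population standard deviation
    mu = statistics.fmean(array)
    sigma = statistics.pstdev(array)
    return [(x - mu) / sigma for x in array]

a = [1, 4, 6, 8, 10]
z = standardize(a)

# Sum of squared z-scores should equal n, the number of items
sum_sq = sum(v ** 2 for v in z)

# Population standard deviation of the z-scores should equal 1
sigma_z = statistics.pstdev(z)

print(math.isclose(sum_sq, len(a)))
print(math.isclose(sigma_z, 1.0))
```

If you standardize with the sample standard deviation (dividing by `$n - 1$`) instead, the sum of squared z-scores becomes `$n - 1$` rather than `$n$`.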

## The distribution of z-scores

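The distribution of z-scores has the same shape as the distribution of the original array; only its location and scale change. I think of it this way: the z-scores are the result of shifting the original numbers horizontally by `$\mu$` (so that the mean lands on zero) and then rescaling them horizontally by a factor of `$1/\sigma$`. Because every number is shifted and rescaled in the same way, their relative positions stay the same. And that’s why the shape of the distribution is also the same. In particular, standardizing does not turn a skewed array into a normally distributed one.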
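One way to see the "relative positions stay the same" idea in numbers is to compare the gaps between neighbouring values before and after standardizing. The sketch below is my own illustration, with hypothetical names `gaps_x`, `gaps_z`, `same_spacing`, and `same_order`. It checks that every gap is divided by the same factor `$\sigma$` and that the ordering of the values is unchanged.

```python
import math
import statistics

a = [1, 4, 6, 8, 10]
mu = statistics.fmean(a)
sigma = statistics.pstdev(a)
z = [(x - mu) / sigma for x in a]

# Gaps between neighbouring values, before and after standardizing
gaps_x = [hi - lo for lo, hi in zip(a, a[1:])]
gaps_z = [hi - lo for lo, hi in zip(z, z[1:])]

# Every gap is divided by the same factor sigma, so relative spacing is unchanged
same_spacing = all(math.isclose(gz * sigma, gx) for gx, gz in zip(gaps_x, gaps_z))

# The ordering of the values is unchanged as well
order_x = sorted(range(len(a)), key=a.__getitem__)
order_z = sorted(range(len(z)), key=z.__getitem__)
same_order = order_x == order_z

print(same_spacing, same_order)
```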

## The meaning of the sign of z-scores

From the formula for z-scores, we know that if a z-score is above zero, the corresponding number in the array is greater than the mean of the array. If a z-score is below zero, the corresponding number is smaller than the mean. A z-score of exactly zero means the number equals the mean.
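The example array makes this concrete. In the sketch below (the names `labels` and `signs_match` are mine), each value is labelled by the sign of its z-score, and the last line checks that a positive z-score always coincides with a value above the mean.

```python
import statistics

a = [1, 4, 6, 8, 10]
mu = statistics.fmean(a)      # 5.8 for this array
sigma = statistics.pstdev(a)  # population standard deviation

labels = []
for x in a:
    z = (x - mu) / sigma
    # Classify the value by the sign of its z-score
    side = "above" if z > 0 else "below" if z < 0 else "equal to"
    labels.append(side)
    print(f"{x}: z = {z:+.3f}, {side} the mean")

# The sign of the z-score agrees with the comparison against the mean
signs_match = all(((x - mu) / sigma > 0) == (x > mu) for x in a)
print(signs_match)
```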

#ML

Last modified on 2022-10-01