One-sample Tests

 

Statistical tests with one sample: z-test

1. Introduction

In the one-sample t-test article, we introduced the procedure for the one-sample t-test, a hypothesis test for one population mean when the population standard deviation is unknown (the more common situation). In the rare event that the population standard deviation is known, we can take advantage of this extra information and perform a (theoretically) more accurate hypothesis test: the one-sample z-test. Let’s motivate this using the same example from the one-sample t-test article. Here is the story again:

One day, while waiting for your bus to work, you notice a sign on the bus stop saying the average wait time for the arrival of a bus is 12 minutes. Being an experienced bus rider, you believe this claim is nonsense (you know, waiting for a bus to come is like waiting for the end of a work day), and you decide to dispute this claim. For the next 50 days, you (on purpose) miss a bus and record the time it takes for the next bus to arrive, and you get an average of 13.5 minutes. Now, is it enough to say the sign is wrong? In particular, is there enough evidence to claim that the waiting time should be longer?

 

2. One Sample Z-test

In this article, we explain the one-sample z-test. We use the z-test when we want to make inferences about a population mean and the population standard deviation is known. The setup for this test is very similar to that of the one-sample t-test. Let’s first list the elements of a hypothesis test.

 

3. Elements of a Hypothesis Test

In general, a hypothesis test procedure consists of the following items:

  1. Null ($H_0$) and alternative ($H_1$) hypotheses.

  2. Assumptions to follow.

  3. Test statistic.

  4. Rejection rule.

  5. Conclusion.

 

 

For more details on each item, readers can refer to our one-sample t-test article. Below we describe some of the items as they apply to the one-sample z-test.

 

4. Null and Alternative Hypotheses

The null hypothesis $H_0$ assumes that the population mean equals the status quo. The alternative hypothesis $H_1$ varies depending on what we want to test:

 

  • $H_1: \mu > \mu_0$: the population mean is larger than the status quo.

  • $H_1: \mu < \mu_0$: the population mean is smaller than the status quo.

  • $H_1: \mu \neq \mu_0$: the population mean does not equal the status quo.

 

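Let’s introduce some notation and summarize the statements above in a table:

  • $\mu$ - population mean

  • $\mu_0$ - status quo

| Test type | $H_0$ | $H_1$ | Reject $H_0$ if |
|---|---|---|---|
| Upper-tailed (one-tailed) | $\mu = \mu_0$ | $\mu > \mu_0$ | $z \geq z_{\alpha}$ |
| Lower-tailed (one-tailed) | $\mu = \mu_0$ | $\mu < \mu_0$ | $z \leq -z_{\alpha}$ |
| Two-tailed | $\mu = \mu_0$ | $\mu \neq \mu_0$ | $\lvert z \rvert \geq \lvert z_{\alpha/2} \rvert$ |

Table 1: Hypotheses Summary

The absolute values in the two-tailed rule are there because the values involved can be negative.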

5. Data Requirements

Just like the t-test, the z-test relies on some assumptions about the data. Here are the requirements our data should meet:

 

  1. The variables should have values in a continuous range.

  2. The sample size is large (by convention, larger than 30).

  3. Sample values are taken independently.

  4. The population variance is known.

  5. If the sample size is small, we can still use the z-test, provided that the population variance is known and the population distribution is Normal.

6. Test Statistic

The last column of Table 1 describes when we can reject the null hypothesis. The value $z$ is called the test statistic, while $z_{\alpha}$ (or $z_{\alpha/2}$ for a two-tailed test) is called the critical value. The test statistic is calculated as follows:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

where

$\bar{x}$ = sample mean

$\mu_0$ = population mean (status quo)

$n$ = number of observations

$\sigma$ = population standard deviation
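The test statistic $z = (\bar{x} - \mu_0) / (\sigma / \sqrt{n})$ can be computed in a few lines. The following is a minimal Python sketch (the article's own code is in R); the function name `z_statistic` and the numbers in the usage line are hypothetical and chosen only for illustration.

```python
from math import sqrt


def z_statistic(sample_mean, mu0, sigma, n):
    """Return z = (sample_mean - mu0) / (sigma / sqrt(n)).

    sigma is the *population* standard deviation, assumed known.
    """
    if sigma <= 0 or n <= 0:
        raise ValueError("sigma and n must be positive")
    standard_error = sigma / sqrt(n)  # standard deviation of the sample mean
    return (sample_mean - mu0) / standard_error


# Hypothetical numbers, for illustration only:
# 36 observations averaging 52, status quo 50, known sigma 6.
print(z_statistic(52.0, 50.0, 6.0, 36))  # prints 2.0
```

A positive value means the sample mean lies above the status quo, a negative value means it lies below.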

7. Critical Value

The critical value is a cut-off value obtained from the assumed population distribution. For the one-sample z-test, the critical value is defined by the following rules:

| Test type | Critical value defined by |
|---|---|
| Upper-tailed | $\Pr(Z > z_{\alpha}) = \alpha$ |
| Lower-tailed | $\Pr(Z < -z_{\alpha}) = \alpha$ |
| Two-tailed | $\Pr(\lvert Z \rvert > z_{\alpha/2}) = \alpha$ |

Table 2: Critical Value.

where $Z$ follows the standard Normal distribution, and α is the preset level of significance. In other words, the critical value is the value that leaves probability α (or α/2 in each tail, for a two-tailed test) in the tail of the standard Normal distribution. For example, if α is 0.05 and we are doing an upper-tailed test, then $z_{\alpha} = z_{0.05}$ will be 1.645. However, if we are doing a two-tailed test, the critical value $z_{\alpha/2} = z_{0.025}$ becomes 1.96. These values can be found using a z-table (see below), or any statistical computing software (see R code).
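As an alternative to a z-table, the critical values quoted above (1.645 and 1.96) can be checked with the inverse CDF of the standard Normal distribution. This is a sketch in Python using the standard library's `statistics.NormalDist`; the helper name `critical_value` and its `tail` labels are assumptions made for this illustration, not part of the article's R code.

```python
from statistics import NormalDist


def critical_value(alpha, tail):
    """Return the signed cut-off for a one-sample z-test.

    tail: "upper", "lower", or "two" (labels chosen for this sketch).
    For "lower" the returned value is negative; for "two" the positive
    cut-off is returned and is meant to be compared against |z|.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "upper":
        return std_normal.inv_cdf(1 - alpha)      # leaves alpha in the right tail
    if tail == "lower":
        return std_normal.inv_cdf(alpha)          # leaves alpha in the left tail
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)  # leaves alpha/2 in each tail
    raise ValueError("tail must be 'upper', 'lower', or 'two'")


for tail in ("upper", "lower", "two"):
    print(tail, round(critical_value(0.05, tail), 3))
```

The software values (about 1.6449 and 1.9600) agree with the table-based 1.645 and 1.96 to the precision a z-table offers.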

8. Full Procedure

We have just described all the details of a one-sample z-test. Let’s do a quick summary of the entire procedure:

  1. Determine α, the level of significance.

  2. Define the null ($H_0$) and alternative ($H_1$) hypotheses.

  3. Calculate the test statistic.

  4. Calculate the critical value.

  5. Compare the test statistic with the critical value.

  6. Make a decision and conclusion based on the comparison.

 

9. Example

Let’s work out a full example. At the beginning of this post we revisited the story of a bus rider. Suppose that the bus company provides the population standard deviation of bus arrival times, which turns out to be 2.6 minutes. Using this information (plus the data from the beginning), let’s carry out the hypothesis test. Set α to be 0.05. Let’s define some notation. Let µ be the population mean arrival time, and $\mu_0$ be the hypothesized mean arrival time, which is 12 minutes as claimed by the bus company. You want to show that the actual waiting time is longer, so you are doing an upper-tailed hypothesis test.

 

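$H_0: \mu = 12$

$H_1: \mu > 12$

Test statistic:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{13.5 - 12}{2.6 / \sqrt{50}} \approx 4.08$$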

With an upper-tailed test and α = 0.05, the critical value $z_{\alpha}$ is 1.645. Looking at Table 1, since our test statistic (about 4.08) is larger than the critical value, we reject the null hypothesis. That is, we have enough evidence at the 5% significance level to conclude that the average arrival time is indeed more than 12 minutes.
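The bus example can be reproduced by following the steps of the full procedure in code. Below is a minimal Python sketch, restricted to the upper-tailed case used in this example; the function name `one_sample_z_test_upper` is hypothetical, and the inputs are the figures from the story (sample mean 13.5, claimed mean 12, σ = 2.6, n = 50, α = 0.05).

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test_upper(sample_mean, mu0, sigma, n, alpha=0.05):
    """Upper-tailed one-sample z-test (H0: mu = mu0 vs H1: mu > mu0).

    Returns the test statistic, the critical value, and the decision.
    """
    # Step 3: test statistic
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 4: critical value leaving probability alpha in the right tail
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # Steps 5 and 6: compare and decide
    reject_h0 = z >= z_alpha
    return z, z_alpha, reject_h0


z, z_alpha, reject_h0 = one_sample_z_test_upper(13.5, 12.0, 2.6, 50)
print(f"z = {z:.2f}, critical value = {z_alpha:.3f}, reject H0: {reject_h0}")
# prints: z = 4.08, critical value = 1.645, reject H0: True
```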

10. P-value

Another way to draw the conclusion of a hypothesis test is to use the p-value. In brief, the p-value is the probability of seeing a test statistic at least as extreme as the one we observed, assuming the null hypothesis is true. If the p-value is small, data as extreme as ours would rarely occur under the null hypothesis, so we take this as evidence against the null hypothesis. Hence, we reject the null hypothesis if the p-value is less than α. The p-value for the z-test is calculated as follows:

| Test type | P-value |
|---|---|
| Upper-tailed | $\Pr(Z > z)$ |
| Lower-tailed | $\Pr(Z < z)$ |
| Two-tailed | $2 \Pr(Z > \lvert z \rvert)$ |

Table 3: P-value Calculation.

Again, $Z$ follows the standard Normal distribution, and $z$ is the observed test statistic. Notice that in the calculation for the two-tailed test, we multiply the probability by 2. This is because a two-tailed test does not specify a direction, so we multiply by 2 to count both the left and right tails. Going back to the example above, the p-value is Pr(Z > 4.08) < 0.0001, which also leads us to reject the null hypothesis.
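The p-value rules for the three test types can be sketched in Python with the standard Normal CDF from `statistics.NormalDist`. The helper name `p_value` and its `tail` labels are assumptions made for this illustration.

```python
from statistics import NormalDist


def p_value(z, tail):
    """Return the p-value of an observed z statistic.

    tail: "upper", "lower", or "two" (labels chosen for this sketch).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "upper":
        return 1 - std_normal.cdf(z)             # Pr(Z > z)
    if tail == "lower":
        return std_normal.cdf(z)                 # Pr(Z < z)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))  # both tails
    raise ValueError("tail must be 'upper', 'lower', or 'two'")


# Bus example: upper-tailed test with z of about 4.08
p = p_value(4.08, "upper")
print(p < 0.0001)  # prints True, so H0 is rejected at alpha = 0.05
```

For a statistic this far in the tail, `1 - cdf(z)` is still accurate enough for a decision at α = 0.05, although it loses relative precision for very large z.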

11. Finding P-value and Critical Value using Table

There are three components to the z-table: 1) the image on top, 2) the numbers on the sides, and 3) the decimal numbers inside the table. The image at the top of the table tells us what the probabilities inside the table refer to. The numbers on the sides are the z-scores, and the numbers inside the table are the probabilities to the left of each corresponding z-score (for this particular table). For example, if the z-score is 1.45, then we get P(Z < 1.45) = 0.9265 by going down to the row with 1.4 on the left margin and across to the column with 0.05 on top. For our example, since we are doing a right-tailed test, we need to subtract the probability in the table from 1 to obtain the correct p-value.

Finding the critical value requires a ’reverse operation’. Instead of going from the outside margins towards the inside to obtain probabilities, we go from the inside towards the outside to obtain the critical value. This works because α is itself a probability. For our right-tailed test, we set α = 0.05, hence we want $z_{\alpha}$ such that P(Z > $z_{\alpha}$) = 0.05. Since the table only gives the probability to the left of the z-score, we first subtract α from 1 (which gives 0.95), and then find the closest probability to 0.95 inside the table. In many cases we will not be able to find the exact probability, and in our case, the closest probabilities inside the table are 0.9495 and 0.9505. Looking towards the margins, we see that the two z-scores corresponding to these two probabilities are 1.64 and 1.65. Since 0.95 is between 0.9495 and 0.9505, we know that the critical value is also between 1.64 and 1.65. By convention, we simply take the average of these two values: $z_{\alpha}$ = 1.645.

 

Note that different z-tables may be set up differently: some tables give the probability between 0 and the z-score, i.e. P(0 < Z < z). Always refer to the image given.
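The table lookups described in this section can be cross-checked in software. The Python sketch below uses `statistics.NormalDist` to reproduce the forward lookup (z-score to probability), the reverse lookup (probability to z-score), and the conversion from a table that reports P(0 < Z < z). The helper name `left_area_from_center_table` is hypothetical and assumes a non-negative z-score.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Forward lookup: row 1.4, column 0.05 of a left-tail table
left_area = std_normal.cdf(1.45)   # about 0.9265
right_area = 1 - left_area         # what a right-tailed test needs

# Reverse lookup: the two table entries that bracket 0.95
area_at_164 = std_normal.cdf(1.64)  # about 0.9495
area_at_165 = std_normal.cdf(1.65)  # about 0.9505
table_average = (1.64 + 1.65) / 2   # the conventional table answer
exact = std_normal.inv_cdf(0.95)    # what software returns directly


def left_area_from_center_table(center_area):
    """Convert P(0 < Z < z), for z >= 0, into the left-tail P(Z < z)."""
    return 0.5 + center_area


print(round(left_area, 4), round(right_area, 4))
print(round(area_at_164, 4), round(area_at_165, 4))
print(table_average, round(exact, 4))
```

The averaged table value and the software value differ only in the fourth decimal place, which does not change the decision in the bus example.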

12. R Code

The R code for this article can be found on the GitHub repository via here.