Two-sample Tests


Statistical tests with two samples: z-test

1. Introduction

Suppose you are a market analyst, and you wish to study the difference in gas price between two cities in the San Francisco Bay Area, say Mountain View and San Jose. You suspect that the gas price in Mountain View is different (well, those are two different cities after all). You randomly pick 5 stations from each city and record the price of medium-grade gas for 10 days straight in Mountain View (50 measurements total) and for 9 days in San Jose (45 measurements total). The resulting mean price is $3.36 for Mountain View and $3.03 for San Jose. Can you say the average gas prices are different?


We have seen the one-sample z-test, where we want to make inferences about a population mean when the population standard deviation is known. Often, however, we wish to compare two population means instead. For example, one may wish to study the income difference between genders, sleep time between age groups, commute time between cities, etc. Rather than focusing on only one population, we now need to extend our methodology to two populations.

2. Two Sample Z-test

In this article, we will explain the two-sample z-test. We use the two-sample z-test when we want to make inferences about two population means and both population standard deviations are known. In general, we would like to see if the difference between the two population means is larger than, smaller than, or different from a particular hypothesized value (or status quo). In the special case where this status quo is 0, we are simply checking whether one of the means is larger or smaller than the other, or whether they take different values.


3. Null and Alternative Hypotheses

Since we are dealing with two populations now, the null hypothesis \(H_0\) assumes that the difference between the two population means equals a hypothesized difference.

The alternative hypothesis \(H_1\) varies depending on what we want to test:


  • \(H_1: \mu_1 - \mu_2 > d\): the difference between the population mean of group 1 and 2 is larger than the hypothesized value.

  • \(H_1: \mu_1 - \mu_2 < d\): the difference between the population mean of group 1 and 2 is less than the hypothesized value.

  • \(H_1: \mu_1 - \mu_2 \neq d\): the difference between the population mean of group 1 and 2 is different from the hypothesized value.


Let’s introduce some notation and summarize the statements above in the table:

  • \(\mu_1\), \(\mu_2\) - population means of group 1 and group 2

  • \(d\) - status quo (the hypothesized difference)


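| Test type | \(H_0\) | \(H_1\) | Reject \(H_0\) if |
|---|---|---|---|
| Upper-tailed | \(\mu_1 - \mu_2 = d\) | \(\mu_1 - \mu_2 > d\) | \(Z \geq\) critical value |
| Lower-tailed | \(\mu_1 - \mu_2 = d\) | \(\mu_1 - \mu_2 < d\) | \(Z \leq\) critical value (the critical value is negative here) |
| Two-tailed | \(\mu_1 - \mu_2 = d\) | \(\mu_1 - \mu_2 \neq d\) | \(\lvert Z \rvert \geq \lvert \text{critical value} \rvert\) |

Table 1: Hypotheses Summary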

In the special (or more common) case that \(d\) is 0, we can move \(\mu_2\) to the other side of the equation in the hypotheses. For example, the null hypothesis becomes \(\mu_1 = \mu_2\), and the two-tailed alternative hypothesis becomes \(\mu_1 \neq \mu_2\).

4. Data Requirements

As with all other hypothesis tests, we need to make some assumptions about our data:

  1. The variables take values on a continuous scale.

  2. The two populations are independent.

  3. The sample size is large for both groups (the conventional rule of thumb is more than 30). The two sample sizes are not required to be the same, though.

  4. Sample values are taken independently.

  5. Population variances are known.

  6. If the sample size is small, we can still use the z-test, provided that the population variances are known and the population distributions are Normal.

5. Test Statistics

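Just like all other tests, we have to calculate the test statistic \(Z\) using the data. The test statistic can be calculated as follows:

\[
Z = \frac{(\bar{x}_1 - \bar{x}_2) - d}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}
\]

\(\bar{x}_1\), \(\bar{x}_2\) - sample means from population 1 and 2 respectively

\(d\) - status quo/hypothesized difference

\(\sigma_1^2\), \(\sigma_2^2\) - population variances from population 1 and 2 respectively

\(n_1\), \(n_2\) - sample sizes of population 1 and 2 respectively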
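As a minimal sketch, the test statistic can be written as a small Python function: the difference in sample means, minus the hypothesized difference, divided by the square root of the sum of each population variance over its sample size. The function name `two_sample_z` and its parameter names are illustrative choices, not part of the original material. The inputs are the gas price figures used in this article's example.

```python
import math

def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, d=0.0):
    """Two-sample z statistic.

    sigma1 and sigma2 are the known population standard deviations
    (they are squared below to get the population variances).
    d is the hypothesized difference between the population means.
    """
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (mean1 - mean2 - d) / standard_error

# Gas price figures from this article:
# Mountain View is group 1, San Jose is group 2.
z = two_sample_z(3.36, 3.03, 0.45, 0.41, 50, 45)
print(round(z, 2))  # 3.74
```

Passing standard deviations rather than variances is only a convenience here, since the example states the standard deviations directly.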

6. Critical Value

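The critical value is a cut-off value obtained based on the assumed population distribution. For the two-sample z-test, the critical value can be obtained using the following rules:

| Test type | Critical value |
|---|---|
| Upper-tailed | \(z_{\alpha}\), where \(\Pr(Z > z_{\alpha}) = \alpha\) |
| Lower-tailed | \(-z_{\alpha}\), where \(\Pr(Z < -z_{\alpha}) = \alpha\) |
| Two-tailed | \(\pm z_{\alpha/2}\), where \(\Pr(Z > z_{\alpha/2}) = \alpha/2\) |

Table 2: Critical Value.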

where Z follows the standard Normal distribution, and α is the preset level of significance. In other words, the critical value is the point that leaves a probability of α (or α/2 for a two-tailed test) in the tail of the standard Normal distribution. For example, if α is 0.05 and we are doing an upper-tailed test, then the critical value will be 1.645. However, if we are to do a two-tailed test, the critical value then becomes 1.96.

Since we are using the Normal distribution for the two-sample z-test, the procedure for finding the critical value is the same as that for the one-sample z-test. Readers can refer to the one-sample z-test material for more details on how to obtain the critical value from a numerical table.
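Instead of a numerical table, the critical value can also be computed from the inverse CDF of the standard Normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `critical_value` and the tail labels `"upper"`, `"lower"` and `"two"` are assumptions made for illustration.

```python
from statistics import NormalDist

def critical_value(alpha, tail):
    """Critical value of the standard Normal for a given significance level.

    tail is one of "upper", "lower" or "two" (illustrative labels).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "upper":
        return std_normal.inv_cdf(1 - alpha)      # alpha in the right tail
    if tail == "lower":
        return std_normal.inv_cdf(alpha)          # alpha in the left tail (negative value)
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
    raise ValueError("tail must be 'upper', 'lower' or 'two'")

print(round(critical_value(0.05, "upper"), 3))  # 1.645
print(round(critical_value(0.05, "lower"), 3))  # -1.645
print(round(critical_value(0.05, "two"), 2))    # 1.96
```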

7. Full Procedure

We have just described all the details of a two-sample z-test. Let’s do a quick summary of the entire procedure:

  1. Determine α, the level of significance.

  2. Define the null (\(H_0\)) and alternative (\(H_1\)) hypotheses.

  3. Calculate the test statistic.

  4. Calculate the critical value.

  5. Compare the test statistic with the critical value.

  6. Make a decision and draw a conclusion based on the comparison.
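The steps above can be sketched end to end as a single Python function. This is one possible arrangement under stated assumptions, not a reference implementation: the function name `two_sample_z_test`, its parameters, and the tail labels are hypothetical, and the rejection rules follow the convention that the lower-tailed critical value is negative. The inputs are the gas price figures from this article's example.

```python
import math
from statistics import NormalDist

def two_sample_z_test(mean1, mean2, sigma1, sigma2, n1, n2,
                      d=0.0, alpha=0.05, tail="two"):
    """Return (z, critical value, reject H0?) for a two-sample z-test."""
    std_normal = NormalDist()

    # Step 3: test statistic (sigma1, sigma2 are known population std devs).
    z = (mean1 - mean2 - d) / math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

    # Steps 4 and 5: critical value and comparison, by test type.
    if tail == "upper":
        crit = std_normal.inv_cdf(1 - alpha)
        reject = z >= crit
    elif tail == "lower":
        crit = std_normal.inv_cdf(alpha)  # negative value
        reject = z <= crit
    elif tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) >= crit
    else:
        raise ValueError("tail must be 'upper', 'lower' or 'two'")

    # Step 6: the decision is returned to the caller.
    return z, crit, reject

# Steps 1 and 2: alpha = 0.05, H0: mu1 - mu2 = 0 against H1: mu1 - mu2 != 0.
z, crit, reject = two_sample_z_test(3.36, 3.03, 0.45, 0.41, 50, 45,
                                    d=0.0, alpha=0.05, tail="two")
print(round(z, 2), round(crit, 2), reject)  # 3.74 1.96 True
```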


8. Example

Let’s finish the example we started at the beginning of this article. Suppose you know (or the oracle tells you) that the population standard deviations are $0.45 and $0.41 for Mountain View and San Jose gas prices respectively. Using this information (plus the data from the beginning), let’s carry out the hypothesis test. Set α to be 0.05.

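Let \(\mu_1\) be the population mean gas price of Mountain View, and \(\mu_2\) similarly for San Jose. We want to see if the gas prices are different, therefore the hypothesized difference \(d\) is simply 0. Since we are testing for a difference in either direction, this will be a two-tailed test.

\(H_0: \mu_1 - \mu_2 = 0\)

\(H_1: \mu_1 - \mu_2 \neq 0\)

Test statistic:

\[
Z = \frac{(3.36 - 3.03) - 0}{\sqrt{\dfrac{0.45^2}{50} + \dfrac{0.41^2}{45}}} \approx 3.74
\]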
With a two-tailed test and α = 0.05, the critical value is 1.96. Now looking at Table 1, since the absolute value of our test statistic is larger than the critical value, we can reject the null hypothesis. This says that we have enough evidence to conclude the gas prices are indeed different.

Note: It is important to keep in mind which group is population 1 and which is population 2, as it makes a difference when we write up our alternative hypothesis. For example, if we want to show Mountain View’s gas price is higher, we will write \(\mu_1 > \mu_2\), but if we define population 2 as Mountain View while population 1 is San Jose, the alternative hypothesis then becomes \(\mu_1 < \mu_2\).
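The effect of the labeling can be checked numerically. In the sketch below (the helper name `z_stat` is an illustrative choice), swapping which city is population 1 flips the sign of the test statistic, so an upper-tailed test under one labeling corresponds to a lower-tailed test under the other and the decision is unchanged. The cut-offs of 1.645 and -1.645 are the one-tailed critical values for α = 0.05.

```python
import math

def z_stat(mean1, mean2, sigma1, sigma2, n1, n2, d=0.0):
    """Two-sample z statistic with known population standard deviations."""
    return (mean1 - mean2 - d) / math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Mountain View as population 1, San Jose as population 2.
z_mv_first = z_stat(3.36, 3.03, 0.45, 0.41, 50, 45)

# Same data with the labels swapped: San Jose is now population 1.
z_sj_first = z_stat(3.03, 3.36, 0.41, 0.45, 45, 50)

print(round(z_mv_first, 2), round(z_sj_first, 2))  # 3.74 -3.74

# "Mountain View is higher" is an upper-tailed test under the first labeling
# and a lower-tailed test under the second; both reach the same decision.
reject_upper = z_mv_first >= 1.645
reject_lower = z_sj_first <= -1.645
print(reject_upper, reject_lower)  # True True
```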

9. P-value

Another way to draw the conclusion of a hypothesis test is to use the p-value. In brief, the p-value is the probability of seeing a test statistic at least as extreme as the one we observed, assuming the null hypothesis is true. In other words, if the p-value is small, data like ours would rarely occur under the null hypothesis, which we take as evidence against it. Hence, we will reject the null hypothesis if the p-value is less than α. The following lists the ways to calculate the p-value for the z-test:

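| Test type | p-value (\(z\) is the observed test statistic) |
|---|---|
| Upper-tailed | \(\Pr(Z > z)\) |
| Lower-tailed | \(\Pr(Z < z)\) |
| Two-tailed | \(2 \cdot \Pr(Z > \lvert z \rvert)\) |

Table 3: P-value Calculation.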

Again, Z follows the standard Normal distribution. Notice that in the calculation for the two-tailed test, we multiply the probability by 2. This is because in a two-tailed test we do not specify which direction we are looking at, so we multiply the result by 2 to count both the left and right tails. Going back to the example above, the p-value is \(2 \cdot \Pr(Z > 3.74) \approx 0.0002\), which is less than α = 0.05 and leads us to reject the null hypothesis as well.
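The p-value calculation can be sketched with the standard Normal CDF from Python's `statistics.NormalDist`. The helper name `p_value` and the tail labels are assumptions for illustration. The upper and two-tailed cases use the symmetry of the Normal distribution, \(\Pr(Z > z) = \Pr(Z < -z)\), which avoids subtracting a number very close to 1.

```python
from statistics import NormalDist

def p_value(z, tail):
    """p-value of an observed z statistic under the standard Normal."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "upper":
        return std_normal.cdf(-z)            # Pr(Z > z), by symmetry
    if tail == "lower":
        return std_normal.cdf(z)             # Pr(Z < z)
    if tail == "two":
        return 2 * std_normal.cdf(-abs(z))   # 2 * Pr(Z > |z|)
    raise ValueError("tail must be 'upper', 'lower' or 'two'")

# Observed test statistic from the gas price example.
p = p_value(3.74, "two")
print(round(p, 4))  # 0.0002
print(p < 0.05)     # True, so the null hypothesis is rejected
```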

10. R Code

The R code for this can be found on the GitHub repository via here.