One-sample Tests
Statistical tests with one sample: the t-test
1. Introduction: Sample vs Population
Imagine the company “Moogle” with 335,000 employees. You would like to know the average age (mean) of the employees in that company. Asking all 335,000 employees costs a lot of time. Instead, you think of randomly asking 40 of them, which will still get you approximately the same result, provided your sample is a good representation of the population.
A population mean (μ) is one of the parameters of a population and is therefore called a population parameter. In our example, it is the average age of the 335,000 employees. A sample mean is called a sample statistic. In our example, it is the average age of the 40 sampled employees.
In this particular example, we try to “learn” the population mean (which comes from 335,000 employees) based on an estimate of our sample mean (which comes from 40 observations). In other words, we try to make inferences from our sample to the population. Hence the name Inferential Statistics: the study of approximating the parameters of a population through the statistics of a sample.
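To make the sample-vs-population distinction concrete, here is a minimal sketch in Python. The data are hypothetical: since we do not have Moogle's real records, the ages are simulated (uniformly between 20 and 65, purely for illustration). The sketch draws a random sample of 40 from a simulated population of 335,000 and compares the two means:

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of 335,000 employees,
# simulated uniformly between 20 and 65 for illustration only.
population = [random.randint(20, 65) for _ in range(335_000)]

# Population parameter: the true mean age of all employees.
population_mean = statistics.mean(population)

# Sample statistic: the mean age of 40 randomly chosen employees.
sample = random.sample(population, 40)
sample_mean = statistics.mean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

With a representative sample, the sample mean typically lands close to the population mean, which is exactly the inference step described above.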
2. Hypothesis Testing
What is hypothesis testing, and why do we need it in Inferential Statistics? Let’s motivate this with a basic example.
One day, while waiting for your bus to work, you notice a sign at the bus stop saying the average wait time for the arrival of a bus is 12 minutes. Being an experienced bus rider, you believe this claim is nonsense (you know, waiting for a bus to come is like waiting for the end of a workday), and you decide to dispute this claim. In particular, you want to show that the waiting time is longer than 12 minutes. Therefore, for the next 50 days, you (on purpose) miss a bus and record the time it takes for the next bus to arrive, and you get an average of 13.5 minutes. Now, is there enough evidence to claim that the real waiting time is longer than 12 minutes? Oftentimes we encounter situations like this in real life, where we are given a claim about a certain phenomenon (for instance, a claim about a population mean, which is 12 minutes in our example), but we want to test it, that is, to test whether the claim is right or wrong. In order to show we are correct (or to show the claim is wrong), we need to gather data and perform some statistical analysis. We call this procedure a hypothesis test.
When we do hypothesis testing, we have to set up two hypotheses about the population mean (μ): a null hypothesis and an alternative hypothesis:

The Null Hypothesis (H₀) states that the population mean equals the status quo.
In the bus example above, the status quo is a 12-minute average bus waiting time, which is assumed to be true. Thus:
H₀: μ = 12

The Alternative Hypothesis (H₁) states what we believe in, and it varies depending on what we want to test. In our bus example, the status quo is a 12-minute average bus waiting time, so the three possible Alternative Hypotheses are:
- H₁: the population mean is larger than the status quo. In our bus example: H₁: μ > 12
- H₁: the population mean is smaller than the status quo. In our bus example: H₁: μ < 12
- H₁: the population mean does not equal the status quo. In our bus example: H₁: μ ≠ 12
Let’s summarize these three types of hypothesis statements in the table below. Remember: you have to choose only one of the three alternative hypotheses.

Test type       Null hypothesis   Alternative hypothesis
Upper-tailed    H₀: μ = μ₀        H₁: μ > μ₀
Lower-tailed    H₀: μ = μ₀        H₁: μ < μ₀
Two-tailed      H₀: μ = μ₀        H₁: μ ≠ μ₀

where μ₀ is the value of our population mean under the status quo.
If we go back to our example, the sign shows that the waiting time is only 12 minutes, and we believe that the waiting time is larger than what the sign shows (or larger than our status quo). Thus, our hypothesis statements are the following:
H₀: μ = 12
H₁: μ > 12
where 12 is our status quo (μ₀), and μ > 12 is the alternative version we believe in.
After we set up the hypothesis statements, we want to test whether we have enough statistical evidence to conclude that the waiting time for a bus is indeed longer than 12 minutes. So now it is time to choose the correct test for this example.
3. Choosing The Correct Test
In statistics, tests for numerical data are divided into three categories: tests for one sample, tests for two samples, and tests for more than two samples. In our example, we have only one sample: the 50 waiting times that you measured. Thus, we need one-sample tests (look at our Roadmap to see the overall picture).
One-sample tests are generally divided into two types: tests for means and tests for proportions. When testing for means, we have two tests: the z-test and the t-test. Let’s understand which test is used when.
The z-test assumes a standard normal distribution (also called a Gaussian distribution) and is used when we know our population standard deviation. The t-test does not assume a standard normal distribution but one quite similar to it, and is used when we do not know our population standard deviation. The proportion test, on the other hand, relies on a different set of assumptions (see the Roadmap). Here we will focus only on the z-test and the t-test.
Both tests, however, rely on a normality assumption. If we use the z-test, we know the population standard deviation and already assume that our sample comes from a normal distribution. But when we do not know the population standard deviation (that is, when we use a t-test), we have to make sure that the normality assumption is satisfied. This can be done by having more than 30 observations in the sample, due to the Central Limit Theorem.
Below you will find the comparison of the two tests:
Z-test
- Underlying distribution: the z-distribution is the standard normal distribution (also called the Gaussian distribution).
- When do we use it: when we know the population standard deviation.
- Why it works: since we know the population standard deviation, this is a direct application of the Central Limit Theorem and the properties of the normal distribution (given the sample size is large, say more than 30).

T-test
- Underlying distribution: the t-distribution is similar to the standard normal distribution, but it has heavier tails, as shown in the graph below.
- When do we use it: when we do not know the population standard deviation.
- Why it works: since we do not know the population standard deviation, we have to ensure normality. This can be done by:
  - taking more than 30 observations (due to the Central Limit Theorem), or
  - if it is not possible to take more than 30 observations, checking whether these observations follow a normal distribution by means of other tests, e.g. the Kolmogorov-Smirnov test.

Normality assumption: both the z-test and the t-test require it. We cannot use the z- or t-test if we cannot assume that the distribution we are sampling from is normal. If we fail to assume normality, we cannot use a t-test and have to switch to its nonparametric alternatives.
In our bus example, we do not know the standard deviation of the population that the claimed mean of 12 minutes comes from, so we will use the t-test. Remember: for the t-test, we also need to assume normality or have at least 30 observations (we have 50).
4. T-test
In order to perform a t-test we need to:

1. Compute the t-statistic (t)

2. Find the critical value t_crit in a table, based on an α (significance) level

3. Compare t with t_crit, and make a decision

You do not need to fully understand these steps yet. Let’s go through and explain each of them.
4.1 Compute the t-statistic
The first step of the t-test is to compute the t-statistic (t) using the formula:

t = (x̄ − μ₀) / (s / √n)

where
x̄ is the sample mean,
μ₀ is the hypothesized mean,
n is the sample size (number of observations),
s is the estimated sample standard deviation: since σ is unknown, it is approximated by the sample standard deviation s using the formula:

s = √( Σ(xᵢ − x̄)² / (n − 1) )
Let’s calculate the t-statistic for our bus example. Also, assume for now that we have calculated s to be 5 (we will estimate s in another example).
x̄ = 13.5 (sample mean)
μ₀ = 12 (hypothesized mean)
n = 50 (observations)
s = 5 (sample standard deviation)

Now we can calculate t:

t = (13.5 − 12) / (5 / √50) = 1.5 / 0.7071 ≈ 2.12
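This calculation can be reproduced with a few lines of Python (standard library only; the variable names are our own):

```python
import math

# Values from the bus-stop example
x_bar = 13.5   # sample mean waiting time, minutes
mu_0 = 12.0    # hypothesized mean (the sign's claim)
n = 50         # number of recorded waiting times
s = 5.0        # sample standard deviation (assumed to be 5 for now)

# t = (x_bar - mu_0) / (s / sqrt(n))
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))

print(f"t = {t_stat:.4f}")  # about 2.1213
```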
4.2 Find the critical value t_crit
Now we need to find the right t_crit in the table. Technically, to find t_crit, we do not need step 4.1. Let’s look at the table to understand what we mean:
You can see that to find our t_crit, we need to know only two parameters: the so-called degrees of freedom v, and our significance level (α). Let's understand what each of them means.
The values in the table are t_crit such that P(T > t_crit) = α, where T follows the t-distribution with the appropriate degrees of freedom.

v (degrees of freedom)
v = n − 1, where n is the sample size.
In our example, the sample size is 50, so our degrees of freedom are v = 50 − 1 = 49. Since in the table above we have to choose either 40 or 60, we always go for the lower number, so the row we need has v = 40.
However, most statistical software that computes the t-critical value can give us the exact critical value for v = 49.
The significance level (alpha) is something that we determine ourselves. It is the probability of an error that we allow ourselves to make. More specifically, it is the probability, which we are willing to accept, of rejecting a null hypothesis that we should not have rejected. For example, if we take alpha to be 1%, it means we allow only a 1% chance of making such a wrong decision when rejecting the null hypothesis. In more formal terms, we call this the probability of a Type I error.
Let’s provide an example: you would like to estimate the mean IQ score of a certain school. You randomly pick 20 pupils in the school canteen during lunch and give them an IQ test. It could happen that, for some reason, only intelligent pupils were eating at the canteen, or that purely by chance you selected only intelligent ones, even though the selection was random. Suppose you then calculated the sample average IQ score (which, in this scenario, is quite high), and you now want to claim that the average IQ score of the school is high. You want to be able to make this claim with a certain confidence, i.e. minimizing the chance of making an error (such as accidentally picking only intelligent pupils for your sample). Therefore, we compare the chance of observing this phenomenon (which is extremely small, but still exists) with a threshold of a certain percentage, say 1%, and this threshold is the significance level (more will be discussed later).
Normally, the default significance level is 5%, because it allows us to be suitably critical in the test (the smaller the significance level, the more statistical evidence is needed to support the alternative hypothesis), but that also depends on what we test (see Category Tests).
Let’s take the default value of 5% for our test. That means we accept a probability of 5% or less that our sample mean of 13.5 minutes occurred solely by chance.

α (significance level)
According to the table, with degrees of freedom v = 40 and a significance level α = 0.05, our critical value is t_crit = 1.684.
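The table lookup can be mimicked in code. The dictionary below is a tiny, hand-copied excerpt of a standard one-tailed t-table (only the rows and columns we need here), and the helper applies the "round the degrees of freedom down" rule described above:

```python
# Hand-copied excerpt of a one-tailed t-table: {df: {alpha: t_crit}}
T_TABLE = {
    40: {0.05: 1.684, 0.025: 2.021},
    60: {0.05: 1.671, 0.025: 2.000},
}

def t_critical(df, alpha):
    """Return t_crit, rounding df down to the nearest available row
    (the conservative choice when the exact row is missing)."""
    row = max(v for v in T_TABLE if v <= df)
    return T_TABLE[row][alpha]

print(t_critical(49, 0.05))  # falls back to the df = 40 row -> 1.684
```

Statistical software would instead compute the exact critical value at df = 49; the lookup above mirrors what we do by hand with the printed table.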
4.3 Comparing t and t_crit
Now that we have both t and t_crit, let’s understand how to compare these two values with each other to deduce whether we have enough statistical evidence to reject our status quo of a 12-minute waiting time (the null hypothesis) and accept that it is actually higher than 12 minutes (the alternative hypothesis).
[Graph: the t-distribution, with the t-statistic t = 2.12 and the critical value t_crit = 1.684 marked on the x-axis.]
This is our t-distribution, where the x-axis shows all the possible values of t. Let’s plot our t and t_crit on this x-axis. Now, let’s look at the area under the entire t-distribution’s bell curve. This area represents probability. Because we know that a probability cannot exceed 100%, the total area under the curve equals 1. Let’s look at the area to the right of our t, shaded in red, and the area to the right of our t_crit, shaded in blue. We already know that these two areas are certain probabilities, but what exactly are those probabilities?
The area corresponding to t_crit (shaded in blue) is the significance level that we agreed on and set in 4.2 to be 5%. You already know that this is the probability of an error that we allow in our sample results. Now let’s have a look at the area corresponding to t (shaded in red). This area is called the p-value. The p-value is not something we set up beforehand, unlike the critical value; we calculate it based on our t. It is the actual probability of observing something at least as extreme as the result from our sample, assuming the null hypothesis is true. If our p-value is small, it implies the null hypothesis is likely to be false, since we observed a data set that would occur with only a small chance under the null distribution. More formally, the p-value should not be bigger than our significance level if we want to conclude that the alternative hypothesis is true and the null hypothesis (status quo) is false.
In case our p-value is bigger than our significance level, or, in other words, our estimated probability of an error is bigger than the allowed probability of an error that we set, we do not have enough statistical evidence to conclude that our Alternative Hypothesis is true (and to reject our Null Hypothesis). However, this does not mean the null hypothesis must be true; we simply fail to reject it at this point. Just because we don’t see enough statistical evidence with this sample, it doesn’t mean we won’t with another sample. Moreover, it might be the case that the data exhibits a completely different direction from the current alternative hypothesis (see part 6), but we are simply testing for the wrong direction.
If p-value ≤ significance level: reject H₀; accept H₁.
If p-value > significance level: fail to reject H₀.
In our example, the p-value is lower than our significance level, so we can reject our status quo of 12 minutes (H₀) and accept our alternative hypothesis that the real waiting time is actually higher (H₁). Since our p-value (marked in red) and significance level (marked in blue) correspond to our t and t_crit respectively, the equivalent rule is that we reject H₀ and accept H₁ when t ≥ t_crit.
If t ≥ t_crit: reject H₀; accept H₁.
If t < t_crit: fail to reject H₀.
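The decision rule for our upper-tailed test can be sketched as a small helper function (the names are our own):

```python
def decide_upper_tailed(t_stat, t_crit):
    """Upper-tailed test: reject H0 exactly when t >= t_crit
    (equivalently, when the p-value <= the significance level)."""
    return "reject H0" if t_stat >= t_crit else "fail to reject H0"

# Bus example: t = 2.12 against t_crit = 1.684 (alpha = 0.05, df row 40)
print(decide_upper_tailed(2.12, 1.684))  # -> reject H0
```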
5. T-distribution
The t-distribution is a distribution with only one parameter, the degrees of freedom (v), which completely determines the shape of the distribution. Remember that the degrees of freedom in our one-sample t-test are n − 1.
6. All Types of Hypothesis Testing
If we go back to our example, the sign showed that the waiting time is only 12 minutes, but we anticipated the waiting time to be longer. Thus, we believed that the actual waiting time is larger than what the sign shows (or larger than our status quo). We call this a one-tailed test, as we are interested in testing only whether it is larger (also known as an upper-tailed test), with the following hypotheses:
H₀: μ = 12
H₁: μ > 12
Now imagine that the sign still shows the waiting time to be 12 minutes, but we believe the waiting time to be 10.5 minutes (which is shorter). We call this a one-tailed test, as we are interested in testing only whether it is smaller (also known as a lower-tailed test), with the following hypotheses:
H₀: μ = 12
H₁: μ < 12
The steps to determine whether we have enough statistical evidence to conclude that our alternative hypothesis is true would be exactly the same (steps 4.1–4.3), with one small adjustment to the critical value: the critical value should be negative, so we reject H₀ when t ≤ −t_crit.
Let’s now imagine that the sign still shows the waiting time to be 12 minutes. Even though we anticipated the waiting time to be longer (just because it feels like it), suppose we are actually not sure whether the waiting time is larger or smaller than 12 minutes. What we believe is that it is definitely not 12 minutes (so either less or more). So, we would like to test whether our real waiting time differs from the 12 minutes that the sign shows. We call this a two-tailed test, as we are interested in testing both whether it is larger and whether it is smaller, with the following hypotheses:
H₀: μ = 12
H₁: μ ≠ 12
The steps to determine whether we have enough statistical evidence to conclude that our alternative hypothesis is true would be almost the same (steps 4.1–4.3), with one small adjustment: the significance level needs to be divided by 2 when looking up the critical value for a two-tailed test.
More formally, when searching for t_crit in the table, we have to divide our agreed significance level by 2. For instance, if we agree on a significance level of 5%, we will search for the right t_crit based on v = n − 1 = 49 (40 in the table) and α = 0.05/2 = 0.025, which gives 2.021 in the table. In contrast, for a one-tailed test (either upper-tailed or lower-tailed) with the same significance level of 5%, we find the right t_crit with v = 49 (40 in the table) and α = 0.05 to be 1.684, as we did in our example above.
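A small numeric sketch of this adjustment (the critical values are taken from the df = 40 row of the table quoted above):

```python
alpha = 0.05

# One-tailed test: look up the column for alpha itself.
# Two-tailed test: split alpha across both tails and look up alpha / 2.
alpha_per_tail = alpha / 2

t_crit_one_tailed = 1.684   # table column alpha = 0.05,  df row 40
t_crit_two_tailed = 2.021   # table column alpha = 0.025, df row 40

print(f"per-tail alpha for a two-tailed test: {alpha_per_tail}")
print(f"one-tailed t_crit: {t_crit_one_tailed}")
print(f"two-tailed t_crit: {t_crit_two_tailed}")
```

Note that the two-tailed critical value is larger: splitting α makes each tail smaller, so stronger evidence is needed to reject H₀.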
6.1 Hypothesis Summary
Now let's have an overview of the rejection rules for the t-test.
Test type                   Reject H₀ if
One-tailed, upper-tailed    p-value ≤ α,  i.e. t ≥ t_crit
One-tailed, lower-tailed    p-value ≤ α,  i.e. t ≤ −t_crit (because the critical value is negative)
Two-tailed                  p-value ≤ α,  i.e. |t| ≥ |t_crit|
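The summary table can be written as one small decision function (a sketch; t_crit below is the positive value read from the table):

```python
def t_test_decision(t_stat, t_crit, kind):
    """Apply the rejection rules from the summary table.
    t_crit is the positive critical value read from the table."""
    if kind == "upper-tailed":
        reject = t_stat >= t_crit
    elif kind == "lower-tailed":
        reject = t_stat <= -t_crit   # the critical value is negated
    elif kind == "two-tailed":
        reject = abs(t_stat) >= t_crit
    else:
        raise ValueError(f"unknown test type: {kind}")
    return "reject H0" if reject else "fail to reject H0"

# Bus example, upper-tailed at alpha = 0.05 (t_crit = 1.684):
print(t_test_decision(2.12, 1.684, "upper-tailed"))  # -> reject H0
# Same data tested two-tailed at alpha = 0.05 (t_crit = 2.021):
print(t_test_decision(2.12, 2.021, "two-tailed"))    # -> reject H0
```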
7. One-sample t-test in R
View/download a template of a one-sample t-test located in a git repository here.