Introduction to Hypothesis Testing
In statistics, hypothesis testing is used to determine whether the results of an experiment are statistically significant. Experiments are conducted on sample data in order to draw inferences about the population. Typically, an experiment involves two groups: the control group (which receives no treatment, for example the users who do not see the new background of a website) and the experimental group (which receives the treatment, for example the users who see the changes in the website).
The first step when conducting hypothesis testing is to ask a question. For example: “Does changing the background color of this website increase the amount of time that users spend on it?” It is also important to do research on the subject. Has this experiment been done before? Have similar experiments been done?
The next step, which is the focus of this blog post, is to formulate a hypothesis. In this stage, two hypotheses are formulated: the null hypothesis and the alternative hypothesis.
The Null Hypothesis (H0): states that there is no relationship. For example: changing the background color of the website does not change the amount of time that users spend on the website.
The Alternative Hypothesis (H1): represents what is being tested with the experiment. For instance, changing the background color of the website increases the amount of time that users spend on the website.
Depending on how the hypothesis statement is structured, the researcher needs to choose one of three types of test (figure 1):
- Left-tailed test: when the alternative hypothesis states that the value is less than the value it is being compared to. For example: changing the website background color will change the time users spend on the website to less than 5 minutes.
- Right-tailed test: when the alternative hypothesis states that the value is greater than the value it is being compared to. For example: changing the website background color will change the time users spend on the website to more than 5 minutes.
- Two-tailed test: when the alternative hypothesis states that the value differs from the value it is being compared to, in either direction. For example: changing the website background color will change the time users spend on the website to something other than 5 minutes, whether less or more.
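The three types of test differ only in which tail of the null distribution counts as "extreme". Here is a minimal sketch of that idea, assuming the test statistic is a z-score that follows a standard normal distribution when the null hypothesis is true. The function name `p_value_from_z` is made up for this illustration.

```python
from statistics import NormalDist


def p_value_from_z(z, tail):
    """Return the p-value of a z-score under a standard normal null distribution.

    `tail` selects the type of test: "left", "right", or "two".
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        # Probability of a z-score this small or smaller.
        return std_normal.cdf(z)
    if tail == "right":
        # Probability of a z-score this large or larger.
        return 1 - std_normal.cdf(z)
    if tail == "two":
        # Probability of a z-score this far from zero in either direction.
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")


for tail in ("left", "right", "two"):
    print(tail, round(p_value_from_z(1.96, tail), 3))
```

The same z-score gives a different p-value for each type of test, which is why the type needs to be chosen before looking at the data.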
One more step is to define the significance level, denoted alpha. Alpha can be any value between 0 and 1, with 0.05 being the most common choice. The significance level is the probability of rejecting the null hypothesis when it is actually true, and it serves as the threshold that decides whether the null hypothesis is rejected or not.
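To make the threshold concrete, the sketch below converts alpha into a critical value. It assumes a right-tailed test whose test statistic follows a standard normal distribution under the null hypothesis; other tests would use a different distribution or tail.

```python
from statistics import NormalDist

alpha = 0.05  # significance level, chosen before running the experiment

# For a right-tailed z-test, the critical value is the z-score that leaves
# a probability of alpha in the upper tail of the standard normal distribution.
critical_z = NormalDist().inv_cdf(1 - alpha)

print(round(critical_z, 3))  # 1.645
```

Under these assumptions, a test statistic above roughly 1.645 leads to rejecting the null hypothesis at the 0.05 level.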
Hypothesis Testing Example
H0: changing the background color of the website does not affect the average time that people spend on the webpage.
H1: changing the background color changes the average time spent on the webpage to more than 5 minutes.
To conduct the experiment, a sample is selected from the people using the new background. Assuming that the sample is normally distributed, the sample’s statistics, such as its mean and standard deviation, are used to compute the test statistic, which in turn is used to estimate the p-value. The p-value is the probability of observing a result at least as extreme as the one in the sample, given that the null hypothesis is true.
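As a rough sketch of this calculation, the code below runs a right-tailed z-test against the 5-minute comparison value. The sample mean, standard deviation, and sample size are invented for illustration and do not come from a real experiment. The sample is also assumed to be large enough for the normal approximation to be reasonable.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical numbers, invented for illustration only.
mu_0 = 5.0          # comparison value from the hypotheses (minutes)
sample_mean = 5.4   # average time spent in the sample (minutes)
sample_std = 1.5    # sample standard deviation (minutes)
n = 100             # sample size

# Test statistic: how many standard errors the sample mean lies above mu_0.
standard_error = sample_std / sqrt(n)
z = (sample_mean - mu_0) / standard_error

# Right-tailed p-value: probability of a z-score at least this large under H0.
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

With a small sample, a t-distribution would usually replace the normal distribution here.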
From this analysis, there are two possible outcomes:
- p-value < alpha: the null hypothesis is rejected. Let’s assume that the p-value is 0.003. This means that, if the null hypothesis were true, a result at least as extreme as the one observed would occur only 0.3% of the time, which is very small (smaller than the threshold of 5%). Therefore, the null hypothesis is rejected.
- p-value ≥ alpha: the experiment fails to reject the null hypothesis.
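The two outcomes amount to a single comparison, sketched below. The function name `decide` and its return strings are made up for this illustration.

```python
def decide(p_value, alpha=0.05):
    """Compare the p-value with alpha and return the test's conclusion."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(0.003))  # reject H0
print(decide(0.20))   # fail to reject H0
```

The function never returns "accept H0": a large p-value only means the data did not provide enough evidence against the null hypothesis.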
So, in the end, hypothesis testing is not about proving the alternative hypothesis to be true. The test is about rejecting or failing to reject the null hypothesis, and showing that “it’s statistically unlikely that there is no relationship” between the variables being tested.