Find out why the Welch t-test is the preferred method for making accurate statistical comparisons, even when variances differ.
Part 1: Background
In the first semester of my graduate studies, I had the opportunity to take the course STAT7055: Introduction to Statistics for Business and Finance. I definitely felt a little burned out at times, but the knowledge I gained about applying statistical methods in different situations was truly priceless. During the eighth week of lectures, one topic caught my attention: hypothesis testing when comparing two populations. I found it fascinating to learn how the approach differs depending on whether the samples are independent or paired, what to do when the population variances are known or unknown, and how to test hypotheses about two proportions. However, one scenario was not covered in the material, and it kept me wondering how to approach it: testing a hypothesis about two population means when the variances are unequal, which is done with Welch's t-test.
To understand how Welch's t-test is applied, we will work through an example that uses a real-world dataset at every stage of the process.
Part 2: The dataset
The dataset I am using contains real-world data from the World Agricultural Supply and Demand Estimates (WASDE), which is updated periodically. WASDE is produced by the World Agricultural Outlook Board (WAOB). It is a monthly report that provides annual forecasts of wheat, rice, coarse grains, oilseeds, and cotton for various regions of the world and for the United States. It also covers forecasts of sugar, meat, poultry, eggs, and milk in the United States. The data come from the Nasdaq website, and you can access them for free here: WASDE data set. There are three datasets, but I only use the first one, the supply and demand data. The column definitions can be seen here:
To keep the testing process simple, I use two samples restricted to specific regions, commodities, and items. We will use the R programming language for the end-to-end procedure.
Now let's do some proper data preparation:
```r
library(dplyr)

# Read and preprocess the dataframe
wasde_data <- read.csv("wasde_data.csv") %>%
  select(-min_value, -max_value, -year, -period) %>%
  filter(item == "Production", commodity == "Wheat")

# Filter data for Argentina and Australia
wasde_argentina <- wasde_data %>%
  filter(region == "Argentina") %>%
  arrange(desc(report_month))

wasde_oz <- wasde_data %>%
  filter(region == "Australia") %>%
  arrange(desc(report_month))
```
I split the data into two samples by region, Argentina and Australia, and focus on wheat production.
Now we are ready. But wait...
Before delving further into the application of Welch's t-test, I can't help but wonder why it is necessary to test whether the variances of two populations are equal or not.
Part 3: Test for equality of variances
When comparing two population means with unknown population variances, it is important to check whether the variances are equal so that we select the appropriate test. If they can be treated as equal, we use the pooled-variance t-test; otherwise, we use Welch's t-test. Using the wrong test can lead to erroneous conclusions, because the risks of Type I and Type II errors no longer match what we intended. Checking for equal variances keeps the hypothesis test grounded in assumptions that the data support, which leads to more reliable and valid conclusions.
So how do we test for the two population variances?
We start by stating two hypotheses: the null hypothesis H0 that the two population variances are equal, and the alternative hypothesis H1 that they differ.
The general rule is simple:
- If the test statistic falls in the rejection region, we reject the null hypothesis H0.
- Otherwise, we fail to reject H0.
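In symbols (my notation), with σ₁² and σ₂² denoting the population variances of wheat production in Argentina and Australia respectively, the hypotheses can be written as a ratio:

$$
H_0: \frac{\sigma_1^2}{\sigma_2^2} = 1 \qquad \text{vs.} \qquad H_1: \frac{\sigma_1^2}{\sigma_2^2} \neq 1
$$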
We can pose the hypotheses like this:
```r
# Hypotheses: Variance Comparison
h0_variance <- "Population variance of Wheat production in Argentina equals that in Australia"
h1_variance <- "Population variance of Wheat production in Argentina differs from that in Australia"
```
Next, we need a test statistic. For comparing two variances, we use the F-test.
An F-test is any statistical test whose test statistic follows an F distribution under the null hypothesis. In the two-sample variance test, the statistic is the ratio of the two sample variances; if both populations are normally distributed and H0 is true, this ratio follows an F distribution with n1 - 1 and n2 - 1 degrees of freedom.
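We compute the test statistic by dividing the sample variance of sample 1 by that of sample 2. Because the alternative hypothesis is two-sided, the rejection region consists of both tails of the F distribution with n1 - 1 and n2 - 1 degrees of freedom, each tail having probability α/2, where n1 and n2 are the sample sizes and α is the significance level. When the value of F falls in either tail region, we reject the null hypothesis.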
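Written out in standard notation (my reconstruction, chosen to match the `qf()` calls in the R code later in this part), the statistic and the two-tailed rejection region are as follows. Here $F_{\alpha,\,\nu_1,\,\nu_2}$ denotes the upper-tail critical value, the point with probability α to its right, so $F_{\alpha/2,\,n_1-1,\,n_2-1}$ corresponds to `qf(1 - alpha/2, n1 - 1, n2 - 1)` and $F_{1-\alpha/2,\,n_1-1,\,n_2-1}$ to `qf(alpha/2, n1 - 1, n2 - 1)`:

$$
F = \frac{s_1^2}{s_2^2}
$$

$$
\text{Reject } H_0 \text{ if } \quad F > F_{\alpha/2,\; n_1-1,\; n_2-1} \quad \text{or} \quad F < F_{1-\alpha/2,\; n_1-1,\; n_2-1}
$$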
But there is a trick: the labeling of sample 1 and sample 2 is arbitrary, so we can always put the larger sample variance in the numerator (swapping the degrees of freedom accordingly). The F statistic is then always at least 1, and we only need to compare it with the upper critical value at α/2 to reject H0 at significance level α.
In our example, we keep Argentina as sample 1 and compute the ratio of the two sample variances:
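The larger-variance-on-top trick can be sketched in R as a small function. `f_test_larger_on_top` is a hypothetical helper name (not from any package), and the usage example runs on simulated normal data rather than the WASDE values. Note that the critical value is still taken at α/2, because the hypothesis is two-sided; the trick only removes the need to check the lower tail.

```r
# Hypothetical helper: two-sided F-test for equal variances that puts the
# larger sample variance in the numerator, so only the upper tail is checked.
f_test_larger_on_top <- function(x, y, alpha = 0.05) {
  v_x <- var(x)
  v_y <- var(y)
  if (v_x >= v_y) {
    f_stat <- v_x / v_y
    df_num <- length(x) - 1
    df_den <- length(y) - 1
  } else {
    # Swap the roles of the samples, including their degrees of freedom
    f_stat <- v_y / v_x
    df_num <- length(y) - 1
    df_den <- length(x) - 1
  }
  # Upper critical value at alpha / 2 because the test is still two-sided
  f_crit <- qf(1 - alpha / 2, df_num, df_den)
  list(f_stat = f_stat, f_crit = f_crit, reject_h0 = f_stat > f_crit)
}

# Usage with simulated data (illustrative only, not the WASDE values)
set.seed(42)
a <- rnorm(20, mean = 100, sd = 30)
b <- rnorm(25, mean = 100, sd = 10)
res <- f_test_larger_on_top(a, b)
print(res)
```

This gives the same decision as checking both the lower and upper critical values with the original labeling, because the lower F quantile with (ν₁, ν₂) degrees of freedom is the reciprocal of the upper quantile with (ν₂, ν₁).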
```r
# Calculate sample variances
sample_var_argentina <- var(wasde_argentina$value)
sample_var_oz <- var(wasde_oz$value)

# Calculate the F statistic (Argentina's variance in the numerator)
f_calculated <- sample_var_argentina / sample_var_oz
```
We use a significance level of 5% (0.05) and reject H0 if F falls below the lower critical value or above the upper critical value:
```r
# Define significance level and degrees of freedom
alpha <- 0.05
alpha_half <- alpha / 2
n1 <- nrow(wasde_argentina)
n2 <- nrow(wasde_oz)
df1 <- n1 - 1
df2 <- n2 - 1

# Calculate critical F values (lower and upper alpha/2 quantiles)
f_value_lower <- qf(alpha_half, df1, df2)
f_value_upper <- qf(1 - alpha_half, df1, df2)

# Variance comparison result
if (f_calculated > f_value_lower && f_calculated < f_value_upper) {
  cat("Fail to Reject H0: ", h0_variance, "\n")
  equal_variances <- TRUE
} else {
  cat("Reject H0: ", h1_variance, "\n")
  equal_variances <- FALSE
}
```
The result: we reject the null hypothesis at the 5% significance level. In other words, this test suggests that the population variances of the two populations are not equal, which is why we use Welch's t-test instead of the pooled-variance t-test.
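As a cross-check, R's built-in `var.test()` from the `stats` package runs the same two-sided F-test. The sketch below uses simulated stand-in data (an assumption for illustration); with the real data you would pass `wasde_argentina$value` and `wasde_oz$value` as `x` and `y`.

```r
# Simulated stand-in data (not the WASDE values)
set.seed(1)
x <- rnorm(30, mean = 50, sd = 8)
y <- rnorm(30, mean = 50, sd = 4)

# Manual two-sided F-test, mirroring the decision rule above
alpha <- 0.05
f_manual <- var(x) / var(y)
lower <- qf(alpha / 2, length(x) - 1, length(y) - 1)
upper <- qf(1 - alpha / 2, length(x) - 1, length(y) - 1)
reject_manual <- f_manual < lower || f_manual > upper

# Built-in F-test for the ratio of two variances
vt <- var.test(x, y, ratio = 1, alternative = "two.sided", conf.level = 1 - alpha)
reject_builtin <- vt$p.value < alpha

print(vt)
cat("Manual F:", f_manual, " var.test F:", unname(vt$statistic), "\n")
cat("Same decision:", reject_manual == reject_builtin, "\n")
```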
Part 4: The main course, Welch's t-test
Welch's t-test, also called Welch's unequal variances t-test, is a statistical method for comparing the means of two independent samples. Unlike the standard pooled-variance t-test, it does not assume that the two populations have equal variances, which makes it more robust. Instead, it estimates the standard error from each sample's own variance and adjusts the degrees of freedom accordingly, which keeps the evaluation of the difference between the two sample means accurate. Because the equal-variance assumption often fails with real-world data, Welch's t-test offers a more reliable result in practice. It is preferred for its adaptability and reliability, as conclusions drawn from it remain valid when the assumption of equal variances is not met.
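To illustrate the robustness claim, here is a small Monte Carlo sketch in R using the built-in `t.test()`. It assumes normally distributed data with equal means (so H0 is true), a small sample with a large variance (n = 10, sd = 4) and a large sample with a small variance (n = 40, sd = 1); these settings are my own illustrative choices, not taken from the WASDE data. Under these settings, the pooled test's false-rejection rate should land well above the nominal 5%, while Welch's stays close to it.

```r
# Monte Carlo sketch: Type I error of the pooled t-test vs Welch's t-test
# when H0 (equal means) is true but the variances and sample sizes differ.
set.seed(2024)
n_sims <- 2000
alpha <- 0.05

reject_pooled <- logical(n_sims)
reject_welch  <- logical(n_sims)

for (i in seq_len(n_sims)) {
  x <- rnorm(10, mean = 0, sd = 4)  # small sample, large variance
  y <- rnorm(40, mean = 0, sd = 1)  # large sample, small variance
  reject_pooled[i] <- t.test(x, y, var.equal = TRUE)$p.value < alpha
  reject_welch[i]  <- t.test(x, y, var.equal = FALSE)$p.value < alpha
}

# Proportion of simulations that (wrongly) rejected H0
cat("Pooled t-test rejection rate:", mean(reject_pooled), "\n")
cat("Welch t-test rejection rate: ", mean(reject_welch), "\n")
```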
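The test statistic divides the difference between the two sample means by the estimated standard error of that difference, which combines each sample's variance (s1², s2²) divided by its sample size (n1, n2).
The degrees of freedom ν are approximated with the Welch-Satterthwaite equation and are generally not a whole number.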
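Written out (my reconstruction, consistent with the R implementation later in this part and assuming a hypothesized mean difference of zero), the statistic and the Welch-Satterthwaite degrees of freedom are:

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
$$

$$
\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
$$

where $\bar{x}_i$, $s_i^2$ and $n_i$ are the mean, variance and size of sample $i$.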
The rejection region for the Welch t test depends on the significance level chosen and whether the test is one-tailed or two-tailed.
Two-tailed test: The null hypothesis is rejected if the absolute value of the test statistic, |t|, is greater than the critical value $t_{\alpha/2,\nu}$ of the t distribution with ν degrees of freedom.
One-tailed test: The null hypothesis is rejected if t is greater than the critical value $t_{\alpha,\nu}$ for an upper-tailed test, or if t is less than $-t_{\alpha,\nu}$ for a lower-tailed test:
- Upper-tailed test: $t > t_{\alpha,\nu}$
- Lower-tailed test: $t < -t_{\alpha,\nu}$
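The three rejection rules can be sketched as one R function built on `qt()`, which accepts non-integer degrees of freedom. `welch_reject` is a hypothetical helper name, and the numbers in the usage lines are made up purely to show the difference between a one-tailed and a two-tailed decision.

```r
# Hypothetical helper: apply the Welch rejection rule for a given t statistic
# and Welch-Satterthwaite degrees of freedom nu.
welch_reject <- function(t_stat, nu, alpha = 0.05,
                         alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  switch(alternative,
    two.sided = abs(t_stat) > qt(1 - alpha / 2, nu),  # |t| > t_{alpha/2, nu}
    greater   = t_stat > qt(1 - alpha, nu),           # t > t_{alpha, nu}
    less      = t_stat < -qt(1 - alpha, nu)           # t < -t_{alpha, nu}
  )
}

# Made-up example values: t = 1.9 with nu = 25.3
welch_reject(1.9, 25.3, alternative = "greater")    # TRUE  (critical value about 1.71)
welch_reject(1.9, 25.3, alternative = "two.sided")  # FALSE (critical value about 2.06)
```

The same t value can be significant in a one-tailed test but not in a two-tailed one, which is why the direction of H1 has to be fixed before looking at the data.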
Let's work through an example of a one-tailed Welch's t-test.
First, we state the hypotheses:
```r
h0_mean <- "Population mean of Wheat production in Argentina equals that in Australia"
h1_mean <- "Population mean of Wheat production in Argentina is greater than that in Australia"
```
This is an upper-tailed test, so the rejection region is $t > t_{\alpha,\nu}$.
Applying the Welch formulas at the same 5% significance level:
```r
# Calculate sample means
sample_mean_argentina <- mean(wasde_argentina$value)
sample_mean_oz <- mean(wasde_oz$value)

# Welch's t-test (unequal variances); s1 and s2 hold the sample variances
s1 <- sample_var_argentina
s2 <- sample_var_oz
t_calculated <- (sample_mean_argentina - sample_mean_oz) / sqrt(s1/n1 + s2/n2)

# Welch-Satterthwaite degrees of freedom
df <- (s1/n1 + s2/n2)^2 / ((s1^2/(n1^2 * (n1-1))) + (s2^2/(n2^2 * (n2-1))))

# Upper-tailed critical value at level alpha
t_value <- qt(1 - alpha, df)

# Mean comparison result
if (t_calculated > t_value) {
  cat("Reject H0: ", h1_mean, "\n")
} else {
  cat("Fail to Reject H0: ", h0_mean, "\n")
}
```
The result: we fail to reject H0 at the 5% significance level. This does not prove that the two population means are equal; it means the data do not provide sufficient evidence that the population mean of wheat production in Argentina is greater than that of Australia.
This is how Welch's t-test is performed. Now it's your turn. Happy experimenting!
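To check the manual calculation, the same one-tailed Welch test can be run with R's built-in `t.test()` using `var.equal = FALSE` and `alternative = "greater"`. The sketch below uses simulated stand-in data (an assumption for illustration); with the real data you would pass `wasde_argentina$value` as `x` and `wasde_oz$value` as `y`.

```r
# Simulated stand-in data (not the WASDE values)
set.seed(7)
x <- rnorm(15, mean = 12, sd = 5)
y <- rnorm(25, mean = 10, sd = 2)

n1 <- length(x); n2 <- length(y)
v1 <- var(x);    v2 <- var(y)

# Manual Welch statistic and Welch-Satterthwaite degrees of freedom
t_manual  <- (mean(x) - mean(y)) / sqrt(v1 / n1 + v2 / n2)
df_manual <- (v1 / n1 + v2 / n2)^2 /
  ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))

# Built-in one-tailed Welch test (H1: mean of x is greater than mean of y)
tt <- t.test(x, y, alternative = "greater", var.equal = FALSE, conf.level = 0.95)
print(tt)
cat("Manual t:", t_manual, " manual df:", df_manual, "\n")
```

The `t` and `df` reported by `t.test()` should match the manual values, and its p-value is the upper-tail probability of `t_manual` under a t distribution with `df_manual` degrees of freedom.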
Part 5: Conclusion
When comparing two population means in a hypothesis test, it is important to start by checking whether the variances are equal. This step helps decide which statistical test to use, ensuring accurate and reliable results. If the variances can be treated as equal, you can apply the standard pooled-variance t-test. If they are not equal, Welch's t-test is recommended.
Welch's t-test provides a robust solution for comparing means when the assumption of equal variances is not met. By estimating the standard error from each sample's own variance and adjusting the degrees of freedom accordingly, it provides a more accurate and reliable assessment of the statistical significance of the difference between two sample means. This adaptability makes it a popular choice in practical situations where sample sizes and variances can differ substantially.
In conclusion, checking for equality of variances and using Welch's t-test when necessary helps ensure the accuracy of the hypothesis test. This approach reduces the chances of Type I and Type II errors, resulting in more reliable conclusions. By selecting the appropriate test based on the equality of variances, we can confidently analyze the findings and make well-informed decisions based on empirical evidence.