If you read the judge columns from left to right, from Jimena Hoffner to Noelia Barsel, you will see that:
- The 1st-5th and 11th-15th judges belong to what we will call panel 1.
- The judges from 6th to 10th and from 16th to 20th belong to what we will call panel 2.
Do you notice anything? Notice how the dancers who were judged by panel 2 appear in much greater proportion than the dancers who were judged by panel 1. If you scroll down the PDF of this data table, you will see that this difference in proportions persists among the competitors who scored high enough to advance to the semi-final round.
Note: Dancers shaded GREEN advanced to the semi-final round; dancers who are NOT shaded green did not.
This raises the question: is this difference in proportions real, or is it due to random sampling, that is, the random assignment of dancers to one panel over the other? There is a statistical test we can use to answer this question.
Two-tailed test of equality between two population proportions
We will use the two-tailed z test to test whether there is a significant difference between the two proportions in either direction. We are interested in knowing whether one proportion is significantly different from the other, regardless of whether it is larger or smaller.
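For reference, the test statistic can be written out as follows. This is the standard pooled two-proportion z statistic; the symbols are notation introduced here for illustration and do not appear in the original text: $x_i$ is the number of couples from panel $i$ who advanced, $n_i$ is the number of couples scored by panel $i$, and $\Phi$ is the standard normal cumulative distribution function.

```latex
\[
\hat{p}_1 = \frac{x_1}{n_1}, \qquad
\hat{p}_2 = \frac{x_2}{n_2}, \qquad
\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}
\]
\[
z = \frac{\hat{p}_1 - \hat{p}_2}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}},
\qquad
p\text{-value} = 2\,\bigl(1 - \Phi(|z|)\bigr)
\]
```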
Statistical Test Assumptions
- Random sampling: Samples must be drawn independently and randomly from their respective populations.
- Large sample size: Sample sizes should be large enough that the sampling distribution of the difference in sample proportions is approximately normal. This approximation is justified by the central limit theorem.
- Expected number of successes and failures: To ensure that the normal approximation is maintained, the number of expected successes and failures in each group should be at least 5.
Our data set meets all of these assumptions.
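The third assumption can be checked mechanically. The following is a minimal sketch, not the check used in the original analysis: the function name `expected_counts_ok` is hypothetical, the expected counts are computed from the pooled proportion, and the example counts are made up for illustration rather than taken from the competition data.

```python
def expected_counts_ok(successes1, total1, successes2, total2, minimum=5):
    """Return True if expected successes and failures in both groups reach `minimum`."""
    # Pooled proportion: the shared success rate assumed under the null hypothesis
    pooled = (successes1 + successes2) / (total1 + total2)
    expected = [
        total1 * pooled, total1 * (1 - pooled),  # group 1: successes, failures
        total2 * pooled, total2 * (1 - pooled),  # group 2: successes, failures
    ]
    return all(count >= minimum for count in expected)


# HYPOTHETICAL counts, used only to exercise the function
print(expected_counts_ok(17, 100, 42, 100))  # large groups
print(expected_counts_ok(1, 10, 2, 10))      # groups too small for the approximation
```

With the real data, the four arguments would be the same counts that are later passed to the z-test.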
Run the test
1. Define our hypotheses
Null hypothesis: The proportion of couples advancing to the semi-finals is the same for both panels.
Alt. hypothesis: The proportion of couples advancing to the semi-finals is NOT the same for both panels.
2. Choose a level of statistical significance
The default value for alpha is 0.05 (5%). We have no reason to relax this value (e.g. to 10%) or to make it stricter (e.g. 1%), so we will use the default. Alpha represents our tolerance for falsely rejecting the null hypothesis in favor of the alternative hypothesis because of random sampling (i.e. a type 1 error).
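To make the role of alpha concrete, the sketch below converts a significance level into the two-tailed critical value that the z statistic must exceed in absolute value. It uses only the Python standard library; the helper name `two_tailed_critical_value` is hypothetical and is not part of the original analysis.

```python
from statistics import NormalDist


def two_tailed_critical_value(alpha: float) -> float:
    """Return z* such that P(|Z| > z*) = alpha for a standard normal Z."""
    # Half of alpha sits in each tail, so look up the (1 - alpha/2) quantile
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.10, 0.05, 0.01):
    z_star = two_tailed_critical_value(alpha)
    print(f"alpha = {alpha:.2f} -> reject the null hypothesis when |z| > {z_star:.3f}")
```

A stricter alpha pushes the critical value further into the tails, so a larger z statistic is needed before the null hypothesis is rejected.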
Next, we carry out the test using the Python code provided below.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats


def plot_two_tailed_test(z_value):
    # Generate a range of x values
    x = np.linspace(-4, 4, 1000)
    # Get the standard normal density for these x values
    y = stats.norm.pdf(x)
    # Create the plot
    plt.figure(figsize=(10, 6))
    plt.plot(x, y, label='Standard Normal Distribution', color='black')
    # Shade the areas in both tails with red.
    # abs() keeps the shading in the tails when z_value is negative.
    z_abs = abs(z_value)
    plt.fill_between(x, y, where=(x >= z_abs), color='red', alpha=0.5, label='Right Tail Area')
    plt.fill_between(x, y, where=(x <= -z_abs), color='red', alpha=0.5, label='Left Tail Area')
    # Define critical values for alpha = 0.05
    alpha = 0.05
    critical_value = stats.norm.ppf(1 - alpha / 2)
    # Add vertical dashed blue lines for critical values
    plt.axvline(critical_value, color='blue', linestyle='dashed', linewidth=1, label=f'Critical Value: {critical_value:.2f}')
    plt.axvline(-critical_value, color='blue', linestyle='dashed', linewidth=1, label=f'Critical Value: {-critical_value:.2f}')
    # Mark the z-value
    plt.axvline(z_value, color='red', linestyle='dashed', linewidth=1, label=f'Z-Value: {z_value:.2f}')
    # Add labels and title
    plt.title('Two-Tailed Z-Test Visualization')
    plt.xlabel('Z-Score')
    plt.ylabel('Probability Density')
    plt.legend()
    plt.grid(True)
    # Save the figure, then show it
    plt.savefig('../images/p-value_location_in_z_dist_z_test_proportionality.png')
    plt.show()


def two_proportion_z_test(successes1, total1, successes2, total2):
    """
    Perform a two-proportion z-test to check if two population proportions are significantly different.

    Parameters:
    - successes1: Number of successes in the first sample
    - total1: Total number of observations in the first sample
    - successes2: Number of successes in the second sample
    - total2: Total number of observations in the second sample

    Returns:
    - z_value: The z-statistic
    - p_value: The p-value of the test
    """
    # Calculate sample proportions
    p1 = successes1 / total1
    p2 = successes2 / total2
    # Combined (pooled) proportion
    p_combined = (successes1 + successes2) / (total1 + total2)
    # Standard error
    se = np.sqrt(p_combined * (1 - p_combined) * (1/total1 + 1/total2))
    # Z-value
    z_value = (p1 - p2) / se
    # P-value for two-tailed test
    p_value = 2 * (1 - stats.norm.cdf(np.abs(z_value)))
    return z_value, p_value


# df, panel_1 and panel_2 come from earlier in the analysis; panel_1 and panel_2
# are assumed here to be lists of the judge column names for each panel.
min_score_for_semi_finals = 7.040
is_semi_finalist = df.PROMEDIO >= min_score_for_semi_finals
# Number of couples scored by panel 1 advancing to semi-finals
successes_1 = df[is_semi_finalist][panel_1].dropna(axis=0).shape[0]
# Number of couples scored by panel 2 advancing to semi-finals
successes_2 = df[is_semi_finalist][panel_2].dropna(axis=0).shape[0]
# Total number of couples that were scored by panel 1
n1 = df[panel_1].dropna(axis=0).shape[0]
# Total number of couples that were scored by panel 2
n2 = df[panel_2].dropna(axis=0).shape[0]
# Perform the test
z_value, p_value = two_proportion_z_test(successes_1, n1, successes_2, n2)
# Print the results
print(f"Z-Value: {z_value:.4f}")
print(f"P-Value: {p_value:.4f}")  # reported output: P-Value: 0.0000
# Check significance at alpha = 0.05
alpha = 0.05
if p_value < alpha:
    print("The difference between the two proportions is statistically significant.")
else:
    print("The difference between the two proportions is not statistically significant.")
# Generate the plot
plot_two_tailed_test(z_value)
The graph shows that the calculated z-value lies well outside the range of z-values we would expect to see if the null hypothesis were true. The resulting p-value rounds to 0.0000, which is far below our alpha of 0.05, so we reject the null hypothesis in favor of the alternative.
This means that the difference in proportions is statistically significant and very unlikely to be due to random sampling:
- 17% of the dance couples judged by panel 1 advanced to the semifinals
- 42% of the dance couples judged by panel 2 advanced to the semifinals
Our first statistical test of bias has provided evidence that there is a positive bias in the scores of the dancers evaluated by panel 2: they advanced to the semi-finals at roughly 2.5 times the rate of the dancers evaluated by panel 1 (42% versus 17%).
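The size of the gap between the two panels can be checked directly from the two percentages reported above. This is a minimal sketch that uses only those rounded figures, so the results are approximate; the variable names are chosen here for illustration.

```python
# Rounded advancement rates reported in the article
p_panel_1 = 0.17
p_panel_2 = 0.42

# Absolute gap in percentage points and relative gap as a ratio
difference_points = (p_panel_2 - p_panel_1) * 100
ratio = p_panel_2 / p_panel_1

print(f"Difference: {difference_points:.0f} percentage points")
print(f"Ratio: {ratio:.2f}x")
```

Reporting both numbers is useful because a ratio alone hides how large the underlying rates are, while a difference alone hides how large the gap is relative to the baseline.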
Next, we dive into each individual judge's score distributions and see how their individual biases affect the overall bias of their panel.