What is the 80% Rule?

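The 80% Rule (or Four-Fifths Rule) is a simple way to check for potential adverse impact: each group's selection rate is compared with that of the group with the highest selection rate, and a ratio below 80% (four-fifths) is treated as a flag for potential adverse impact. The rule has limitations, though, especially with small samples, where even a few hires or rejections can drastically change a selection rate and make the result unreliable.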
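As a rough illustration of the calculation, the sketch below computes an impact ratio from selection counts. The function names and the applicant counts are hypothetical, and the reference group is assumed to be the group with the highest selection rate.

```python
def selection_rate(selected: int, applicants: int) -> float:
    """Share of a group's applicants who were selected."""
    if applicants <= 0:
        raise ValueError("applicants must be a positive count")
    return selected / applicants


def impact_ratio(focal_selected: int, focal_applicants: int,
                 reference_selected: int, reference_applicants: int) -> float:
    """Focal group's selection rate divided by the reference group's rate."""
    focal_rate = selection_rate(focal_selected, focal_applicants)
    reference_rate = selection_rate(reference_selected, reference_applicants)
    if reference_rate == 0:
        raise ValueError("reference group has no selections; ratio is undefined")
    return focal_rate / reference_rate


# Hypothetical counts, for illustration only:
# focal group: 12 of 40 selected (30%); reference group: 48 of 80 selected (60%).
ratio = impact_ratio(12, 40, 48, 80)
flagged = ratio < 0.8  # below four-fifths, so flagged for potential adverse impact
print(f"impact ratio = {ratio:.2f}, flagged = {flagged}")
```

A ratio below 0.8 is only a flag for closer review, not proof of discrimination.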
Why the 80% Rule Isn't Perfect
  1. Sample Size Sensitivity: In small samples, a change of one or two hires can push the ratio below 80%, making it seem like there's adverse impact when the difference might just be due to chance.
  2. Over-Simplified: It's a rough estimate and doesn't provide statistical proof of discrimination; it's more of a rule of thumb than a precise measurement.
  3. Can Miss True Discrimination in Large Samples: In large datasets, the 80% rule might not catch subtle patterns of discrimination, because the difference between groups can be statistically significant even when the ratio of selection rates is at or above 80%.
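The first and third limitations can be seen with made-up numbers. This sketch uses hypothetical helper names and applicant counts, plus a pooled two-proportion z-value as one common significance check (an assumption, not something the rule itself prescribes). It shows the ratio flipping on a single hire in a small sample, and a large sample passing the rule while the difference is still statistically significant.

```python
from fractions import Fraction
from math import sqrt

THRESHOLD = Fraction(4, 5)  # the four-fifths cutoff, kept exact


def impact_ratio(focal_sel, focal_n, ref_sel, ref_n) -> Fraction:
    """Exact ratio of the focal group's selection rate to the reference group's."""
    return Fraction(focal_sel, focal_n) / Fraction(ref_sel, ref_n)


def two_proportion_z(focal_sel, focal_n, ref_sel, ref_n) -> float:
    """Pooled two-proportion z-value for the difference in selection rates."""
    pooled = (focal_sel + ref_sel) / (focal_n + ref_n)
    std_error = sqrt(pooled * (1 - pooled) * (1 / focal_n + 1 / ref_n))
    return (focal_sel / focal_n - ref_sel / ref_n) / std_error


# Small sample (10 applicants per group): one fewer hire flips the outcome.
small_before = impact_ratio(4, 10, 5, 10)  # 4/5, passes
small_after = impact_ratio(3, 10, 5, 10)   # 3/5, fails

# Large sample (2000 applicants per group): the ratio passes the rule,
# yet the z-value is well beyond the usual 1.96 cutoff.
large_ratio = impact_ratio(880, 2000, 1000, 2000)
large_z = two_proportion_z(880, 2000, 1000, 2000)

print(small_before >= THRESHOLD, small_after >= THRESHOLD)
print(float(large_ratio), round(large_z, 2))
```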
80% Rule by Decile
These results represent a hypothetical hiring procedure based on the assessment scores. The candidates are divided into deciles, or 10% groupings, based on their performance scores. This method helps to evaluate whether there is any adverse impact across different score ranges.
It's important to note that actual hires may not occur in strict top-down order based solely on these scores. Factors like interviews, experience, or other criteria may influence final selection decisions, meaning hires could come from various decile ranges. Therefore, this analysis provides an estimate of potential impact across score tiers rather than a definitive ranking of candidates.
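One way to sketch a decile analysis is shown below. It assumes strict top-down selection, where "decile k" means everyone in the top k tenths of the ranked list is treated as selected; the function name, group labels, and scores are all hypothetical.

```python
def decile_impact(candidates, focal, reference):
    """Impact ratio at each cumulative decile cutoff.

    candidates: list of (score, group) pairs.
    Returns a list of (decile, focal_rate, reference_rate, ratio) tuples,
    where ratio is None if the reference group has no selections yet.
    """
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    n = len(ranked)
    focal_total = sum(1 for _, g in ranked if g == focal)
    ref_total = sum(1 for _, g in ranked if g == reference)
    if focal_total == 0 or ref_total == 0:
        raise ValueError("both groups need at least one candidate")

    rows = []
    for decile in range(1, 11):
        cutoff = (n * decile) // 10  # number of candidates in the top k deciles
        chosen = ranked[:cutoff]
        focal_rate = sum(1 for _, g in chosen if g == focal) / focal_total
        ref_rate = sum(1 for _, g in chosen if g == reference) / ref_total
        ratio = focal_rate / ref_rate if ref_rate > 0 else None
        rows.append((decile, focal_rate, ref_rate, ratio))
    return rows


# Hypothetical assessment scores for two groups of ten candidates each.
scores_a = [95, 92, 90, 88, 85, 83, 80, 78, 75, 70]
scores_b = [91, 86, 82, 79, 76, 74, 72, 68, 65, 60]
candidates = [(s, "A") for s in scores_a] + [(s, "B") for s in scores_b]

rows = decile_impact(candidates, focal="B", reference="A")
for decile, focal_rate, ref_rate, ratio in rows:
    shown = "n/a" if ratio is None else f"{ratio:.2f}"
    print(f"top {decile * 10:3d}%: B={focal_rate:.2f} A={ref_rate:.2f} ratio={shown}")
```

In this made-up data the ratio stays well below 0.8 at the higher cutoffs and only reaches it once most candidates are selected, which is the kind of pattern a single overall ratio would hide.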
Benefits of Testing by Decile
  1. Granularity: It allows you to see if adverse impact or disparities occur more strongly in certain performance ranges (e.g., low, middle, or high performers) rather than across the board.
  2. Identify Specific Patterns: Testing by decile can reveal if certain groups perform significantly worse within particular score ranges, a pattern that might be hidden in the overall analysis.
  3. Better Insights for Improvement: By seeing exactly where performance gaps exist, organizations can better target specific sections of the test or assessment for improvement.
  4. Reduced Data Skewing: Analyzing all data together might skew results if there are extreme scores (e.g., outliers). Breaking the data down by decile allows for a fairer, more focused analysis.

What is ANOVA (Significance Testing)?

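ANOVA (Analysis of Variance) is a statistical method used to compare the mean scores of two or more groups to see if there is a statistically significant difference between them. It is most often used with three or more groups; with exactly two groups it leads to the same conclusion as a standard (equal-variance) t-test.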
Between Groups: This variation measures the differences in mean scores between the different groups being compared (e.g., men vs. women). If there is a large between-group difference, it suggests that the groups may be performing differently on the assessment, which can be a sign of potential adverse impact.
Within Groups: This variation measures the differences within each group itself. It looks at how much individuals in the same group vary from each other. Lower within-group variation indicates that members within each group have similar scores, while higher within-group variation means the group is more diverse in performance.
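The F-statistic is the ratio of between-group variance to within-group variance. A large F-statistic indicates that the between-group variance is much larger than the within-group variance, suggesting that the group means differ by more than chance alone would explain. The accompanying p-value (typically compared against 0.05) determines whether the difference is considered statistically significant.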
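The between-group and within-group pieces can be computed by hand for a small example. This is a minimal sketch of a one-way ANOVA F-statistic with hypothetical scores and function name; it stops at the F-statistic, since turning it into a p-value requires the F distribution, which a statistics library would normally supply.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, ms_between, ms_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more scores than groups")

    grand_mean = fmean(x for g in groups for x in g)

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of individual scores around their own group mean
    for g in groups:
        group_mean = fmean(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in g)

    ms_between = ss_between / (k - 1)        # between-group variance
    ms_within = ss_within / (n_total - k)    # within-group variance
    return ms_between / ms_within, ms_between, ms_within


# Hypothetical assessment scores for three groups.
group_a = [70, 75, 80]
group_b = [60, 65, 70]
group_c = [80, 85, 90]

f_stat, ms_between, ms_within = one_way_anova_f([group_a, group_b, group_c])
print(f"F = {f_stat:.1f} (between = {ms_between:.1f}, within = {ms_within:.1f})")
```

Here the group means (75, 65, 85) are far apart relative to the spread inside each group, so the between-group variance (300) is twelve times the within-group variance (25).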
What is a Z-Value?
A z-value tells you how many standard deviations a particular result is from the expected average.
Think of it like measuring how far away a specific result is from what is "normal" (average) in a group.
What is the Importance of 1.96?
If the z-value is greater than 1.96 or less than -1.96, the result is far enough from the expected average that it is unlikely to be due to chance alone.
In statistics, a z-value beyond ±1.96 means that, if only chance were at work, a result this far from average would occur less than 5% of the time, so it is considered "significant."
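The link between 1.96 and the 5% level can be checked with Python's standard library. The sketch below assumes a two-sided test against a standard normal distribution; the function names are illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_sided_p_value(z: float) -> float:
    """Chance of a result at least this far from average, in either direction."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


def critical_z(alpha: float = 0.05) -> float:
    """z-value that leaves a total of alpha in the two tails."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha / 2)


print(round(critical_z(0.05), 2))          # cutoff for the 5% level
print(round(two_sided_p_value(1.96), 3))   # p-value right at the cutoff
print(round(two_sided_p_value(1.0), 3))    # a z-value of 1 is not significant
```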
Interpreting Z-Values
If the z-value is high (positive), it means there are more occurrences than expected.
If the z-value is low (negative), it means there are fewer occurrences than expected.

When Hiring: If fewer people from a subgroup are selected (negative z-value), it might indicate a problem, like bias.
When Firing: If more people from a subgroup are terminated (positive z-value), that might also be concerning.
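As a hedged illustration of the hiring case, the sketch below compares the number of subgroup members selected with the number expected from the subgroup's share of the applicant pool. It assumes a binomial approximation for the standard deviation, which is one common choice rather than the only one; the function name and counts are hypothetical.

```python
from math import sqrt


def selection_z_value(subgroup_selected: int, total_selected: int,
                      subgroup_applicants: int, total_applicants: int) -> float:
    """Standard deviations between observed and expected subgroup selections."""
    share = subgroup_applicants / total_applicants   # subgroup's share of the pool
    expected = total_selected * share                # selections expected by chance
    std_dev = sqrt(total_selected * share * (1 - share))
    return (subgroup_selected - expected) / std_dev


# Hypothetical hiring data: 200 applicants, 80 from the subgroup (40%);
# 50 hires in total, of which 12 came from the subgroup (20 were expected).
z = selection_z_value(12, 50, 80, 200)
print(f"z = {z:.2f}")  # negative: fewer subgroup hires than expected
```

The same calculation can be applied to terminations by passing termination counts in place of hires, in which case a large positive z-value is the result of concern.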

What is T-Testing?

A t-test is a statistical method used to compare the mean scores of two groups (e.g., men vs. women) to see if the difference between them is statistically significant. It weighs the difference between the two group means against the variation within the groups: the larger the resulting t-value (positive or negative), the less likely the difference is due to chance. As with the other tests described here, a p-value less than 0.05 is usually treated as significant.

What are Fisher's Exact Test & the Chi-Square Test?

Statistical tests like Fisher's Exact Test and the Chi-Square Test are more reliable than the 80% rule for detecting adverse impact because they take sample size into account. Each produces a p-value: the probability of seeing a difference in selection rates at least this large if selection were unrelated to group membership.
Fisher's Exact Test (Best for sample sizes <100 Applicants)
  1. If the test gives a p-value less than 0.05, the difference may be meaningful and not random.
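For small samples, Fisher's Exact Test can be computed directly from the counts. The following is a minimal sketch of a two-sided test on a 2x2 table, written from the standard hypergeometric definition rather than taken from any particular library; the function name and counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are selected / not selected.
    """
    row1, row2 = a + b, c + d
    selected = a + c
    total = row1 + row2

    def ways(x: int) -> int:
        # Number of ways group 1 ends up with x of the selections (margins fixed).
        return comb(row1, x) * comb(row2, selected - x)

    observed = ways(a)
    lowest = max(0, selected - row2)
    highest = min(row1, selected)
    # Add up every table that is as likely as, or less likely than, the observed one.
    extreme = sum(ways(x) for x in range(lowest, highest + 1) if ways(x) <= observed)
    return extreme / comb(total, selected)


# Hypothetical small sample: 1 of 10 selected in one group, 7 of 10 in the other.
p_value = fisher_exact_two_sided(1, 9, 7, 3)
print(f"p = {p_value:.4f}")
```

With these counts the p-value is about 0.02, below the 0.05 threshold.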
Chi-Square Test (Best for sample sizes >100 Applicants)
  1. If the test gives a p-value less than 0.05, the difference may be meaningful and not random.
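For larger samples, the Chi-Square Test on a 2x2 table can be sketched as below. This version assumes Pearson's statistic without a continuity correction and uses the fact that, with one degree of freedom, the p-value can be obtained from the complementary error function; the function name and counts are hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square statistic and p-value for the table [[a, b], [c, d]].

    Rows are the two groups; columns are selected / not selected.
    No continuity correction is applied.
    """
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    total = row1 + row2
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")

    chi2 = total * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 degree of freedom
    return chi2, p_value


# Hypothetical larger sample: 30 of 100 selected in one group, 90 of 200 in the other.
chi2, p_value = chi_square_2x2(30, 70, 90, 110)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```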