Equality of Outcome metrics

Equality of outcome metrics compare the outputs of the model across groups. We call $\hat y_i$ the model prediction for a sample $i$. We also define the average score $\bar y_g$ of a group $g$ as the sample mean of the model outputs for that group:

$$\bar y_g = \frac{1}{N_g}\sum_{i \in g} \hat y_i,$$

where $N_g$ is the number of individuals in the group. We use the subscripts min and maj to indicate the unprivileged and privileged groups, respectively.

  • Average Score Spread. This is the difference between the average scores of the unprivileged and privileged groups. The ideal value is 0. A negative value means the unprivileged group receives lower scores on average, which disadvantages it. A positive value means the unprivileged group is favored.

$$AVS = \bar y_{min} - \bar y_{maj}$$
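A minimal sketch of the Average Score Spread in plain Python. It assumes group membership is given as a list of labels, with `"min"` for the unprivileged group and `"maj"` for the privileged group. The function name `average_score_spread` is hypothetical and does not come from any particular library.

```python
from statistics import mean


def average_score_spread(scores, groups, minority="min", majority="maj"):
    """Hypothetical helper: AVS = mean score of the minority (unprivileged)
    group minus mean score of the majority (privileged) group."""
    min_scores = [s for s, g in zip(scores, groups) if g == minority]
    maj_scores = [s for s, g in zip(scores, groups) if g == majority]
    # bar{y}_g = (1 / N_g) * sum of the predictions of the members of group g
    return mean(min_scores) - mean(maj_scores)


scores = [0.2, 0.4, 0.6, 0.5, 0.7, 0.9]
groups = ["min", "min", "min", "maj", "maj", "maj"]
print(average_score_spread(scores, groups))
```

On this toy data the unprivileged group averages 0.4 and the privileged group 0.7. The spread is therefore about -0.3, which indicates the unprivileged group is disadvantaged.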
  • Z Score Spread. The Z score spread is the average score spread divided by the pooled standard deviation of the two groups. This expresses the difference in average scores in units of standard deviation. The ideal value is 0. A value less than 0 disadvantages the unprivileged group, and a value greater than 0 favors it.

$$ZSS = \frac{\bar y_{min} - \bar y_{maj}}{poolSTD}$$
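The text does not define the pooled standard deviation precisely. The sketch below assumes the common definition: each group's sample variance (with an $n-1$ denominator) is weighted by its degrees of freedom. The `"min"`/`"maj"` labels and the name `z_score_spread` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def z_score_spread(scores, groups, minority="min", majority="maj"):
    """Hypothetical helper: average score spread divided by the pooled
    standard deviation of the two groups."""
    min_scores = [s for s, g in zip(scores, groups) if g == minority]
    maj_scores = [s for s, g in zip(scores, groups) if g == majority]
    n_min, n_maj = len(min_scores), len(maj_scores)
    # Assumed pooled SD: sample variances weighted by degrees of freedom.
    # statistics.variance needs at least two subjects per group.
    pooled_var = ((n_min - 1) * variance(min_scores)
                  + (n_maj - 1) * variance(maj_scores)) / (n_min + n_maj - 2)
    return (mean(min_scores) - mean(maj_scores)) / sqrt(pooled_var)


scores = [0.2, 0.4, 0.6, 0.5, 0.7, 0.9]
groups = ["min", "min", "min", "maj", "maj", "maj"]
print(z_score_spread(scores, groups))
```

Both groups here have a sample standard deviation of 0.2. The raw spread of about -0.3 therefore becomes a Z score spread of about -1.5. Rescaling all scores by a constant leaves this value unchanged.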

The Average Score Spread and Z Score Spread can be calculated for the whole population, or only for the subjects in the top 20% of scores.
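A minimal sketch of the top-20% variant of the Average Score Spread. It assumes "top 20%" means the $\lceil 0.2N \rceil$ highest-scoring subjects, ranked across the whole population. Ties are broken by sort order. The name `average_score_spread_top` and the `"min"`/`"maj"` labels are hypothetical.

```python
from math import ceil
from statistics import mean


def average_score_spread_top(scores, groups, fraction=0.2,
                             minority="min", majority="maj"):
    """Hypothetical helper: AVS restricted to the top `fraction` of subjects
    ranked by score across the whole population."""
    k = max(1, ceil(fraction * len(scores)))
    # Rank all subjects by score (highest first) and keep the top k.
    top = sorted(zip(scores, groups), key=lambda pair: pair[0], reverse=True)[:k]
    min_scores = [s for s, g in top if g == minority]
    maj_scores = [s for s, g in top if g == majority]
    if not min_scores or not maj_scores:
        raise ValueError("both groups need at least one subject in the top fraction")
    return mean(min_scores) - mean(maj_scores)


scores = [0.1, 0.3, 0.5, 0.7, 0.95, 0.2, 0.4, 0.6, 0.8, 0.9]
groups = ["min"] * 5 + ["maj"] * 5
print(average_score_spread_top(scores, groups))
```

On this toy data the spread over everyone is about -0.07. The top 20% contains one subject from each group (0.95 and 0.9), so the restricted spread is about +0.05. The two views can therefore disagree in sign.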

In a selection context:

  • Disparate Impact Quantile90/80/50. Suppose the regression outputs a score that is then used for a success/failure decision. We can set the decision threshold $q$ at a quantile of the score distribution, which fixes the percentage of the population the model passes. For example, the 90% quantile is the score below which 90% of the sample lies, so using it as the threshold passes only the top 10% of subjects. We can then calculate the disparate impact for a given quantile:

$$DisparateImpactQuant = \frac{\frac{1}{n_{min}}\sum_i \mathbb{1}(\hat y_i^{min} > q)}{\frac{1}{n_{maj}}\sum_i \mathbb{1}(\hat y_i^{maj} > q)}$$
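A minimal sketch of the quantile-based disparate impact. It assumes the threshold is the empirical quantile of all scores, computed by linear interpolation (the same rule as NumPy's default `numpy.quantile`). A subject passes when its score is strictly above the threshold, as in the formula. All function names and the `"min"`/`"maj"` labels are hypothetical.

```python
from math import floor


def empirical_quantile(values, level):
    """Linear-interpolation quantile (the same rule as NumPy's default)."""
    ordered = sorted(values)
    pos = level * (len(ordered) - 1)
    lo = floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def selection_rate(values, threshold):
    """Fraction of values strictly above the threshold."""
    return sum(v > threshold for v in values) / len(values)


def disparate_impact_quantile(scores, groups, level=0.9,
                              minority="min", majority="maj"):
    """Hypothetical helper: ratio of minority to majority success rates when
    the pass threshold is the `level` quantile of all scores."""
    q = empirical_quantile(scores, level)
    min_scores = [s for s, g in zip(scores, groups) if g == minority]
    maj_scores = [s for s, g in zip(scores, groups) if g == majority]
    rate_maj = selection_rate(maj_scores, q)
    if rate_maj == 0:
        raise ValueError("no majority subject passes; the ratio is undefined")
    return selection_rate(min_scores, q) / rate_maj


scores = [0.1, 0.3, 0.5, 0.7, 0.9, 0.2, 0.4, 0.6, 0.8, 1.0]
groups = ["min"] * 5 + ["maj"] * 5
for level in (0.9, 0.8, 0.5):
    print(level, disparate_impact_quantile(scores, groups, level))
```

With this data the ratio is 0 at the 90% quantile, because the only subject who passes belongs to the majority group. The ratio is 1.0 at the 80% quantile and about 0.67 at the median.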

  • No Adverse Impact. If we calculate the disparate impact at every candidate quantile in a set $Q$, we can find the minimum (or, analogously, the maximum) quantile for which the algorithm is considered unbiased. This is the quantile at which the disparate impact falls strictly between 0.8 and 1.2.

$$\text{No Adverse Impact} = \min_{q \in Q} q \quad \text{s.t.} \quad \frac{\frac{1}{n_{min}}\sum_i \mathbb{1}(\hat y_i^{min} > q)}{\frac{1}{n_{maj}}\sum_i \mathbb{1}(\hat y_i^{maj} > q)} \in (0.8, 1.2)$$
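A minimal sketch of the No Adverse Impact search. It scans candidate thresholds in ascending order and returns the first one whose disparate impact lies in the open interval (0.8, 1.2). The candidate set $Q$ is assumed to be the distinct observed scores. Thresholds at which no majority subject passes are skipped, because the ratio is undefined there. The name `no_adverse_impact` and the `"min"`/`"maj"` labels are hypothetical.

```python
def selection_rate(values, threshold):
    """Fraction of values strictly above the threshold."""
    return sum(v > threshold for v in values) / len(values)


def no_adverse_impact(scores, groups, low=0.8, high=1.2,
                      minority="min", majority="maj"):
    """Hypothetical helper: smallest candidate threshold q whose disparate
    impact lies in the open interval (low, high); None if there is none."""
    min_scores = [s for s, g in zip(scores, groups) if g == minority]
    maj_scores = [s for s, g in zip(scores, groups) if g == majority]
    # Assumption: the candidate set Q is the set of distinct observed scores.
    for q in sorted(set(scores)):  # ascending, so the first hit is the minimum
        rate_maj = selection_rate(maj_scores, q)
        if rate_maj == 0:
            continue  # ratio undefined when no majority subject passes
        if low < selection_rate(min_scores, q) / rate_maj < high:
            return q
    return None


scores = [0.1, 0.2, 0.3, 0.8, 0.9, 0.4, 0.5, 0.6, 0.7, 1.0]
groups = ["min"] * 5 + ["maj"] * 5
print(no_adverse_impact(scores, groups))
```

On this toy data, every threshold below 0.6 passes the majority group at a clearly higher rate. The first threshold that satisfies the rule is therefore 0.6, where both groups pass at a rate of 2/5.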

  • Adverse Impact AUC. If we calculate the success rates of both groups for every possible threshold, we obtain a curve that expresses the success rate of the majority group as a function of the success rate of the minority group. If the algorithm is fair, the two rates are equal at every threshold. The curve is then the diagonal, and the area under the curve (AUC) is 0.5. If the AUC is larger than 0.5, the majority group is favored; if it is smaller than 0.5, the minority group is favored.
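A minimal sketch of the Adverse Impact AUC. It sweeps the threshold from below the lowest score (everyone passes) up to the highest score (nobody passes), which traces the curve from (1, 1) to (0, 0). The area is then integrated with the trapezoidal rule. The name `adverse_impact_auc` and the `"min"`/`"maj"` labels are hypothetical.

```python
def selection_rate(values, threshold):
    """Fraction of values strictly above the threshold."""
    return sum(v > threshold for v in values) / len(values)


def adverse_impact_auc(scores, groups, minority="min", majority="maj"):
    """Hypothetical helper: area under the curve of the majority success rate
    (y) against the minority success rate (x), swept over all thresholds."""
    min_scores = [s for s, g in zip(scores, groups) if g == minority]
    maj_scores = [s for s, g in zip(scores, groups) if g == majority]
    # -inf lets everyone pass, giving (1, 1); the top score lets nobody pass, giving (0, 0).
    thresholds = [float("-inf")] + sorted(set(scores))
    # Both rates shrink as the threshold rises, so sorting the points
    # orders them along the curve from (0, 0) to (1, 1).
    points = sorted((selection_rate(min_scores, t), selection_rate(maj_scores, t))
                    for t in thresholds)
    # Trapezoidal rule; vertical segments (equal x) add zero area.
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


fair = adverse_impact_auc([0.1, 0.2, 0.3, 0.1, 0.2, 0.3], ["min"] * 3 + ["maj"] * 3)
biased = adverse_impact_auc([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], ["min"] * 3 + ["maj"] * 3)
print(fair, biased)
```

Identical score distributions give an AUC of about 0.5. When every majority subject outscores every minority subject, the AUC is about 1.0. Swapping the groups gives about 0.0, which matches the interpretation above.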

An example of how to measure bias in a regression problem in recruitment can be found in our notebook, which can be accessed here or downloaded as the following file:
