General Performance Evaluation

Everything below applies to binary classification only.

Precision, Recall and R

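For the output of any machine-learning classifier, we can build the following $2\times 2$ contingency table (the confusion matrix of the classification results).

	Predicted positive	Predicted negative	Total
Actually positive	True Positive (TP)	False Negative (FN)	TP+FN
Actually negative	False Positive (FP)	True Negative (TN)	FP+TN
Total	TP+FP	FN+TN	N=TP+FN+FP+TN

Precision: $P = \Pr(\text{actually positive} \mid \text{predicted positive}) = \frac{TP}{TP+FP}$
Of all samples predicted positive, the fraction that are actually positive. A higher P means fewer wrong positive predictions.

Recall: $R = \Pr(\text{predicted positive} \mid \text{actually positive}) = \frac{TP}{TP+FN}$
Of all samples that are actually positive, the fraction that are predicted positive. A higher R means fewer missed positives.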
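As a concrete illustration of the two definitions, here is a minimal sketch that counts the four cells of the table and derives P and R from them. The helper names `confusion_counts` and `precision_recall` are made up for this example, and labels are assumed to be encoded as 1 (positive) and 0 (negative).

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN), assuming labels are 1 (positive) or 0 (negative)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    return tp, fn, fp, tn


def precision_recall(tp, fn, fp):
    """P = TP/(TP+FP), R = TP/(TP+FN); returns 0.0 when a denominator is empty."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Toy data, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
tp, fn, fp, tn = confusion_counts(y_true, y_pred)
p, r = precision_recall(tp, fn, fp)
print(tp, fn, fp, tn, p, r)
```

On this toy data one positive is missed and one negative is wrongly accepted, so both P and R come out at 3/4.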

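Plotting the P-R Curve

Classification is usually decided by a score (confidence): sort the samples by score, and the high scorers are "predicted positive" while the low scorers are "predicted negative".
Ascending or descending order both work, as long as the same order is used throughout.
Move the cutoff through the sorted list one sample at a time; each position gives a contingency table and therefore one (R, P) pair, and together these pairs trace out the P-R curve.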
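The sweep described above can be sketched as follows. This is only one way to do it, under some assumptions: scores are sorted in descending order, tied scores are simply taken in sorted order, there is at least one positive sample, and `pr_curve` is a name chosen for this example.

```python
import numpy as np


def pr_curve(y_true, scores):
    """Precision and recall when the top-k samples are 'predicted positive', k = 1..n."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")  # descending by score, stable
    hits = y_true[order]                           # 1 where the k-th sample is a true positive
    tp = np.cumsum(hits)                           # TP among the top-k
    k = np.arange(1, len(hits) + 1)                # TP + FP = k
    precision = tp / k
    recall = tp / hits.sum()                       # TP + FN = number of positives
    return precision, recall


# Toy data, invented for illustration only.
y_true = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
precision, recall = pr_curve(y_true, scores)
for pk, rk in zip(precision, recall):
    print(f"R={rk:.3f}  P={pk:.3f}")
```

Plotting `recall` on the horizontal axis against `precision` on the vertical axis gives the P-R curve.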

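Break-Even Point (BEP)

Avoiding wrong predictions and avoiding misses pull against each other, so P and R are usually negatively correlated. The point where $P=R$ is the most naive choice of optimum.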
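On a finite sample the P-R curve is a set of discrete points, so it helps to know where $P=R$ can hold exactly. If the cutoff admits exactly as many samples as there are positives, then $TP+FP = TP+FN$ and the two ratios coincide. The sketch below uses that observation; `break_even_point` is a hypothetical helper, and it assumes 0/1 labels, at least one positive, and no tied scores at the cutoff.

```python
import numpy as np


def break_even_point(y_true, scores):
    """Value of P (= R) when the top-m samples are predicted positive, m = number of positives."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(y_true.sum())
    top = np.argsort(-scores, kind="mergesort")[:n_pos]  # indices of the n_pos highest scores
    return float(y_true[top].sum() / n_pos)


# Toy data, invented for illustration only.
bep = break_even_point([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.4, 0.2])
print(bep)
```

Here there are three positives, and two of the three highest-scoring samples are positive, so the break-even value is 2/3.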

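$F_{\beta}$ Measure

$$F_{\beta} = \frac{(1+\beta^2)\times P\times R}{(\beta^2\times P)+R}$$

In particular, for $\beta=1$,

$$F_1=\frac{2\times P\times R}{P+R} = \frac{2\times TP}{N+TP-TN}$$

Here $\beta>0$ sets the relative importance of R and P, with 1 as the dividing line: for $\beta>1$ recall $R$ has more influence, and for $\beta<1$ precision $P$ has more influence.

Equivalently,

$$\frac{1}{F_1} = \frac{1}{2}\left(\frac{1}{P}+\frac{1}{R}\right)\quad\text{(harmonic mean)}$$

$$\frac{1}{F_{\beta}} = \frac{1}{1+\beta^2}\left(\frac{1}{P}+\frac{\beta^2}{R}\right)\quad\text{(weighted harmonic mean)}$$

$F_{\beta}\in [0, 1]$; the closer it is to 1, the better P and R are balanced and the better the learner performs.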
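To see the effect of $\beta$ numerically, here is a small sketch that evaluates the formula for one fixed pair (P, R) with recall higher than precision. The function name `f_beta` and the convention of returning 0 when both P and R are 0 are choices made for this example.

```python
def f_beta(precision, recall, beta=1.0):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), with 0.0 when P = R = 0."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy values, invented for illustration: recall is higher than precision.
p, r = 0.5, 1.0
f1 = f_beta(p, r)
f2 = f_beta(p, r, beta=2.0)       # weights recall more
f_half = f_beta(p, r, beta=0.5)   # weights precision more
print(f_half, f1, f2)
```

Because R is the stronger of the two here, the score rises as $\beta$ grows: 5/9 for $\beta=0.5$, 2/3 for $\beta=1$, 5/6 for $\beta=2$.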

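ROC and AUC

Receiver Operating Characteristic (ROC)
True Positive Rate: $TPR = \frac{TP}{TP+FN} \approx p(\hat y = 1| y=1)$
False Positive Rate: $FPR = \frac{FP}{FP+TN} \approx p(\hat y = 1| y = 0)$
The ROC curve plots TPR on the vertical axis against FPR on the horizontal axis.

Area Under ROC Curve (AUC): the area under the ROC curve. A classifier that separates the classes perfectly reaches TPR=1 at FPR=0, so the area is 1; hence the larger the AUC, the better the classifier.

In particular, given $m^+$ positive and $m^-$ negative examples, with $D^+$ and $D^-$ denoting the sets of positive and negative examples, define the ranking loss $l_{rank}$:

$$l_{rank} = \frac{1}{m^+m^-}\sum_{x^+\in D^+}\sum_{x^-\in D^-}\left(I(f(x^+)<f(x^-))+\frac{1}{2}I(f(x^+)=f(x^-))\right)$$

$$AUC = 1-l_{rank}$$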
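The ranking-loss definition translates almost directly into a pairwise comparison. The sketch below is a brute-force version that builds all $m^+ m^-$ pairs, which is fine for small examples but not meant for large datasets; `rank_loss` is a name chosen for this illustration.

```python
import numpy as np


def rank_loss(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked wrongly; ties count as half."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]  # shape (m+, 1)
    neg = np.asarray(neg_scores, dtype=float)[None, :]  # shape (1, m-)
    wrong = (pos < neg).sum()    # positive scored below negative
    ties = (pos == neg).sum()    # equal scores
    return float((wrong + 0.5 * ties) / (pos.size * neg.size))


# Toy scores, invented for illustration only.
pos_scores = [0.9, 0.7, 0.4]
neg_scores = [0.8, 0.4, 0.2]
l_rank = rank_loss(pos_scores, neg_scores)
auc = 1.0 - l_rank
print(l_rank, auc)
```

Of the nine pairs, two are ordered wrongly and one is tied, so $l_{rank} = 2.5/9$ and $AUC = 6.5/9$.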

FAR, FRR, EER

False Acceptance Rate (FAR): a healthy case is taken as sick.

$$FAR = \frac{FP}{FP+TN} = FPR$$

False Rejection Rate (FRR): a sick case is taken as healthy.

$$FRR = \frac{FN}{TP+FN} = 1- TPR$$
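For a score-based system, both rates follow from a single decision threshold. The sketch below assumes the convention that a trial is accepted when its score is greater than or equal to the threshold; the function name `far_frr` and the toy scores are made up for this example.

```python
import numpy as np


def far_frr(target_scores, nontarget_scores, threshold):
    """FAR and FRR at one threshold, assuming 'accept' means score >= threshold."""
    target = np.asarray(target_scores, dtype=float)        # trials that are truly positive
    nontarget = np.asarray(nontarget_scores, dtype=float)  # trials that are truly negative
    far = float(np.mean(nontarget >= threshold))  # FP / (FP + TN)
    frr = float(np.mean(target < threshold))      # FN / (TP + FN)
    return far, frr


# Toy scores, invented for illustration only.
target_scores = [0.9, 0.8, 0.6, 0.3]
nontarget_scores = [0.7, 0.4, 0.2, 0.1]
far, frr = far_frr(target_scores, nontarget_scores, threshold=0.5)
print(far, frr)
```

At a threshold of 0.5 one of the four negatives is accepted and one of the four positives is rejected, so both rates are 0.25.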

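Figure: far/fpr-threshold
(The labels in this figure are wrong: since FAR = FPR, the two curves should be FAR and FRR, and FAR is the orange one.)
For a classification task both rates should be as small as possible, so a smaller EER is better; for an attack, larger rates mean a more successful attack, so a larger EER is better.
The horizontal axis is the threshold, an evenly spaced sequence of values between 0 and 1 that serves as the decision boundary of the recognition model. The two curves cross where FAR = FRR; the common error rate at that point is the Equal Error Rate (EER), and the horizontal coordinate of the crossing is the corresponding threshold. The EER has no direct practical meaning of its own, but a lower EER indicates a better model. In engineering practice the threshold at the crossing is commonly used as the system's decision threshold and is often spoken of interchangeably with the EER; when scores do not lie in 0-1, the threshold is on the score scale and has to be related back to the EER.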
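The threshold sweep described above can be sketched as follows. This is a simplified illustration, not the implementation given later in this post: it assumes scores lie in 0-1, uses an evenly spaced grid of thresholds, accepts a trial when its score is at least the threshold, and takes the grid point where FAR and FRR are closest. The name `eer_by_threshold_sweep` is hypothetical.

```python
import numpy as np


def eer_by_threshold_sweep(target_scores, nontarget_scores, num_thresholds=101):
    """Approximate EER and its threshold on an evenly spaced grid in [0, 1]."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    far = np.array([np.mean(nontarget >= t) for t in thresholds])  # falls as t grows
    frr = np.array([np.mean(target < t) for t in thresholds])      # rises as t grows
    i = int(np.argmin(np.abs(far - frr)))                          # closest to FAR = FRR
    return float((far[i] + frr[i]) / 2), float(thresholds[i])


# Toy scores, invented for illustration only.
eer, eer_threshold = eer_by_threshold_sweep([0.9, 0.8, 0.6, 0.3], [0.7, 0.4, 0.2, 0.1])
print(eer, eer_threshold)
```

Note that the function returns two different things: the error rate where the curves cross (the EER, here 0.25) and the threshold at which that happens. On a finite sample the curves are step functions, so a whole range of thresholds can give the same EER.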
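Figure: ROC and EER
From another angle, since $FAR = FPR$ and $FRR = 1-TPR$, the FAR-FRR crossing is the solution of $FPR = 1-TPR$; in other words, it is the intersection of the ROC curve with the line $TPR = 1-FPR$.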

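The red curve A and the blue curve B are the TPR-FPR curves (that is, the ROC curves) of two different classifiers. Every point on a curve corresponds to a threshold $\theta$. The curves have the following properties:

Always passes through (1, 1): here FN=TN=0 and every sample is "predicted positive", which corresponds to $\theta = 0$
Always passes through (0, 0): here TP=FP=0 and no sample is "predicted positive", which corresponds to $\theta = 1$
The perfect classifier (positives and negatives fully separated) reaches the point (0, 1), i.e. TPR=1, FPR=0, with every decision correct
The more the curve bulges toward the upper-left corner, the better the classifier; along a curve, moving from (0, 0) toward (1, 1) corresponds to decreasing $\theta$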
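The properties listed above can be checked on a small example. The sketch below builds the ROC points by lowering the threshold one sample at a time and then integrates with the trapezoidal rule; it assumes 0/1 labels, at least one sample of each class, and no tied scores, and `roc_points` is a name made up for this illustration.

```python
import numpy as np


def roc_points(y_true, scores):
    """ROC points from (0, 0) to (1, 1), admitting samples in descending score order."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")
    hits = y_true[order]
    # Start at (0, 0): threshold above every score, nothing predicted positive.
    tpr = np.concatenate(([0.0], np.cumsum(hits) / hits.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(1 - hits) / (1 - hits).sum()))
    return fpr, tpr


# Toy data, invented for illustration only.
fpr, tpr = roc_points([1, 0, 1, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.3, 0.1])
# Trapezoidal rule for the area under the curve.
auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
print(list(zip(fpr, tpr)), auc)
```

The first point is (0, 0), the last is (1, 1), and the area is 7/9, which matches counting correctly ordered positive-negative pairs (7 of 9) for this data.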

Code Implementation

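import numpy as np


def compute_det_curve(target_scores, nontarget_scores):  # Detection Error Tradeoff
    '''
    Parameters:
    target_scores: scores of trials whose true label is target (y = 1)
    nontarget_scores: scores of trials whose true label is non-target (y = 0)
    Both are expected to be 1-D NumPy arrays.
    '''

    n_scores = target_scores.size + nontarget_scores.size
    all_scores = np.concatenate((target_scores, nontarget_scores))
    labels = np.concatenate((np.ones(target_scores.size), np.zeros(nontarget_scores.size)))

    # Sort labels based on scores (ascending; mergesort keeps ties in a stable order)
    indices = np.argsort(all_scores, kind='mergesort')
    labels = labels[indices]

    # Compute false rejection and false acceptance rates
    # Number of target trials at or below each sorted score (these would be rejected)
    tar_trial_sums = np.cumsum(labels)
    # Number of non-target trials strictly above each sorted score (these would be accepted)
    nontarget_trial_sums = nontarget_scores.size - (np.arange(1, n_scores + 1) - tar_trial_sums)

    # false rejection rates: prepend 0, since nothing is rejected at the lowest threshold
    frr = np.concatenate((np.atleast_1d(0), tar_trial_sums / target_scores.size))
    # false acceptance rates: prepend 1, since everything is accepted at the lowest threshold
    far = np.concatenate((np.atleast_1d(1), nontarget_trial_sums / nontarget_scores.size))
    # Thresholds are the sorted scores, with one sentinel just below the smallest score prepended
    thresholds = np.concatenate((np.atleast_1d(all_scores[indices[0]] - 0.001), all_scores[indices]))
    # The threshold reported with the EER is a score from the decision process,
    # i.e. the threshold of the binary classifier.
    # It is not literally a horizontal-axis value; it comes from the sorted-score correspondence.
    # Author's note: adding the extra leading entry probably does not affect the result.

    return frr, far, thresholds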


def compute_eer(target_scores, nontarget_scores):
    """ Returns equal error rate (EER) and the corresponding threshold. """
    frr, far, thresholds = compute_det_curve(target_scores, nontarget_scores)
    abs_diffs = np.abs(frr - far)
    min_index = np.argmin(abs_diffs)
    eer = np.mean((frr[min_index], far[min_index]))
    return eer, thresholds[min_index]

This code is copied from AASIST.

References

https://blog.csdn.net/qq_18888869/article/details/84848689
https://blog.csdn.net/qq_18888869/article/details/84942224
https://www.cnblogs.com/xfzhang/p/4788227.html
https://blog.csdn.net/qq_37977007/article/details/135736055

Linyong GAN

Mini-series: Machine Learning Performance Metrics, from General to Specific
