ML: Translation and Commentary on “Interpretability Methods in Machine Learning: A Brief Survey”

Date: 2022-11-16 15:00:03


Contents

Translation and Commentary on “Interpretability Methods in Machine Learning: A Brief Survey”

Interpretable ML models and “Black Boxes”

Model-agnostic interpretability methods

Problem

Method 1: Partial Dependence Plot (PDP)

Method 2: Individual Conditional Expectation (ICE)

Method 3: Permuted Feature Importance

PDP vs. ICE vs. Feature Importance

Method 4: Global Surrogate

Method 5: Local Surrogate (LIME)

Global vs. Local Surrogate Methods


Translation and Commentary on “Interpretability Methods in Machine Learning: A Brief Survey”

Author: Xiang Zhou, AI engineer at Two Sigma

Original article: Interpretability Methods in Machine Learning: A Brief Survey - Two Sigma


Two Sigma AI engineer Xiang Zhou outlines several approaches for understanding how machine learning models arrive at the answers they do.

Machine learning (ML) models can be astonishingly good at making predictions, but they often can’t yield explanations for their forecasts in terms that humans can easily understand. The features from which they draw conclusions can be so numerous, and their calculations so complex, that researchers can find it impossible to establish exactly why an algorithm produces the answers it does.

It is possible, however, to determine how a machine learning algorithm arrived at its conclusions.

This ability, otherwise known as “interpretability,” is a very active area of investigation among AI researchers in both academia and industry. It differs slightly from “explainability” (answering why) in that it can reveal causes and effects of changes within a model, even if the model’s internal workings remain opaque.

Interpretability is crucial for several reasons. If researchers don’t understand how a model works, they can have difficulty transferring learnings into a broader knowledge base, for example. Similarly, interpretability is essential for guarding against embedded bias or debugging an algorithm. It also helps researchers measure the effects of trade-offs in a model. More broadly, as algorithms play an increasingly important role in society, understanding precisely how they come up with their answers will only become more critical.

Researchers currently must compensate for incomplete interpretability with judgement, experience, observation, monitoring, and diligent risk management, including a thorough understanding of the datasets they use. However, several techniques exist for enhancing the degree of interpretability in machine learning models, regardless of their type. This article summarizes several of the most common of these, including their relative advantages and disadvantages.

Interpretable ML models and “Black Boxes”

Some machine learning models are interpretable by themselves. For example, for a linear model, the predicted outcome Y is a weighted sum of its features X. You can visualize “y = aX + b” in a plot as a straight line: a, the feature weight, is the slope of the line, and b is the y-intercept.
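
As a minimal illustration (using a hypothetical one-feature toy dataset, not one from the article), a fitted linear model exposes its weight and intercept directly:

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: y depends linearly on a single feature, plus a little noise.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 1))
y_toy = 2.0 * X_toy[:, 0] + 1.0 + rng.normal(scale=0.1, size=100)

lr = LinearRegression().fit(X_toy, y_toy)
print(lr.coef_)       # the feature weight a (the slope), close to 2.0
print(lr.intercept_)  # the intercept b, close to 1.0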

Linear models are user-friendly because they are simple and easy to understand. However, achieving the highest accuracy for large modern datasets often requires more complex and expressive models, like neural networks.

The following image shows a small, fully connected neural network, one of the simplest neural architectures. Yet even for this simple architecture, there is no way to tell which neuron plays what role, or which input feature actually contributes to the model’s output. For this reason, such models are sometimes called “black boxes.”

Now imagine a model with millions of neurons, and all sorts of connections. Without robust interpretability techniques, it would be difficult for a researcher to understand it at all.

Model-agnostic interpretability methods

Several important model-agnostic interpretability methods exist, and while none of them are perfect, they can help researchers interpret the results of even very complex ML models.

For demonstration purposes, let’s consider a small time-series dataset. A time series is simply a series of data points indexed in time order. It is the most common type of data in the financial industry. A frequent goal of quantitative research is to identify trends, seasonal variations, and correlation in financial time series data using statistical and machine learning methods.

Problem

Data:

Time series data (X, y)

Model:

from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(n_estimators=10, max_depth=3)

model.fit(X, y)

Prediction:

ŷ = model.predict(X)

The model used in this example is a RandomForestRegressor from sklearn.
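
The article does not include the dataset itself, so as a self-contained stand-in the sketch below builds a hypothetical synthetic time series with four features f0 to f3 (matching the feature names used later) and fits the same RandomForestRegressor. The method sketches that follow assume this model, X, and y.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in for the time-series dataset (X, y): 500 time steps,
# four features f0..f3 indexed in time order.
rng = np.random.default_rng(42)
t = np.arange(500)
X = np.column_stack([
    np.sin(2 * np.pi * t / 50) + 0.1 * rng.normal(size=t.size),  # f0: seasonal component
    0.01 * t + 0.1 * rng.normal(size=t.size),                    # f1: slow trend
    rng.normal(size=t.size),                                     # f2: strong driver of y
    rng.normal(size=t.size),                                     # f3: pure noise
])
y = 2.0 * X[:, 0] + 3.0 * X[:, 2] + 0.1 * rng.normal(size=t.size)

model = RandomForestRegressor(n_estimators=10, max_depth=3, random_state=0)
model.fit(X, y)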

Method 1: Partial Dependence Plot (PDP)

The first method we’ll examine is the Partial Dependence Plot (PDP), which was invented decades ago and shows the marginal effect that one or two features have on the predicted outcome of a machine learning model.

It helps researchers determine what happens to model predictions as various features are adjusted.

Here in this plot, the x-axis represents the value of feature f0, and the y-axis represents the predicted value. The solid line in the shaded area shows how the average prediction varies as the value of f0 changes.
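
A minimal sketch of how such a plot can be produced with scikit-learn’s partial dependence utilities (assuming the model and X from the setup above, and scikit-learn 1.0 or later):

import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# Average marginal effect of feature f0 (column 0) on the model's prediction.
PartialDependenceDisplay.from_estimator(model, X, features=[0])
plt.show()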

PDP is very intuitive and easy to implement, but because it only shows the average marginal effects, heterogeneous effects might be hidden. For example, one feature might show a positive relationship with the prediction for half of the data, but a negative relationship for the other half. In that case, the PDP will simply be a horizontal line.

To solve this problem, a new method was developed.

Method 2: Individual Conditional Expectation (ICE)

Individual Conditional Expectation, or ICE, is very similar to PDP, but instead of plotting an average, ICE displays one line per instance.

This method is more intuitive than PDP because each line shows the predictions for one instance as the feature of interest is varied.

Like partial dependence, ICE helps explain what happens to the predictions of the model as a particular feature varies.

ICE displays one line per instance:
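
In scikit-learn, the same partial dependence utility can draw ICE curves through its kind argument; a minimal sketch under the same assumptions as above:

import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# kind="individual" draws one ICE curve per instance;
# kind="both" overlays the PDP average on top of the ICE curves.
PartialDependenceDisplay.from_estimator(model, X, features=[0], kind="both")
plt.show()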

Unlike PDP, ICE curves can uncover heterogeneous relationships. However, this benefit also comes with a cost: it might not be as easy to see the average effect as it is with PDP.

Method 3: Permuted Feature Importance

Permuted Feature Importance is another traditional interpretability method.

The importance of a feature is the increase in the model prediction error after the feature’s values are shuffled. In other words, it helps define how the features in a model contribute to the predictions it makes.


In the plot below, the x-axis represents the score reduction, or model error, and the y-axis represents the features f0, f1, f2, and f3.
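
scikit-learn ships this measure as permutation_importance; a minimal sketch (assuming the model, X, and y from the setup above):

from sklearn.inspection import permutation_importance

# Shuffle each feature n_repeats times and record the resulting drop in score.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, name in enumerate(["f0", "f1", "f2", "f3"]):
    print(f"{name}: {result.importances_mean[i]:.4f} +/- {result.importances_std[i]:.4f}")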

As the plot shows, feature f2, the feature on top, has the largest impact on the model error, while f1, the second feature from the top, has no impact on the error after shuffling. The remaining two features make negative contributions to the model.

PDP vs. ICE vs. Feature Importance

All three of the methods above are intuitive and easy to implement.


PDP shows global effects, while hiding heterogeneous effects. ICE can uncover heterogeneous effects, but makes it hard to see the average.


Feature importance provides a concise way to understand the model’s behavior. The use of error ratio (instead of the error) makes the measurements comparable across different problems. And it automatically takes into account all interactions with other features.
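
The error-ratio variant is straightforward to compute by hand; a minimal, illustrative sketch (again assuming the model, X, and y from the setup above):

import numpy as np
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
base_error = mean_squared_error(y, model.predict(X))

# Importance of each feature as the ratio of the permuted error to the original error.
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    perm_error = mean_squared_error(y, model.predict(X_perm))
    print(f"f{j}: error ratio = {perm_error / base_error:.2f}")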

However, the interactions are not additive: adding up the individual feature importances does not equal the total drop in performance. Shuffling the features adds randomness, so the results may differ from run to run. Also, the shuffling requires access to the true outcomes, which is impossible in many scenarios.


In addition, all three methods assume that the features are independent, so if features are correlated, unlikely data points will be created, and the interpretation can be biased by these unrealistic data points.

Method 4: Global Surrogate

The global surrogate method takes a different approach: an interpretable model is trained to approximate the predictions of a black box model.


The process is simple. First, you get predictions on a dataset from the trained black box model, and then train an interpretable model on that dataset, using the black box predictions as the targets. The trained interpretable model becomes a surrogate of the original model, and all we need to do is interpret the surrogate. Note that the surrogate could be any interpretable model: a linear model, a decision tree, human-defined rules, etc.
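
A minimal sketch of this process (assuming the black box model and X from the setup above; the depth-3 decision tree surrogate is an illustrative choice, not one prescribed by the article):

from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor, export_text

# 1. Get the black box model's predictions on the dataset.
black_box_predictions = model.predict(X)

# 2. Train an interpretable model on the same inputs, using the
#    black box predictions (not the true y) as the targets.
surrogate = DecisionTreeRegressor(max_depth=3)
surrogate.fit(X, black_box_predictions)

# 3. Interpret the surrogate and check how closely it tracks the black box.
print(export_text(surrogate, feature_names=["f0", "f1", "f2", "f3"]))
print("R^2 of surrogate vs. black box:",
      r2_score(black_box_predictions, surrogate.predict(X)))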

Using an interpretable model to approximate the black box model introduces additional error, but the additional error can easily be measured by R-squared.


However, since the surrogate models are only trained on the predictions of the black box model instead of the real outcome, global surrogate models can only interpret the black box model, but not the data.

Method 5: Local Surrogate (LIME)

Local Surrogate, or LIME (Local Interpretable Model-agnostic Explanations), differs from the global surrogate method in that it does not try to explain the whole model. Instead, it trains interpretable models to approximate individual predictions.


LIME tries to understand how the predictions change when we perturb the data samples. Here is an example of LIME explaining why an image is classified as a tree frog by the model.

First, the image is divided into interpretable components. LIME then generates a dataset of perturbed instances by turning some of these components “off” (in this case, making them gray).


For each perturbed instance, one can use the trained model to get the probability that a tree frog is in the image, and then learn a locally weighted linear model on this dataset.


In the end, the components with the highest positive weights are presented as an explanation.
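
The image example above comes from the original article; to connect LIME back to the time-series regression problem used throughout, here is a minimal tabular sketch with the third-party lime package (assuming it is installed, and reusing the model and X from the setup above):

from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    training_data=X,
    feature_names=["f0", "f1", "f2", "f3"],
    mode="regression",
)

# Explain a single prediction by fitting a locally weighted linear model
# on perturbed copies of this one instance.
explanation = explainer.explain_instance(X[0], model.predict, num_features=4)
print(explanation.as_list())  # [(feature condition, local weight), ...]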

Global vs. Local Surrogate Methods

Both the global and local surrogate methods have advantages and disadvantages.


Global surrogate cares about explaining the whole logic of the model, while local surrogate is only interested in understanding specific predictions.


With the global surrogate method, any interpretable model can be used as the surrogate, and the closeness of the surrogate model to the black box model can easily be measured.


However, since the surrogate models are trained only on the predictions of the black box model instead of the real outcome, they can only interpret the model, and not the data. Besides, the surrogate models, which in many cases are simpler than the black box model, may only be able to give good explanations for part of the data, instead of the entire dataset.

The local surrogate method, on the other hand, does not share these shortcomings. In addition, the local surrogate method is model-agnostic: If you need to try a different black box model for your problem, you can still use the same surrogate models for interpretations. And compared with interpretations given by global surrogate methods, the interpretations from local surrogate methods are often short, contrastive, and human-friendly.


However, local surrogate has its own issues.


First, LIME uses a kernel to define the area within which data points are considered for local explanations, but it is difficult to find the proper kernel setting for a task. The way sampling is done in LIME can lead to unrealistic data points, and the local interpretation can be biased towards those data points.


Another concern is the instability of the explanations. Two very close points could lead to two very different explanations.
