I attended EMNLP 2020 this year. Attending an academic conference like this virtually was an interesting experience.
For me, one of the most exciting things about EMNLP 2020 is the track called "Interpretability and Analysis of Models for NLP." According to the opening remarks, this track was introduced earlier at ACL 2020, and its submissions almost doubled at EMNLP 2020. The following lists a few papers related to explainability that I found interesting and inspiring. However, this is nothing close to a complete list of all the good papers on interpretability and explainability. The paper summaries below are my own understanding and opinion, so they may not be entirely accurate :)
Authors: Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, Isabelle Augenstein
The paper compiles a list of diagnostic properties for explainability techniques, each with an automatic measurement:
- Human Agreement. The degree of overlap between human-annotated saliency scores and the scores computed by an explainability technique \(E\) for a given model \(M\). Mean Average Precision (MAP) is used to measure human agreement.
- Confidence Indication. A measure of the predictive power of the produced explanations for the model's confidence in its predictions. First, the saliency distance is computed (the distance between the saliency scores of the predicted class \(k\) and those of the other classes \(K \setminus k\)). The model's confidence score is then regressed on the saliency values with linear regression to measure how well the confidence can be predicted from them. The lower the mean absolute error (MAE), the better the confidence indication.
- Faithfulness. Indicates how well the explanation aligns with the model's inner workings. Measured by masking the most salient words according to an explainability technique's saliency scores and observing the change in the model's prediction.
- Rationale Consistency. Similarity between the explanations vs. similarity between the reasoning paths.
- Dataset Consistency. Similarity between the explanations vs. similarity between the data points.
- Time for Computing.
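Among these, faithfulness is the easiest to sketch concretely. Below is a minimal illustration of the mask-and-measure idea, not the paper's actual implementation: `faithfulness_drop`, `predict_proba`, and the toy model are all hypothetical names of my own.

```python
import numpy as np

def faithfulness_drop(predict_proba, tokens, saliency, k, mask_token="[MASK]"):
    """Mask the k most salient tokens and measure the drop in the
    predicted class probability. A larger drop when masking salient
    tokens suggests a more faithful explanation.
    (Illustrative sketch; `predict_proba` is a hypothetical model API.)"""
    orig = predict_proba(tokens)
    pred = int(np.argmax(orig))               # the model's predicted class
    top_k = set(np.argsort(saliency)[::-1][:k].tolist())
    masked = [mask_token if i in top_k else t for i, t in enumerate(tokens)]
    return orig[pred] - predict_proba(masked)[pred]

# Toy "model": probability of the positive class grows with the count of "good".
def toy_model(tokens):
    score = sum(t == "good" for t in tokens)
    p1 = 1.0 / (1.0 + np.exp(-(score - 1)))
    return np.array([1.0 - p1, p1])

tokens = ["a", "good", "movie", "with", "good", "acting"]
saliency = np.array([0.0, 0.9, 0.1, 0.0, 0.8, 0.2])  # "good" tokens most salient
print(faithfulness_drop(toy_model, tokens, saliency, k=2))  # large positive drop
```

Masking the two most salient tokens (both occurrences of "good") sharply reduces the predicted probability, which is the signature of a faithful saliency map under this diagnostic.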
Personally, I am most interested in human agreement, confidence indication, and faithfulness. The two consistency metrics rely heavily on the similarity measures we choose.
The paper evaluates a set of models (CNN, LSTM, BERT) with different explainability techniques:
- gradient-based: Saliency, Gradient, Guided Backprop
- perturbation-based: Occlusion, Shapley Value Sampling
- simplification-based: LIME
In general, gradient-based methods perform best on both human agreement and faithfulness, while LIME and perturbation-based methods (e.g., ShapSampl) score highest on confidence indication.
Authors: Nicola De Cao, Michael Sejr Schlichtkrull, Wilker Aziz, Ivan Titov
Objective: explaining model predictions and internal mechanisms.
Erasure-based explanation methods are model-agnostic and popular. However, they are often intractable due to their combinatorial computational complexity. The paper proposes a differentiable masking method to extract faithful input attributions given the hidden states.
The core idea of the method is to train a shallow interpretation masking model (or probe). The input of the masking model is the hidden states up to layer \(l\), and the output is a binary mask vector over the input \(x\).
To produce concise explanations (i.e., to minimize the number of non-zero mask entries), the method uses an \(L_0\) penalty to promote sparsity while constraining the margin between the model's outputs on the original and masked inputs. Since this non-linear constrained optimization is intractable, the authors use Lagrangian relaxation.
To overcome the challenge that \(L_0\) is discontinuous with zero derivative almost everywhere, the paper proposes to use stochastic masks and optimize the objective in expectation.
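Putting these pieces together, the relaxed training objective can be sketched roughly as follows (my own notation, not necessarily the paper's exact formulation):

\[
\max_{\lambda \ge 0} \; \min_{\phi} \; \mathbb{E}_{z \sim p_\phi(z \mid x)} \big[ \| z \|_0 \big] \;+\; \lambda \left( D\big( f(x),\; f(x \odot z) \big) - m \right)
\]

where \(z\) is the stochastic mask produced by the probe with parameters \(\phi\), \(D\) measures the divergence between the model \(f\)'s outputs on the original and masked inputs, \(m\) is the allowed margin, and \(\lambda\) is the Lagrange multiplier for the constraint.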
A sparse relaxation of the binary variables is used, namely the hard concrete distribution: a mixed discrete-continuous distribution on the closed interval \([0, 1]\).
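For intuition, here is a minimal NumPy sketch of sampling from the hard concrete distribution (following Louizos et al.'s \(L_0\) regularization recipe, with my own choice of parameter values; this is not the paper's code):

```python
import numpy as np

def sample_hard_concrete(log_alpha, beta=0.5, gamma=-0.1, zeta=1.1, rng=None):
    """Sample a mask from the hard concrete distribution: a binary
    Concrete sample stretched to (gamma, zeta) and clipped to [0, 1],
    so exact 0s and 1s have non-zero probability while the sample
    stays differentiable w.r.t. log_alpha elsewhere."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(1e-6, 1 - 1e-6, size=np.shape(log_alpha))
    # Binary Concrete sample in (0, 1) at temperature beta
    s = 1.0 / (1.0 + np.exp(-(np.log(u) - np.log(1 - u) + log_alpha) / beta))
    # Stretch to (gamma, zeta), then clip ("harden") to [0, 1]
    return np.clip(s * (zeta - gamma) + gamma, 0.0, 1.0)

rng = np.random.default_rng(0)
log_alpha = np.array([-4.0, 0.0, 4.0])  # low, neutral, high "keep" logits
z = sample_hard_concrete(log_alpha, rng=rng)  # mask values in [0, 1]
```

The key design point is the stretch-and-clip: an ordinary Concrete relaxation never reaches exactly 0 or 1, while the stretched-then-clipped version concentrates mass at the endpoints, which is what lets the expected \(L_0\) term actually drive mask entries to zero.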
The paper conducts experiments on a toy task to validate the correctness of the method, and also applies the method to question answering and sentiment classification.
Erasure-based explanation is not new, and neither is the idea of probing. However, this paper presents a clean and clever solution that makes probing-based masking differentiable and tractable. The results also look decent.
One minor shortcoming of this method is that it produces binary explanations. From a human's perspective, we naturally apply weights when explaining things (i.e., some parts of the input have higher importance than the rest).
Authors: Jinyue Feng, Chantal Shaib, Frank Rudzicz
This paper presents a practical solution for explainable text classification in clinical decision making.
The explanations are generated using latent attention at the patient level (i.e., the sentence level). This makes the explanations much more usable in practice than token-level ones.
Authors: Hendrik Schuff, Heike Adel, Ngoc Thang Vu
A paper addressing the limitations of current explainable QA methods, including:
- Silent facts. Facts that are used by the model for the prediction but are not included in the explanation.
- Unused facts. Facts that are presented in the explanation but are not relevant to the prediction.
The paper proposes a "Select and Forget" architecture, which essentially forces the model to discard unused facts and make predictions only over relevant facts.
Authors: Hanjie Chen, Yangfeng Ji
Authors: Yao-Hung Hubert Tsai, Martin Ma, Muqiao Yang, Ruslan Salakhutdinov, Louis-Philippe Morency
Uses capsule networks to produce dynamically adjusted weights for different modalities (i.e., audio, text, vision).
The encoding stage maps the multimodal inputs to multimodal features. The routing stage iterates between routing adjustment and concept update.