Comparative assessment of interpretability methods of deep activity models for hERG
19th International Workshop on (Q)SAR in Environmental and Health Sciences (QSAR2021), Poster Session, June 2021, online
Since many highly accurate predictive models for bioactivity and toxicity assays are based on Deep Learning methods, there has been a recent surge of interest in interpretability methods for Deep Learning approaches in drug discovery [1,2]. Interpretability methods are highly desired by human experts, as they enable molecular design decisions based on the activity model. However, it is still unclear which of these interpretability methods are better at identifying relevant substructures of molecules. A method comparison is further complicated by the lack of ground truth and appropriate metrics. Here, we present the first comparative study of a set of interpretability methods for Deep Learning models for hERG inhibition. In our work, we compared layer-wise relevance propagation, feature gradients, saliency maps, integrated gradients, occlusion, and Shapley values. In the quantitative analysis, known substructures that indicate hERG activity are used as ground truth. Interpretability methods were compared by their ability to rank atoms that are part of indicative substructures first. Shapley values performed significantly best, with an area under the ROC curve (AUC) of ~0.74 ± 0.12, although runner-up methods, such as Integrated Gradients, achieved similar results. The results indicate that interpretability methods for deep activity models have the potential to identify new toxicophores.
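The atom-ranking evaluation described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the per-atom attribution scores and the ground-truth toxicophore mask below are hypothetical, and the AUC is computed via the Mann-Whitney interpretation (probability that a toxicophore atom is ranked above a non-toxicophore atom, with ties counting one half).

```python
def ranking_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the probability that a randomly chosen toxicophore atom (label 1)
    receives a higher attribution score than a randomly chosen
    non-toxicophore atom (label 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both toxicophore and non-toxicophore atoms")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical per-atom attributions for one molecule, plus the
# ground-truth mask marking atoms inside a known hERG-indicative
# substructure (both invented for illustration):
scores = [0.9, 0.1, 0.7, 0.2, 0.05]
labels = [1,   0,   1,   0,   0]
print(ranking_auc(scores, labels))  # 1.0: toxicophore atoms ranked first
```

A per-molecule AUC like this, averaged over a test set, yields a summary score per interpretability method, which is one plausible reading of the comparison reported in the abstract.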