DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [AAAI'24]

1Zhejiang University, Hangzhou, China 2Youtu Lab, Tencent, Shanghai, China

Abstract

Reconstruction-based approaches have achieved remarkable outcomes in anomaly detection. The exceptional image reconstruction capabilities of recently popular diffusion models have sparked research efforts to utilize them for enhanced reconstruction of anomalous images. Nonetheless, these methods may struggle to preserve image categories and pixel-wise structural integrity in the more practical multi-class setting. To solve these problems, we propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection, which consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network connected to Stable Diffusion's denoising network, and a feature-space pre-trained feature extractor. First, the SG network is proposed to reconstruct anomalous regions while preserving the original image's semantic information. Second, we introduce a Spatial-aware Feature Fusion (SFF) block to maximize reconstruction accuracy when dealing with extensively reconstructed areas. Third, the input and reconstructed images are processed by a pre-trained feature extractor to generate anomaly maps based on features extracted at different scales. Experiments on the MVTec-AD and VisA datasets demonstrate the effectiveness of our approach, which surpasses state-of-the-art methods, e.g., achieving 96.8/52.6 and 97.2/99.0 (AUROC/AP) for localization and detection respectively on the multi-class MVTec-AD dataset.
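
The third stage described above (comparing the input and its reconstruction in feature space) can be made concrete with a short sketch. The ResNet-50 backbone, the particular stages used, the 256-pixel output size, and the max-score readout below are illustrative assumptions, not the paper's exact configuration.

    import torch
    import torch.nn.functional as F
    from torchvision.models import resnet50, ResNet50_Weights

    # Illustrative backbone; the paper's exact extractor may differ.
    backbone = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2).eval()

    def multi_scale_features(x):
        """Feature maps from several backbone stages (an assumed choice)."""
        feats = []
        x = backbone.conv1(x); x = backbone.bn1(x)
        x = backbone.relu(x); x = backbone.maxpool(x)
        for stage in (backbone.layer1, backbone.layer2, backbone.layer3):
            x = stage(x)
            feats.append(x)
        return feats

    @torch.no_grad()
    def anomaly_map(img, recon, out_size=256):
        """Per-pixel anomaly map: 1 - cosine similarity at each scale, summed.
        Both inputs are (B, 3, H, W) tensors, ImageNet-normalized."""
        amap = 0.0
        for f_in, f_re in zip(multi_scale_features(img), multi_scale_features(recon)):
            # cosine distance along the channel dimension -> (B, H', W')
            d = 1.0 - F.cosine_similarity(f_in, f_re, dim=1, eps=1e-6)
            amap = amap + F.interpolate(d.unsqueeze(1), size=out_size,
                                        mode="bilinear", align_corners=False)
        return amap  # (B, 1, out_size, out_size)

    # A simple image-level score is the maximum of the map:
    # score = anomaly_map(img, recon).amax(dim=(1, 2, 3))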


Qualitative Results

MVTec-AD Dataset

We conducted extensive qualitative experiments on the MVTec-AD and VisA datasets to visually demonstrate the superiority of our method in image reconstruction and the accuracy of its anomaly localization. As shown in Figure 4, our method reconstructs anomalous regions better than EdgRec on the MVTec-AD dataset. Compared with UniAD, as shown in Figure 5, our method localizes anomalies more accurately on the VisA dataset.

[Figure 4: qualitative comparison with EdgRec on the MVTec-AD dataset (demos 1 and 2)]

VisA Dataset

[Figure 5: qualitative comparison with UniAD on the VisA dataset]

Quantitative Results

MVTec-AD Dataset

As shown in Table 1 and Table 3, our method achieves state-of-the-art AUROC/AP/F1max of 97.2/99.0/96.5 image-wise and 96.8/52.6/55.5 pixel-wise in the multi-class setting on the MVTec-AD dataset. Among diffusion-based methods, our approach significantly outperforms existing DDPM and LDM methods in anomaly localization, by 11.7↑ in AUROC and 25↑ in AP. Among non-diffusion methods, our approach surpasses existing methods on both metrics, especially at the pixel level, where it exceeds UniAD by 9.2↑/6.0↑ in AP/F1max.

[Tables 1 and 3: quantitative results on the MVTec-AD dataset]
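
For reference, the AUROC/AP/F1max numbers above can be reproduced from raw anomaly scores with standard scikit-learn routines. The following is a minimal sketch assuming binary ground-truth labels/masks and continuous scores; the 200-point threshold sweep for F1max is an assumed granularity, not necessarily the paper's exact protocol.

    import numpy as np
    from sklearn.metrics import average_precision_score, roc_auc_score

    def f1_max(labels, scores, num_thresholds=200):
        """Best F1 over a sweep of score thresholds (F1max)."""
        best = 0.0
        for t in np.linspace(scores.min(), scores.max(), num_thresholds):
            pred = scores >= t
            tp = np.sum(pred & (labels == 1))
            fp = np.sum(pred & (labels == 0))
            fn = np.sum(~pred & (labels == 1))
            if tp == 0:
                continue
            precision, recall = tp / (tp + fp), tp / (tp + fn)
            best = max(best, 2 * precision * recall / (precision + recall))
        return best

    def detection_metrics(image_labels, image_scores):
        """Image-wise AUROC / AP / F1max from one scalar score per image."""
        return (roc_auc_score(image_labels, image_scores),
                average_precision_score(image_labels, image_scores),
                f1_max(image_labels, image_scores))

    def localization_metrics(gt_masks, anomaly_maps):
        """Pixel-wise AUROC / AP / F1max: flatten masks and anomaly maps."""
        y, s = gt_masks.ravel(), anomaly_maps.ravel()
        return roc_auc_score(y, s), average_precision_score(y, s), f1_max(y, s)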

VisA Dataset

Our method also demonstrates its superiority on the VisA dataset, as shown in Table 2. Compared with diffusion-based methods, our approach improves on the LDM method by 30.1↑/9.4↑ in image/pixel AUROC. It also outperforms UniAD by 4.9↑/6.0↑ in pixel AP/F1max. Detailed experiments for each category are provided in the Appendix.

[Table 2: quantitative results on the VisA dataset]

BibTeX

@misc{he2023diad,
      title={DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection},
      author={Haoyang He and Jiangning Zhang and Hongxu Chen and Xuhai Chen and Zhishan Li and Xu Chen and Yabiao Wang and Chengjie Wang and Lei Xie},
      year={2023},
      eprint={2312.06607},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}