Abstract:
Anomaly detection is of great significance for intelligent video surveillance. Existing methods typically struggle to detect and localize objects in crowded scenes and lack sufficient prior information about the objects of interest during training, which leads to false-positive detections. In this thesis, we therefore propose two novel frameworks for video anomaly detection and localization. We first propose the Deep Spatiotemporal Translation Network (DSTN), a novel unsupervised anomaly detection and localization method based on a Generative Adversarial Network (GAN) and Edge Wrapping (EW). In this work, we introduce (i) a novel fusion of background-removal and real optical-flow frames with (ii) a concatenation of the original and background-removal frames. We improve anomaly localization performance in the pixel-level evaluation by proposing (iii) Edge Wrapping, which reduces noise and suppresses edges unrelated to abnormal objects. DSTN performs well in terms of both anomaly detection accuracy and time complexity for surveillance videos. However, false positives still occur in some scenes. We therefore further propose the Deep Residual Spatiotemporal Translation Network (DR-STN), a novel unsupervised Deep Residual conditional Generative Adversarial Network (DR-cGAN) model with an Online Hard Negative Mining (OHNM) approach that specifically removes false positives. The proposed DR-cGAN provides a wider network to learn a mapping from spatial to temporal representations and enhances the perceptual quality of the images synthesized by the generator. Our proposed methods have been evaluated on publicly available anomaly detection datasets, including UCSD Pedestrian, UMN, and CUHK Avenue, demonstrating superior results over other state-of-the-art methods in both frame-level and pixel-level evaluations.