Abstract:
Slow-motion video is visually appealing and has attracted growing attention in video super-resolution research. Generating high-resolution (HR) slow-motion frames from low-resolution (LR) frames requires two sub-tasks: video super-resolution (VSR) and video frame interpolation (VFI). However, existing interpolation approaches fail to apply attention to low-level features and therefore do not take full advantage of space-time correlations. To this end, we propose a deep consecutive attention network. A multi-head attention mechanism and an attentive temporal feature module are designed to better predict the interpolated feature frame, and a bi-directional deformable ConvLSTM module aggregates and aligns the information from the multi-head attention and temporal feature blocks to improve frame quality. The method synthesizes HR video frames directly from LR input frames. Experimental results show that the proposed deep consecutive attention method outperforms the state-of-the-art baseline by 0.27 dB and 0.31 dB in average PSNR on the Vid4 and SPMC datasets, respectively.
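For readers unfamiliar with the attention mechanism at the core of the network, the following is a minimal NumPy sketch of multi-head self-attention over frame feature tokens. The token count, feature dimension, and number of heads are illustrative assumptions, not the authors' configuration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, num_heads):
    """Sketch of multi-head scaled dot-product self-attention.

    tokens: (L, d) array of L feature tokens of dimension d.
    Each head attends over its own d/num_heads slice of the features.
    (Learned Q/K/V projections are omitted for brevity.)
    """
    L, d = tokens.shape
    dh = d // num_heads  # per-head feature dimension
    out = np.empty_like(tokens)
    for h in range(num_heads):
        sl = slice(h * dh, (h + 1) * dh)
        qh = kh = vh = tokens[:, sl]            # self-attention: Q = K = V
        scores = softmax(qh @ kh.T / np.sqrt(dh))  # (L, L) attention weights
        out[:, sl] = scores @ vh                # weighted sum of values
    return out

# Hypothetical example: 16 tokens (e.g. flattened frame-feature positions),
# feature dimension 8, split across 2 heads.
tokens = np.random.randn(16, 8)
out = multi_head_self_attention(tokens, num_heads=2)
print(out.shape)  # (16, 8)
```

In the full network, the output of such an attention block would feed the temporal feature module rather than be used directly.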