Zero-shot Video Change Detection for Real-life Industrial
Applications
Mahbubul Alam, Huimin Zhuge, Teresa Gonzalez, Ahmed Farahat, Song Wang, and Chetan Gupta
Industrial AI Lab, Research & Development, Hitachi America Ltd., Santa Clara, CA, 95054, USA
mahbubul.alam@hal.hitachi.com
joy.zhuge@hal.hitachi.com
teresa.gonzalezdiaz@hal.hitachi.com
ahmed.farahat@hal.hitachi.com
song.wang@hal.hitachi.com
chetan.gupta@hal.hitachi.com
ABSTRACT
Change detection is crucial for various industrial applications.
Although image change detection datasets are abundant, the
collection of labeled video data is time-consuming, expensive,
and cumbersome. This scarcity of labeled data motivates
the development of few-shot or zero-shot video change
detection techniques that may generalize well to new situations.
Existing video change detection methods require large
amounts of labeled data, are task-specific, and are difficult to
generalize. Therefore, in this paper, we propose a zero-shot video
change detection algorithm using pre-trained deep learning
models and conventional image processing techniques. Our
approach identifies matching frames from input videos, adjusts
lighting conditions if necessary, and uses an existing object
detection model to identify objects in both frames. The
method can be adapted to new settings with few changes. We
evaluate our proposed method on the VDAO dataset collected
in a cluttered industrial environment and demonstrate its ef-
fectiveness in detecting changes between pairs of videos con-
taining single and multiple objects.
1. INTRODUCTION
Video change detection is the process of identifying and an-
alyzing differences between two or more video frames cap-
tured at different times. The goal is to detect meaningful
changes in a scene, such as the appearance or disappearance
of objects, modifications in the environment, or movement.
This technique is crucial in various applications, including
surveillance, forensic analysis, and environmental monitoring.
For example, in surveillance systems, video change detection
can automatically flag when an object is left behind or
removed from a scene, which may indicate suspicious activity.
The process typically involves comparing frames pixel
by pixel or analyzing patterns in object movements to detect
significant alterations. However, challenges such as lighting
variations, shadows, and background movement (e.g., trees
swaying) can complicate accurate detection. Advanced techniques,
such as background subtraction, optical flow, and deep
learning, help improve the accuracy of detecting only meaningful
changes while minimizing false positives caused by
noise or minor scene variations.
Mahbubul Alam et al. This is an open-access article distributed under the
terms of the Creative Commons Attribution 3.0 United States License, which
permits unrestricted use, distribution, and reproduction in any medium,
provided the original author and source are credited.
Consequently, sophisticated deep learning-based techniques
are utilized to identify changes between a pair of videos.
However, collecting sufficient labeled video data for training large deep
learning models is time-consuming, cumbersome, and expensive.
As such, it is imperative to develop a few-shot or, ideally, a
zero-shot video change detection technique for industrial
applications where labeled data are scarce. Zero-shot change
detection refers to a method that identifies changes between
two sets of data without requiring any labeled training ex-
amples of those changes. In the context of video analysis,
the model does not rely on previously labeled data indicat-
ing what types of changes to look for. Instead, it detects
differences by analyzing the features of objects in the data
and identifying new, disappeared, or altered elements directly.
This approach allows the model to generalize to unseen sce-
narios without needing specific prior training for each type of
change.
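As a rough illustration of this idea, the sketch below compares the object labels detected in a matched pair of frames and reports which objects appeared or disappeared. It is a minimal sketch under stated assumptions, not the method proposed in this paper: the function name compare_detections and the example labels are hypothetical, the detections are assumed to come from some pre-trained object detector, and only label counts are compared, so object positions and altered objects are ignored.

```python
from collections import Counter


def compare_detections(reference_labels, target_labels):
    """Compare per-class object counts between two matched frames.

    Both arguments are lists of class labels, assumed to be produced by a
    pre-trained object detector (hypothetical input; no detector is run here).
    Returns two dicts: objects that appeared and objects that disappeared.
    """
    ref = Counter(reference_labels)
    tgt = Counter(target_labels)
    # Counter subtraction keeps only classes with a positive remaining count.
    appeared = dict(tgt - ref)
    disappeared = dict(ref - tgt)
    return appeared, disappeared


# Hypothetical detections for a reference frame and its matched target frame.
reference = ["pallet", "forklift"]
target = ["pallet", "forklift", "box", "box"]

appeared, disappeared = compare_detections(reference, target)
print("appeared:", appeared)
print("disappeared:", disappeared)
```

No labeled examples of changes are needed for this comparison, which is the sense in which such a scheme is zero-shot; its quality depends entirely on the detector and on how well the two frames are matched.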
A few studies in the literature introduce deep learning video
change detection techniques using publicly available datasets.
Nevertheless, these methods require large amounts of labeled video data
to train deep learning models from scratch. Furthermore, the
existing methods are task-specific and, hence, difficult to
generalize. Therefore, in this paper, we propose a zero-shot video
change detection algorithm utilizing pre-trained deep learn-