Article
Automatic Inside Point Localization with Deep Reinforcement
Learning for Interactive Object Segmentation
Guoqing Li 1,2, Guoping Zhang 1,2 and Chanchan Qin 3,4,*
Citation: Li, G.; Zhang, G.; Qin, C. Automatic Inside Point Localization with Deep Reinforcement Learning for Interactive Object Segmentation. Sensors 2021, 21, 6100. https://doi.org/10.3390/s21186100
Academic Editor: Nunzio Cennamo
Received: 1 August 2021
Accepted: 9 September 2021
Published: 11 September 2021
1 College of Physical Science and Technology, Central China Normal University, NO. 152 Luoyu Road, Wuhan 430079, China; liguoqing@mails.ccnu.edu.cn (G.L.); gpzhang@mail.ccnu.edu.cn (G.Z.)
2 Key Laboratory of Quark and Lepton Physics (MOE) and College of Physics Science and Technology, Central China Normal University, NO. 152 Luoyu Road, Wuhan 430079, China
3 School of Big Data and Computer Science, Guizhou Normal University, The University Town, Guian New Area, Guiyang 550025, China
4 Center for RFID and WSN Engineering, Department of Education, Guizhou Normal University, The University Town, Guian New Area, Guiyang 550025, China
* Correspondence: 201407141@gznu.edu.cn
Abstract: In the task of interactive image segmentation, the Inside-Outside Guidance (IOG) algorithm has demonstrated superior segmentation performance by leveraging inside-outside guidance information. Nevertheless, we observe that inconsistent input between training and testing when selecting the inside point results in significant performance degradation. In this paper, a deep reinforcement learning framework, named Inside Point Localization Network (IPL-Net), is proposed to infer a suitable position for the inside point to assist the IOG algorithm. Concretely, when a user first clicks two outside points at symmetrical corner locations of the target object, our proposed system automatically generates a sequence of movements to localize the inside point. We then apply the IOG interactive segmentation method to precisely segment the target object of interest. The inside point localization problem is difficult to cast as a supervised learning task because it is expensive to collect images and their corresponding inside points. Therefore, we formulate the problem as a Markov Decision Process (MDP) and optimize it with a Dueling Double Deep Q-Network (D3QN). We train our network on the PASCAL dataset and demonstrate that it achieves excellent performance.
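As an illustration of the D3QN component, the listing below gives a minimal sketch, in PyTorch, of a dueling Q-network that scores discrete movements of the inside point. The five-action space (up, down, left, right, stop), the input channel layout, and the layer sizes are illustrative assumptions and not the exact IPL-Net architecture.

import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling Q-network: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, in_channels=4, n_actions=5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(7), nn.Flatten(),
        )
        self.value = nn.Sequential(nn.Linear(64 * 7 * 7, 256), nn.ReLU(), nn.Linear(256, 1))
        self.advantage = nn.Sequential(nn.Linear(64 * 7 * 7, 256), nn.ReLU(), nn.Linear(256, n_actions))

    def forward(self, state):
        h = self.backbone(state)
        v = self.value(h)                        # state value V(s)
        a = self.advantage(h)                    # per-action advantages A(s, a)
        return v + a - a.mean(dim=1, keepdim=True)

# One greedy step of the inside-point movement sequence.
net = DuelingQNet()
state = torch.randn(1, 4, 84, 84)                # e.g. image crop plus current point map (placeholder)
action = net(state).argmax(dim=1).item()         # index into {up, down, left, right, stop}

In a Double DQN setup, an online copy of such a network selects the next action while a periodically updated target copy evaluates it, which is the "double" part of D3QN.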
Keywords: interactive image segmentation; Markov Decision Process (MDP); Deep Reinforcement Learning (DRL); inside point localization; Deep Q-Network (DQN)
1. Introduction
Interactive image segmentation allows users to explicitly control the segmentation mask using human-friendly annotations, which can take various forms: bounding boxes, scribbles, clicks, or extreme points. As one of the fundamental problems in computer vision, it has achieved remarkable results in broad applications, such as medical image analysis [1], image editing [2], and especially pixel-level annotation [3]. In the early days, a large number of traditional approaches [4–8] were developed in this direction. Boykov et al. [4] treated interactive segmentation as an optimization problem and utilized a graph cut-based method to extract the object automatically. Subsequently, Price et al. [6] improved the graph cut method by applying geodesic distances for energy minimization. Grady introduced an interactive segmentation algorithm called random walks [7], in which each pixel is assigned the label of the first seed that a random walker starting from that pixel reaches. All of these methods rely on low-level features and cannot distinguish between the target object and the background in complex and variable scenes.
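As a concrete illustration of this classical family of methods, the snippet below sketches seed-based random-walker segmentation using scikit-image's random_walker function; the example image, seed coordinates, and beta value are placeholders chosen only for illustration, not settings used in the cited works.

import numpy as np
from skimage import data, segmentation

image = data.coins()                      # example grayscale image
seeds = np.zeros(image.shape, dtype=np.uint8)
seeds[20, 20] = 1                         # background seed (placeholder position)
seeds[150, 150] = 2                       # object seed (placeholder position)

# Each unlabeled pixel (value 0 in the seed map) receives the label of the seed
# that a random walker starting from that pixel is most likely to reach first.
labels = segmentation.random_walker(image, seeds, beta=130)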
Over the past few years, deep learning-based algorithms have become popular in computer vision and have also shown astonishing performance on interactive segmentation problems. Xu et al. [9] put forward a CNN-based model to solve the interactive