Citation: Li, H.; Ping, Y.; Hao, B.; Guo, C.; Liu, Y. Improved Boundary Support Vector Clustering with Self-Adaption Support. Electronics 2022, 11, 1854. https://doi.org/10.3390/electronics11121854
Academic Editors: Sławomir
Nowaczyk, Rita P. Ribeiro and
Grzegorz Nalepa
Received: 13 May 2022
Accepted: 8 June 2022
Published: 11 June 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Improved Boundary Support Vector Clustering with
Self-Adaption Support
Huina Li 1, Yuan Ping 1,*, Bin Hao 2,*, Chun Guo 3 and Yujian Liu 1
1 School of Information Engineering, Xuchang University, Xuchang 461000, China; leehuina@126.com (H.L.); batianClass@xcu.edu.cn (Y.L.)
2 Here Data Technology, Shenzhen 518000, China
3 Guizhou Provincial Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China; cguo@gzu.edu.cn
* Correspondence: pyuan.lhn@xcu.edu.cn (Y.P.); haobin@heredata.com.cn (B.H.)
Abstract: Collecting accurate support vectors (SVs) is critical for support vector clustering (SVC) to describe arbitrarily shaped clusters well, yet it is resource-consuming. Even though SVs can be extracted from the boundaries for efficiency, boundary patterns with too much noise and inappropriate parameter settings, such as the kernel width, also confuse the connectivity analysis. Thus, we propose an improved boundary SVC (IBSVC) with self-adaption support for reasonable boundaries and suitable parameters. The first self-adaption is in the movable edge selection (MES). By introducing a divide-and-conquer strategy with k-means++ support, it collects local, informative, and reasonable edges for the minimal hypersphere construction while rejecting pseudo-borders and outliers. Rather than performing model learning with repetitive training and evaluation, we fuse the second self-adaption with the flexible parameter selection (FPS) for direct model construction. FPS automatically selects the kernel width to meet a conformity constraint, which is defined by measuring the difference between the data description drawn by the model and the actual pattern. Finally, IBSVC adopts a convex decomposition-based strategy to finish cluster checking and labeling even though there is no prior knowledge of the cluster number. Theoretical analysis and experimental results confirm that IBSVC can discover clusters with high computational efficiency and applicability.
Keywords: support vector clustering; cluster boundary; edge selection; parameter adaption; convex decomposition
1. Introduction
Support vector clustering (SVC) has attracted much attention for handling clusters with arbitrary shapes [1,2]. For a better description, support vectors (SVs) with their specific coefficients should generally be collected through careful model training, which requires a large number of valid training samples and a complex iterative analysis under specific metrics. Due to the increasing data size and weakly representative samples, costly storage and computation in the training phase frequently degrade SVC's performance. Meanwhile, the connectivity analysis can also be confused by inappropriate parameter settings even when correct SVs are used. Intuitively, we expect an efficient model to be trained on fewer yet representative samples, and suitable parameters to be found at a minimal cost.
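To make the storage cost of training concrete, the following minimal sketch builds the dense N × N kernel matrix that kernel-based training typically iterates over and estimates its memory footprint. It assumes the Gaussian kernel exp(-q‖x_i − x_j‖²) commonly used with SVC; the function names, the parameter `q` (kernel width), and the sample points are hypothetical and are not taken from the article.

```python
import math


def gaussian_kernel_matrix(samples, q):
    """Return the dense N x N matrix K[i][j] = exp(-q * ||x_i - x_j||^2).

    Assumption: a Gaussian kernel with width parameter q (illustrative only).
    """
    n = len(samples)
    # The diagonal is exp(0) = 1, so start from a matrix of ones.
    kernel = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            # The kernel is symmetric, so fill both triangles at once.
            kernel[i][j] = kernel[j][i] = math.exp(-q * sq_dist)
    return kernel


def dense_kernel_bytes(n, bytes_per_entry=8):
    """Memory needed for a dense N x N matrix of 8-byte floats."""
    return n * n * bytes_per_entry


# Hypothetical toy data: three points in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = gaussian_kernel_matrix(points, q=0.5)

# Storage grows quadratically with the number of samples.
for n in (1_000, 10_000, 100_000):
    print(n, dense_kernel_bytes(n) / 1e9, "GB")
```

Under these assumptions, a tenfold increase in the number of samples multiplies the kernel-matrix storage by one hundred, which is why training on fewer yet representative samples is attractive.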
Let $X$ be a data set with $N$ data samples $\{x_1, x_2, \cdots, x_N\}$, where $x_i \in \mathbb{R}^d$ ($i \in [1, N]$) in the data space. Model training is costly because it generally has to solve a quadratic programming problem through iterative analysis on an $N \times N$ kernel matrix. Its runtime usually ranges from $O(N^2)$ to $O(N^3)$ depending on the specific case [1,3,4]. Furthermore, the number of iterations is uncertain, although a large number is expected before reaching the final coefficient vector $\beta$, which exacerbates the practical time cost. To achieve an improvement,
Electronics 2022, 11, 1854. https://doi.org/10.3390/electronics11121854 https://www.mdpi.com/journal/electronics