Citation: Li, H.; Ping, Y.; Hao, B.; Guo, C.; Liu, Y. Improved Boundary Support Vector Clustering with Self-Adaption Support. Electronics 2022, 11, 1854. https://doi.org/10.3390/electronics11121854
Academic Editors: Sławomir Nowaczyk, Rita P. Ribeiro and Grzegorz Nalepa
Received: 13 May 2022
Accepted: 8 June 2022
Published: 11 June 2022
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
electronics
Article
Improved Boundary Support Vector Clustering with Self-Adaption Support
Huina Li 1, Yuan Ping 1,*, Bin Hao 2,*, Chun Guo 3 and Yujian Liu 1
1 School of Information Engineering, Xuchang University, Xuchang 461000, China; leehuina@126.com (H.L.); batianClass@xcu.edu.cn (Y.L.)
2 Here Data Technology, Shenzhen 518000, China
3 Guizhou Provincial Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China; cguo@gzu.edu.cn
* Correspondence: pyuan.lhn@xcu.edu.cn (Y.P.); haobin@heredata.com.cn (B.H.)
Abstract: For a good description of arbitrarily shaped clusters, collecting accurate support vectors (SVs) is critical yet resource-consuming for support vector clustering (SVC). Even though SVs can be extracted from the boundaries for efficiency, boundary patterns with too much noise and inappropriate parameter settings, such as the kernel width, can still confuse the connectivity analysis. Thus, we propose an improved boundary SVC (IBSVC) with self-adaption support for reasonable boundaries and suitable parameters. The first self-adaption is in the movable edge selection (MES). By introducing a divide-and-conquer strategy supported by k-means++, it collects local, informative, and reasonable edges for the minimal hypersphere construction while rejecting pseudo-borders and outliers. Rather than learning the model through repetitive training and evaluation, we fuse the second self-adaption with the flexible parameter selection (FPS) for direct model construction. FPS automatically selects the kernel width to meet a conformity constraint, which is defined by measuring the difference between the data description drawn by the model and the actual pattern. Finally, IBSVC adopts a convex decomposition-based strategy to finish cluster checking and labeling even when there is no prior knowledge of the cluster number. Theoretical analysis and experimental results confirm that IBSVC can discover clusters with high computational efficiency and applicability.
Keywords: support vector clustering; cluster boundary; edge selection; parameter adaption; convex decomposition
1. Introduction
Support vector clustering (SVC) has attracted much attention for handling clusters with arbitrary shapes [1,2]. For a better description, support vectors (SVs) with their specific coefficients should generally be collected through careful model training, which requires a large number of valid training samples and a complex iterative analysis under specific metrics. As data sizes grow and samples become less representative, costly storage and computation in the training phase frequently degrade SVC's performance. Meanwhile, the connectivity analysis can also be confused by inappropriate parameter settings even when the correct SVs are used. Intuitively, we expect an efficient model to be trained on fewer yet representative samples, and suitable parameters to be found at minimal cost.
Let $X$ be a data set with $N$ data samples $\{x_1, x_2, \cdots, x_N\}$, where $x_i \in \mathbb{R}^d$ $(i \in [1, N])$ in the data space. Model training is costly because it generally has to solve a quadratic programming problem through iterative analysis of an $N \times N$ kernel matrix. Its runtime usually ranges from $O(N^2)$ to $O(N^3)$ depending on the specific case [1,3,4]. Furthermore, the number of iterations is uncertain, and a large number is typically expected before the final coefficient vector $\beta$ is reached, which exacerbates the practical time cost. To achieve an improvement,
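To make the cost of the N × N kernel matrix mentioned above concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a Gaussian kernel K(x_i, x_j) = exp(-q ||x_i - x_j||^2), which is a common choice for SVC but is not stated in this excerpt; the names `gaussian_kernel_matrix`, `kernel_matrix_bytes`, and the width parameter `q` are hypothetical and chosen only for illustration.

```python
import math


def gaussian_kernel_matrix(samples, q):
    """Build the full N x N Gaussian kernel matrix (assumed kernel form).

    K[i][j] = exp(-q * ||x_i - x_j||^2), where q is an assumed width parameter.
    Both time and memory grow quadratically with the number of samples.
    """
    n = len(samples)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        kernel[i][i] = 1.0  # zero distance to itself, so exp(0) = 1
        for j in range(i + 1, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            value = math.exp(-q * sq_dist)
            kernel[i][j] = value
            kernel[j][i] = value  # the matrix is symmetric
    return kernel


def kernel_matrix_bytes(n, bytes_per_entry=8):
    """Memory needed to store a dense N x N matrix of 8-byte floats."""
    return n * n * bytes_per_entry


# Three toy samples in a two-dimensional data space.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = gaussian_kernel_matrix(points, q=0.5)
```

Under these assumptions, `kernel_matrix_bytes(100_000)` gives 80,000,000,000 bytes, so a dense kernel matrix for 100,000 samples already needs about 80 GB before any quadratic programming iteration runs. This is the storage pressure that motivates training on fewer, more representative samples.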