entropy
Article
Minimum Distribution Support Vector Clustering
Yan Wang 1,2, Jiali Chen 1, Xuping Xie 1, Sen Yang 1, Wei Pang 3, Lan Huang 1,*, Shuangquan Zhang 1 and Shishun Zhao 4

 
Citation: Wang, Y.; Chen, J.; Xie, X.; Yang, S.; Pang, W.; Huang, L.; Zhang, S.; Zhao, S. Minimum Distribution Support Vector Clustering. Entropy 2021, 23, 1473. https://doi.org/10.3390/e23111473
Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra
Received: 6 October 2021
Accepted: 4 November 2021
Published: 8 November 2021
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1 Key Laboratory of Symbol Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China; wy6868@jlu.edu.cn (Y.W.); jiali19@mails.jlu.edu.cn (J.C.); xiexp21@mails.jlu.edu.cn (X.X.); ystop2020@gmail.com (S.Y.); shuangquan18@mails.jlu.edu.cn (S.Z.)
2 School of Artificial Intelligence, Jilin University, Changchun 130012, China
3 School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh EH14 4AS, UK; w.pang@hw.ac.uk
4 College of Mathematics, Jilin University, Changchun 130012, China; zhaoss@jlu.edu.cn
* Correspondence: Huanglan@jlu.edu.cn
Abstract: Support vector clustering (SVC) is a boundary-based algorithm that has several advantages over other clustering methods, including the ability to identify clusters of arbitrary shape and number. Inspired by the high generalization ability of the large margin distribution machine (LDM) and optimal margin distribution clustering (ODMC), we propose a new clustering method, minimum distribution for support vector clustering (MDSVC), to improve the robustness of boundary point recognition. MDSVC characterizes the optimal hypersphere by first-order and second-order statistics and tries to minimize the mean and variance simultaneously. We further prove theoretically that our algorithm can obtain better generalization performance, and we gain some instructive insights for adjusting the number of support vector points. To solve the MDSVC optimization problem, we propose a dual coordinate descent algorithm for small and medium samples. The experimental results on both artificial and real datasets indicate that MDSVC significantly improves generalization performance compared with SVC.
Keywords: support vector clustering; margin theory; mean; variance; dual coordinate descent
1. Introduction
Cluster analysis groups a dataset into clusters according to the correlations among the data. To date, many clustering algorithms have emerged, such as plane-based clustering algorithms, spectral clustering, the density-based DBSCAN [1] and OPTICS [2], the Density Peak (DP) algorithm, which characterizes the centers of clusters [3], and the partition-based k-means algorithm [4]. In particular, the support vector machine (SVM) has become an important tool for data mining. As a classical machine learning algorithm, SVM can effectively address the issues of local extrema and high data dimensionality during model optimization, and it makes data separable in feature space through a nonlinear transformation [5].
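As a minimal illustration of how a nonlinear transformation can make data separable, the sketch below lifts two concentric rings from the plane into three dimensions. The helper names `make_rings` and `lift`, the ring radii, and the threshold are assumptions chosen for this example; the explicit map is a toy stand-in for a kernel-induced feature space and is not the method of the paper.

```python
import numpy as np

def make_rings(n_per_ring=50, radii=(1.0, 3.0)):
    """Return points on two concentric circles and their labels (0 = inner, 1 = outer)."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_per_ring, endpoint=False)
    points, labels = [], []
    for label, r in enumerate(radii):
        points.append(np.column_stack((r * np.cos(angles), r * np.sin(angles))))
        labels.append(np.full(n_per_ring, label))
    return np.vstack(points), np.concatenate(labels)

def lift(X):
    """Hypothetical explicit feature map: (x1, x2) -> (x1, x2, x1^2 + x2^2)."""
    return np.column_stack((X, np.sum(X ** 2, axis=1)))

X, y = make_rings()
Z = lift(X)

# In the input plane no straight line separates the rings, because the inner
# ring lies inside the outer one. In the lifted space the third coordinate is
# the squared radius (about 1 for the inner ring, about 9 for the outer ring),
# so the plane z3 = 5 separates the two groups.
predicted = (Z[:, 2] > 5.0).astype(int)
accuracy = float(np.mean(predicted == y))
print(accuracy)
```

Kernel methods such as SVM usually avoid constructing the lifted coordinates explicitly and work with inner products in feature space instead; the explicit map is used here only because it is easy to inspect.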
In particular, Tax and Duin proposed a novel method in which the decision boundaries are constructed by a set of support vectors, the so-called support vector domain description (SVDD) [6]. Building on kernel theory and SVDD, support vector clustering (SVC) was proposed based on contour clustering, and it has many advantages over other clustering algorithms [7]. SVC is robust to noise and does not require the number of clusters to be specified in advance. For SVC, it is feasible to adjust the parameter C to obtain better performance, but this comes at the cost of increasing the number of outliers, and it only introduces a soft boundary for optimization. Several insights into the features of SVC have been offered in [8,9]. After studying the relevant literature, we found that these insights mainly cover two aspects. The first aspect is the selection of the parameters q and C. Lee and Daniels chose a method similar to a secant to generate monotone increasing sequences of