Article
Minimum Distribution Support Vector Clustering
Yan Wang 1,2, Jiali Chen 1, Xuping Xie 1, Sen Yang 1, Wei Pang 3, Lan Huang 1,*, Shuangquan Zhang 1 and Shishun Zhao 4
Citation: Wang, Y.; Chen, J.; Xie, X.; Yang, S.; Pang, W.; Huang, L.; Zhang, S.; Zhao, S. Minimum Distribution Support Vector Clustering. Entropy 2021, 23, 1473. https://doi.org/10.3390/e23111473
Academic Editors: Luis Hernández-Callejo, Sergio Nesmachnow and Sara Gallardo Saavedra
Received: 6 October 2021
Accepted: 4 November 2021
Published: 8 November 2021
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1 Key Laboratory of Symbol Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China; wy6868@jlu.edu.cn (Y.W.); jiali19@mails.jlu.edu.cn (J.C.); xiexp21@mails.jlu.edu.cn (X.X.); ystop2020@gmail.com (S.Y.); shuangquan18@mails.jlu.edu.cn (S.Z.)
2 School of Artificial Intelligence, Jilin University, Changchun 130012, China
3 School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh EH14 4AS, UK; w.pang@hw.ac.uk
4 College of Mathematics, Jilin University, Changchun 130012, China; zhaoss@jlu.edu.cn
* Correspondence: Huanglan@jlu.edu.cn
Abstract: Support vector clustering (SVC) is a boundary-based algorithm with several advantages over other clustering methods, including the ability to identify clusters of arbitrary shapes and numbers. Motivated by the high generalization ability of the large margin distribution machine (LDM) and of optimal margin distribution clustering (ODMC), we propose a new clustering method, minimum distribution support vector clustering (MDSVC), to improve the robustness of boundary point recognition. MDSVC characterizes the optimal hypersphere by first-order and second-order statistics and tries to minimize the mean and variance simultaneously. In addition, we prove theoretically that our algorithm can obtain better generalization performance, and we gain some instructive insights for adjusting the number of support vectors. For the optimization problem of MDSVC, we propose a dual coordinate descent algorithm for small and medium samples. Experimental results on both artificial and real datasets indicate that MDSVC achieves a significant improvement in generalization performance compared to SVC.
Keywords: support vector clustering; margin theory; mean; variance; dual coordinate descent
1. Introduction
Cluster analysis groups a dataset into clusters according to correlations within the data. To date, many clustering algorithms have emerged, such as plane-based clustering algorithms, spectral clustering, the density-based DBSCAN [1] and OPTICS [2], the Density Peak algorithm (DP), which characterizes cluster centers [3], and the partition-based k-means algorithm [4].
In particular, the support vector machine (SVM) has become an important tool for data mining. As a classical machine learning algorithm, SVM effectively addresses the issues of local extrema and high-dimensional data during model optimization, and it makes data separable in feature space through nonlinear transformations [5].
Specifically, Tax and Duin proposed a method in which the decision boundary is constructed by a set of support vectors, the so-called support vector domain description (SVDD) [6]. Building on kernel theory and SVDD, support vector clustering (SVC) was proposed as a contour-based clustering method, which has many advantages over other clustering algorithms [7]. SVC is robust to noise and does not require the number of clusters to be specified in advance. For SVC, it is feasible to adjust the parameter C to obtain better performance, but this comes at the cost of more outliers, and it only introduces a soft boundary into the optimization.
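The SVDD hypersphere underlying SVC can be illustrated with a small numerical sketch: a point x is scored by its squared kernel-space distance to the sphere center a = Σ_j α_j φ(x_j), i.e., d²(x) = K(x,x) − 2 Σ_j α_j K(x_j, x) + Σ_{i,j} α_i α_j K(x_i, x_j). The Gaussian kernel and the toy data below are illustrative, and the uniform coefficients α are a placeholder for those SVDD would actually learn, not the solution of its optimization problem.

```python
import numpy as np

def rbf(a, b, q=1.0):
    # Gaussian kernel K(a, b) = exp(-q * ||a - b||^2), as commonly used in SVC
    return np.exp(-q * np.sum((a - b) ** 2, axis=-1))

def sphere_distance_sq(x, data, alpha, q=1.0):
    """Squared kernel-space distance from phi(x) to the hypersphere center
    a = sum_j alpha_j * phi(x_j)."""
    k_xx = 1.0                 # Gaussian kernel: K(x, x) = exp(0) = 1
    k_jx = rbf(data, x, q)     # K(x_j, x) for every training point x_j
    k_ij = np.array([[rbf(xi, xj, q) for xj in data] for xi in data])
    return k_xx - 2.0 * alpha @ k_jx + alpha @ k_ij @ alpha

# Toy data: one tight cluster; uniform alphas stand in for learned coefficients.
rng = np.random.default_rng(0)
data = rng.normal(0.0, 0.1, size=(20, 2))
alpha = np.full(len(data), 1.0 / len(data))

d_in = sphere_distance_sq(np.array([0.0, 0.0]), data, alpha)   # near the cluster
d_out = sphere_distance_sq(np.array([3.0, 3.0]), data, alpha)  # far outlier
print(d_in < d_out)  # → True: the outlier lies farther from the center
```

Points whose distance exceeds the sphere radius are treated as boundary points or outliers; SVC then derives cluster assignments from the resulting contours in input space.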
been offered in [8,9]. After studying the relevant literature, we found that these insights
mainly cover two aspects: the first is the selection of the parameters q and C. Lee and Daniels used a secant-like method to generate monotonically increasing sequences of