㷨ĽеӦо
С
Ϻ̼ѧ йͨѧԺ 201620
ѧƷ룺620. 2030 ͼţX92,U4 ױʶ룺A
ժҪѧϰо㷨Ļk-means㷨иĽõMK-meansĽ㷨ѡȡʼ㣬ͬľѻͼС֧˼Ӧõ㷨ĽУͨС֧СȦkʼģɴﵽһʼֵĵ㿪ʼ࣬ЧţЧ˷ѡȡʼȱ㡣ѸĽ㷨ӦõУȷȼԵδʩһʵӦüֵ
ؼʣģࣻھ㷨Ľ
Improved Clustering analysis Algorithm and
its Application in the tunnel disease
(Shanghai University of Engineering Science College of
Urban Rail TransportationShanghai 201620 China)
Abstract: Study in the classical algorithm based on cluster analysis, try to
improve k-means algorithm,and get the MK-means Algorithm.If we randomly select
initial point iteration,that each operation will produce different clustering
results, based on the minimum spanning tree of graph theory, by calculating the
minimum spanning tree to get the minimum cycle, resulting in the beginning of
k- start cluster center, up to a certain point of the initial clustering
threshold start clustering, the result is better, and of some practical value.
Key
words: cluster analysis; fuzzy clustering; data
mining; Algorithm improvement
1
ϵ··Уͳ50ϰ֮࣬еķȡʩԤιϵаȫȶĿǰֶŲ飬˷ʱ䣬Դȡо㷨һָĽ㷨ͨС֧ľ㷨ĸĽøĽģ㷨Լȼ۷IJݽоóͽ飬ΪԤṩʵӦIJο
2k-ֵ㷨
k-ֵ㷨ƽͺΪĿ꺯ûָk֣ͨݶ½ŻĿ꺯ʹĿ꺯ֵС[1]㷨ھkԣһöٷk-ֵ㷨Ӳֵһ֣ÿٰһÿֻһ[2]
轫nX={x1,x2, ...,xn}Ϊk࣬niʾi
Xʾiģuij¶[3]
1jڵi
0
j=1,2,...,n
i=1,2, ...,k
i=1,2, ...,k
iƽΪ
Ŀ꺯ʾΪ
ͨʹJ(u)ȡСֵJ(u*)=min{J(u)}ӶҵŻuij*
k-ֵ㷨ڲܼҸ֮ʱ㷨ЧϺ[4]
3Ľľ㷨MK-means㷨
о㷨Ļk-means㷨иĽMK-means (Mended K-Means)㷨ڱk-ֵ㷨Уʼѡȡġѡȡijʼе㣬ÿk-ֵ㷨ͬľMK-means㷨ͼС֧˼룬ͨС֧õСȦkʼģõõľøĽ㷨õԤУкܺõȷԺͲԣMK-means㷨ܹŻk-ֵ㷨
3.1 MK-means㷨
ͨͼG=(V,E)ÿһe=[vi,vj]һǸȨw(e)=wijwij0
1T=(V,E)Gһ֧EбߵȨֵ֮Ϊ֧TȨΪw(T)
[5]
2֧T*Ȩw(T*)G֧СģT*GС֧ҲС
[6]
ҵСԺһҪҵСȦMC
ʼѡȡ£
ȣͨPrim㷨ҵС
ڶͨ㷨ҵСȦ
СȦнڵͱߴСгȥʣĽڵ¼߳µС
ģҳn+1-n/kСȦ
壬n+1-n/kСȦģΪʼġ
ͨһ˵СȦḶ́ͼʾִСʾ߳


һ6ڵ10ߵͼΪҳ53ڵ2ߵСȦҪ6ֳ2ࡣ
ͼ1 ԭͼ ͼ2 С ͼ3 СȦ
ͼ1 ʾԭͼ6ڵ10ߣͼ2 ͼ1еС6ڵ5ߣͼ3ҵ5Ȧ(1,2,3) (2,3,6) (2,3,4) (3,4,5) (3,4,6)߱־ȦΪСȦ
ͼ1˹ԭͼҪҳСȦҳСͼ2ʾСУڵΧɵȦijȣͼУL(1,2,3)=16+5+19=40L(2,3,6)=26L(2,3,4)=17L(3,4,5)=19L(3,4,6)=30ɴ˿ɼ234ڵɵȦССȦСȦĵľk-ֵеһʼġȻڵ234ͼɾĽڵΪڶ࣬ɽڵ156ɵȦΪk-ֵеڶʼġ
MK-means㷨裺ʼ;㡣ʼ̰ССȦͼȦ[7]ݼԿһn㣬n(n-1)/2ɵͨͼÿݱΪͼеһڵ㣬ͨͼص㣬ÿڵ֮䶼һߣߵȨؾDZߵijȣÿԼ[n/k]ݵ㡣ڳʼΣҪҳkСȦkʼġ
3.2 MK-means㷨
Step1ʼ
X={x1,x2,,xn}
k
X=[ x1,x2,,xn]//С
for j=1
to k
һd=[]
d1=x1
for i=1
to n-1
ci=D(xi, xk)=min{D(xi,xk)}xi
dxk
X-d
di+1= xk //СȦ
for i=1
to (k-1)n/k+1
fi=c0+i+ c1+i++ cn/k-2+i+D(di,di+n/k)
fi=min{ fi} //СȦCj
Cj=means(di,,di+n/k)
ݼɾСȦеݵ
j=j+1X=X-d 㷨ֱֹj=k
Step2k-ֵ㷨
Step3
4Ľ㷨еӦ
Ľ㷨ӦڲݷУݼΪͳݣʵݵķ֪ģԱIJԱȷо㷨ָ1ʾ
1 㷨ָ
ָ
|
K-means
|
MK-means
|
1
|
2
|
3
|
1
|
ڵ
|
38,62,50
|
50,42,58
|
47,34,69
|
50,44,56
|
|
0.08
|
0.12
|
0.287
|
0.12
|
|
0,12, 0
|
0,5,13
|
8,4,31
|
0,8,10
|
ȷ
|
38,50,50
|
50,37,45
|
39,30,38
|
30,20,27
|
ȷ
|
0.92
|
0.88
|
0.713
|
0.88
|
ָ
|
0.003
|
0.3541
|
0.6429
|
1
|
ͨVC++Ƴʵּ¼ʶʶ£
һ¼ʶ¼iġб=¼jġб𡱣Ҽ¼iġ=¼jġҼ¼iġš=¼jġšҼ¼iġ=¼jġ¼i=¼j¼i
¼j
¼ʶÿ¼һʶԣ¼ĸԶͬͬġʶ϶ÿһ¼ȽһΣÿһαȽ϶ǵļ¼ֱеļ¼ǡ2Ϊ¼ʶʾ
2 ¼ʶʾ
б
|
|
|
|
ӻĿ
|
ӻȼ
|
|
λ
|
ID
|
|
|
094
|
|
ѻ
|
2/3
|
0.090234
|
m
|
286
|
|
|
094
|
|
©ˮ
|
1/3
|
0.053079
|
m
|
286
|
|
|
094
|
|
ˮʩ
|
2/3
|
0.140127
|
m
|
286
|
|
|
094
|
|
̵
|
2/3
|
0.13482
|
m
|
286
|
|
|
094
|
|
粻
|
1/3
|
1
|
|
286
|
ڶ¼顣ʶļ¼Ȼֱھ࣬ҪԡӻĿӻȼ͡Բͬͬļ¼ʶԣ¼Ϊṩʶݡ
3ΪͨVC++ԱвϢϸֳͣΪĽľ㷨MK-means㷨
3Ϣ
Ŀ
|
|
|
S0601
|
Ȳ
|
2
|
S0602
|
ǿȲ
|
6
|
S0101
|
λƶ
|
164
|
S0501
|
|
278
|
S0502
|
|
321
|
S0103
|
ѹ
|
437
|
S0301
|
|
460
|
S0402
|
Ӫͨ粻
|
690
|
S0303
|
̮ʯ
|
961
|
S0503
|
̵
|
198
|
S0302
|
ˮʩ
|
2113
|
S0201
|
ʴ
|
2629
|
S0403
|
|
3716
|
S0102
|
ѻ
|
5533
|
S0202
|
©ˮ
|
10562
|
S0401
|
粻
|
13595
|
ͨС֧ĸĽ㷨㣬óҪ;ͼ4ʾ
ͼ4. С֧ͼ
ɻС֧ͼó۽4ʾ
4 ۽
ID
|
ھ
|
|
1
|
2
|
3
|
4
|
1
|
0.1283
|
0
|
0.6991
|
0.7680
|
0.8628
|
2
|
0.1466
|
0.6991
|
0
|
0.8317
|
0.9017
|
3
|
0.0959
|
0.7680
|
0.8317
|
0
|
0.5287
|
4
|
0.2011
|
0.8628
|
0.9017
|
0.5287
|
0
|
|
|
|
|
|
|
ϱԷָĽ㷨MK-meansŵ㣬 MK-means㷨ķȷʿɴﵽ0.88ҶһݼÿвĽdzȶʼ֮MK-meansľٶȸ졣Ľ㷨ڽȷԡȶԡٶȵȼk-ֵ㷨ĽȶҽӽŽͨԵ֪ѻ©ˮ粻㡢ΪҪӪӦô⼸ȡκԤʩ
1©ˮضΣáΪΪŽϡĴʩȶԳȫѹעˮ࣬Ȼ沢ͿˢֹˮͿʽˮ[8]ڼгˮˮɣͨˮܽˮˮˮܣܽˮ˲
2ʵǿƹյǿȣﵽЧ
3ϺķΣԲۺˮ·ṹǿֹ̮ȣɸʵۺá
5ο
[1] Jܵ. 㷨о[J]. . 2008. 19(1). 50-52
[2] Ruspini E H.
Numerical methods for fuzzy clustering[J]. Information Science. 1970. 15(2).
319-350
[3]
Tamra S, et.al. Pattern classification based on fuzzy relations[J]. IEEE Trans.
SMC. 1971. 1(1). 217-242
[4]
Backer E, Jain A K. A clustering performance measure based on fuzzy set
decomposition[J]. IEEE Trans. PAMI. 1981. 3(1). 66-74
[5] ². ģӦ[M]. . ӿƼѧ. 2004. 49-54
[6] . Ȼģ㷨о[D]. . ӿƼѧ. 2004. 5-6
[7]
Krishnapuram R, Killer J M. A possibilistic approach to clustering[J]. IEEE
Transactions on Fuzzy System. 1993. 1(2). 98-110
[8]
Selim S Z, Ismail M A. Soft clustering of multidimensional data: a semi-fuzzy
approach[]. Pattern Recognition. 1984. 17(5). 559-568