ӭ
ղرվ
 
 
ǰλ<<վҳ<<ϸϢ

㷨ĽеӦо

ʱ䣺2016-12-21 1762


ĽеӦо

С

Ϻ̼ѧ  йͨѧԺ  201620

ѧƷ룺620. 2030    ͼţX92,U4    ױʶ룺A

ժҪѧϰо㷨Ļk-means㷨иĽõMK-meansĽ㷨ѡȡʼ㣬ͬľѻͼС֧˼Ӧõ㷨ĽУͨС֧СȦkʼģɴﵽһʼֵĵ㿪ʼ࣬ЧţЧ˷ѡȡʼȱ㡣ѸĽ㷨ӦõУ׼ȷȼԵδʩһʵӦüֵ

ؼʣģࣻھ㷨Ľ

 

Improved Clustering analysis Algorithm and its Application in the tunnel disease

(Shanghai University of Engineering Science College of Urban Rail TransportationShanghai 201620 China)

Abstract: Study in the classical algorithm based on cluster analysis, try to improve k-means algorithm,and get the MK-means Algorithm.If we randomly select initial point iteration,that each operation will produce different clustering results, based on the minimum spanning tree of graph theory, by calculating the minimum spanning tree to get the minimum cycle, resulting in the beginning of k- start cluster center, up to a certain point of the initial clustering threshold start clustering, the result is better, and of some practical value.

Key words: cluster analysis; fuzzy clustering; data mining; Algorithm improvement

1

ϵ··Уͳ50ϰ֮࣬еķȡʩԤιϵаȫȶĿǰֶ󶼲Ų飬˷ʱ䣬Դȡо㷨һָĽ㷨ͨС֧ľ㷨ĸĽøĽģ㷨Լȼ۷IJݽоóͽ飬ΪԤṩʵӦIJο

2k-ֵ㷨

k-ֵ㷨ƽͺΪĿ꺯ûָk֣ͨݶ½ŻĿ꺯ʹĿ꺯ֵС[1]㷨ھkԣһöٷk-ֵ㷨Ӳֵһ֣ÿٰһÿֻһ[2]

nX={x1,x2, ...,xn}ΪkniʾiXʾiģuij[3]

uij=

 
      1jڵi


      0

j=1,2,...,ni=1,2, ...,ki=1,2, ...,k

iƽΪ

Ŀ꺯ʾΪͨʹJ(u)ȡСֵJ(u*)=min{J(u)}ӶҵŻuij*

k-ֵ㷨ڲܼҸ֮ʱ㷨ЧϺ[4]

3Ľľ㷨MK-means

о㷨Ļk-means㷨иĽMK-means (Mended K-Means)ڱ׼k-ֵ㷨Уʼѡȡġѡȡijʼе㣬ÿk-ֵ㷨ͬľMK-means㷨ͼС֧˼룬ͨС֧õСȦkʼģõõľøĽ㷨õԤУкܺõ׼ȷԺͲԣMK-means㷨ܹŻk-ֵ㷨

3.1 MK-means

ͨͼG=(V,E)ÿһe=[vi,vj]һǸȨw(e)=wijwij0

1T=(V,E)Gһ֧EбߵȨֵ֮Ϊ֧TȨΪw(T)[5]

2֧T*Ȩw(T*)G֧СģT*GС֧ҲС[6]

ҵСԺһҪҵСȦMC

ʼѡȡ£

ȣͨPrim㷨ҵС

ڶͨ㷨ҵСȦ

СȦнڵͱߴСгȥʣĽڵ¼߳µС

ģҳn+1-n/kСȦ

n+1-n/kСȦģΪʼġ

ͨһ˵СȦḶ́ͼʾִСʾ߳

һ6ڵ10ߵͼΪҳ53ڵ2ߵСȦҪ6ֳ2

 

 

 

 

                                 

ͼ1 ԭͼ                   ͼ2 С                 ͼ3 СȦ

ͼ1 ʾԭͼ6ڵ10ߣͼ2 ͼ1еС6ڵ5ߣͼ3ҵ5Ȧ(1,2,3) (2,3,6) (2,3,4) (3,4,5) (3,4,6)߱־ȦΪСȦ

ͼ1˹ԭͼҪҳСȦҳСͼ2ʾСУڵΧɵȦijȣͼУL(1,2,3)=16+5+19=40L(2,3,6)=26L(2,3,4)=17L(3,4,5)=19L(3,4,6)=30ɴ˿ɼ234ڵɵȦССȦСȦĵľk-ֵеһʼġȻ󣬽ڵ234ͼɾĽڵΪڶ࣬ɽڵ156ɵȦΪk-ֵеڶʼġ

MK-means㷨裺ʼ;㡣ʼ̰ССȦͼȦ[7]ݼԿһnn(n-1)/2ɵͨͼÿݱΪͼеһڵ㣬ͨͼص㣬ÿڵ֮䶼һߣߵȨؾDZߵijȣÿԼ[n/k]ݵ㡣ڳʼ׶ΣҪҳkСȦkʼġ

3.2 MK-means

Step1ʼ

X={x1,x2,,xn}

k

X=[ x1,x2,,xn]//С

for j=1 to k

һd=[]

d1=x1

for i=1 to n-1

     ci=D(xi, xk)=min{D(xi,xk)}xidxkX-d

     di+1= xk //СȦ

for i=1 to (k-1)n/k+1

 fi=c0+i+ c1+i++ cn/k-2+i+D(di,di+n/k)

fi=min{ fi} //СȦCj

Cj=means(di,,di+n/k)

ݼɾСȦеݵ

j=j+1X=X-d 㷨ֱֹj=k

Step2k-ֵ㷨

Step3

4Ľ㷨еӦ

Ľ㷨ӦڲݷУݼΪͳݣʵݵķ֪ģԱIJԱ׼׼ȷо㷨ָ1ʾ

1 㷨ָ

ָ

K-means

MK-means

1

2

3

1

ڵ

38,62,50

50,42,58

47,34,69

50,44,56

0.08

0.12

0.287

0.12

0,12, 0

0,5,13

8,4,31

0,8,10

ȷ

38,50,50

50,37,45

39,30,38

30,20,27

׼ȷ

0.92

0.88

0.713

0.88

ָ

0.003

0.3541

0.6429

1

ͨVC++Ƴʵּ¼ʶʶ£

һ¼ʶ¼iġб=¼jġб𡱣Ҽ¼iġ=¼jġҼ¼iġš=¼jġšҼ¼iġ=¼jġ¼i=¼j򣬼¼i¼j

¼ʶÿ¼һʶԣ¼ĸԶͬͬġʶ϶ÿһ¼ȽһΣÿһαȽ϶ǵļ¼ֱеļ¼ǡ2Ϊ¼ʶʾ

2 ¼ʶʾ

б

ӻĿ

ӻȼ

λ

ID

094

ѻ

2/3

0.090234

m

286

094

©ˮ

1/3

0.053079

m

286

094

ˮʩ

2/3

0.140127

m

286

094

̵

2/3

0.13482

m

286

094

޽粻

1/3

1

286

ڶ¼顣ʶļ¼Ȼֱھ࣬ҪԡӻĿӻȼ͡Բͬͬļ¼ʶԣ¼Ϊṩʶݡ

3ΪͨVC++ԱвϢϸֳͣΪĽľ㷨MK-means㷨׼

3Ϣ

Ŀ

S0601

Ȳ

2

S0602

ǿȲ

6

S0101

λƶ

164

S0501

278

S0502

321

S0103

ѹ

437

S0301

460

S0402

Ӫͨ粻

690

S0303

̮ʯ

961

S0503

̵

198

S0302

ˮʩ

2113

S0201

ʴ

2629

S0403

3716

S0102

ѻ

5533

S0202

©ˮ

10562

S0401

޽粻

13595

 

ͨС֧ĸĽ㷨㣬óҪ;ͼ4ʾ

ͼ4. С֧ͼ

ɻС֧ͼó۽4ʾ

4 ۽

ID

ھ

1

2

3

4

1

0.1283

0

0.6991

0.7680

0.8628

2

0.1466

0.6991

0

0.8317

0.9017

3

0.0959

0.7680

0.8317

0

0.5287

4

0.2011

0.8628

0.9017

0.5287

0

ϱԷָĽ㷨MK-meansŵ㣬 MK-means㷨ķ׼ȷʿɴﵽ0.88ҶһݼÿвĽdzȶʼ֮MK-meansľٶȸ졣Ľ㷨ڽ׼ȷԡȶԡٶȵȼk-ֵ㷨ĽȶҽӽŽͨԵ֪ѻ©ˮ޽粻㡢ΪҪӪӦô⼸ȡκԤʩ

1©ˮضΣáΪΪŽϡĴʩȶԳȫѹעˮ࣬Ȼ沢ͿˢֹˮͿʽˮ[8]ڼгˮˮɣͨˮܽˮ๵ˮˮܣܽˮ˲๵

2ʵǿƹյǿȣﵽЧ

3ϺķΣԲۺˮ·ṹǿֹ̮ȣɸʵۺá


 

5ο

[1] Jܵ. 㷨о[J]. . 2008. 19(1). 50-52

[2] Ruspini E H. Numerical methods for fuzzy clustering[J]. Information Science. 1970. 15(2). 319-350

[3] Tamra S, et.al. Pattern classification based on fuzzy relations[J]. IEEE Trans. SMC. 1971. 1(1). 217-242

[4] Backer E, Jain A K. A clustering performance measure based on fuzzy set decomposition[J]. IEEE Trans. PAMI. 1981. 3(1). 66-74

[5] ². ģӦ[M]. . ӿƼѧ. 2004. 49-54

[6] . Ȼģ㷨о[D]. . ӿƼѧ. 2004. 5-6

[7] Krishnapuram R, Killer J M. A possibilistic approach to clustering[J]. IEEE Transactions on Fuzzy System. 1993. 1(2). 98-110

[8] Selim S Z, Ismail M A. Soft clustering of multidimensional data: a semi-fuzzy approach[]. Pattern Recognition. 1984. 17(5). 559-568

 

ͳ