KMeans in Python: ValueError: setting an array element with a sequence.

Time: 2022-03-16 18:03:48

I am trying to perform kmeans clustering in Python using numpy and sklearn. I have a txt file with 45 columns and 645 rows. The first row is Y and remaining 644 rows are X.

My Python code is:

import numpy as np
import matplotlib.pyplot as plt
import csv

from sklearn.cluster import KMeans

#The following code reads the first row and terminates the loop
with open('trainDataXY.txt','r') as f:
   read = csv.reader(f)
   for first_row in read:
        y = list(first_row)
        break

#The following code skips the first row and reads rest of the rows
firstLine = True
with open('trainDataXY.txt','r') as f1:
    readY = csv.reader(f1)
    for rows in readY:
         if firstLine:
              firstLine=False
              continue
         x = list(readY)

X = np.array((x,y), dtype=object)
kmean = KMeans(n_clusters=2)
kmean.fit(X)

I get an error at this line: kmean.fit(X)

The error I get is:

Traceback (most recent call last):
File "D:\file_path\kmeans.py", line 25, in <module> kmean.fit(X)
File "C:\Anaconda2\lib\site-packages\sklearn\cluster\k_means_.py",
line 812, in fit X = self._check_fit_data(X)
File "C:\Anaconda2\lib\site-packages\sklearn\cluster\k_means_.py",
line 786, in _check_fit_data X = check_array(X, accept_sparse='csr',
dtype=np.float64)
File "C:\Anaconda2\lib\site-packages\sklearn\utils\validation.py",
line 373, in check_array array = np.array(array, dtype=dtype,
order=order, copy=copy) ValueError: setting an array element with a
sequence.

trainDataXY.txt

1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5

47,64,50,39,66,51,46,37,43,37,37,35,36,34,37,38,37,39,104,102,103,103,102,108,109,107,106,115,116,116,120,122,121,121,116,116,131,131,130,132,126,127,131,128,127

47,65,58,30,39,48,47,35,42,37,38,37,37,36,38,38,38,40,104,103,103,103,101,108,110,108,106,116,115,116,121,121,119,121,116,116,133,131,129,132,127,128,132,126,127

49,69,55,28,56,64,50,30,41,37,39,37,38,36,39,39,39,40,105,103,104,104,103,110,110,108,107,116,115,117,120,120,117,121,115,116,134,131,129,134,128,125,134,126,127

51,78,52,46,56,74,50,28,38,38,39,38,38,37,40,39,39,41,96,101,99,104,97,101,111,101,104,115,116,116,119,110,112,119,116,116,135,130,129,135,120,108,133,120,125

55,79,53,65,52,102,55,28,36,39,40,38,39,37,40,39,40,42,79,86,84,105,84,57,110,85,76,117,118,115,110,66,86,117,117,118,123,130,130,129,106,93,130,113,114

48,80,59,81,50,120,63,26,31,39,40,39,40,38,42,37,41,42,53,73,77,90,47,34,76,52,63,106,102,97,80,33,68,105,105,113,115,130,124,111,83,91,128,105,110

45,95,56,86,38,137,60,27,27,39,40,38,40,37,41,52,38,41,24,44,44,79,40,32,48,26,28,63,52,59,42,30,62,79,67,77,116,121,122,114,96,90,126,93,103

45,93,47,86,35,144,60,26,27,39,40,45,39,38,43,87,46,58,33,21,26,62,42,49,49,37,24,33,41,56,29,28,68,79,58,74,115,111,115,119,117,104,132,92,97

48,85,50,83,37,142,62,25,29,57,47,77,43,64,61,115,70,101,41,28,28,48,39,46,42,38,37,47,43,74,32,28,64,86,80,81,127,113,99,130,140,112,139,92,97

48,94,78,77,30,138,57,28,29,91,66,94,61,94,103,129,89,140,38,34,32,38,33,43,38,36,39,50,39,75,31,33,65,89,82,84,127,112,100,133,141,107,136,95,97

45,108,158,77,30,140,67,29,26,104,97,113,92,106,141,137,116,151,33,32,32,43,44,40,37,34,37,54,86,77,55,48,77,112,83,109,120,111,105,124,133,98,129,89,99

48,139,173,64,40,159,61,55,27,115,117,128,106,124,150,139,125,160,27,26,29,54,51,47,36,36,32,80,125,105,97,96,86,130,102,118,117,104,105,118,117,92,130,94,97

131,157,143,66,87,130,57,118,26,124,137,129,133,138,156,133,132,173,29,25,28,81,48,38,48,32,24,134,165,144,149,142,110,145,147,161,114,112,103,118,115,94,126,87,102

160,162,146,78,116,127,52,133,71,116,141,125,125,141,169,115,110,161,69,53,46,97,79,47,76,59,32,148,147,134,165,152,111,155,139,145,116,113,101,118,105,86,123,92,99

1 Solution

#1

Your data matrix should not be of type object. It should be a matrix of numbers of shape n_samples x n_features.

This error usually crops up when people try to convert a list of samples into a data matrix, and each sample is an array or a list, and at least one of the samples does not have the same length as the others. This can be figured out by evaluating np.unique(list(map(len, X))).
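
As a small illustration of that check, here is a sketch on a made-up list of samples (the `samples` list is hypothetical and is not taken from trainDataXY.txt). If more than one distinct length comes back, the rows are ragged and NumPy cannot stack them into a numeric matrix.

```python
import numpy as np

# Hypothetical sample list in which the last row is one value short.
samples = [[1, 2, 3], [4, 5, 6], [7, 8]]

# Distinct row lengths; more than one value means the rows are ragged.
lengths = np.unique(list(map(len, samples)))
print(lengths)

# A proper n_samples x n_features matrix needs exactly one row length.
is_rectangular = len(lengths) == 1
```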

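In your case the cause is different: np.array((x,y), dtype=object) pairs the whole list of data rows x with the single label row y, so the result is an object array rather than a numeric matrix. Make sure you obtain a data matrix. The first thing to try is to replace the line X = np.array((x,y), dtype=object) with something that creates a data matrix.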
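
One possible replacement is sketched below, assuming the rows arrive as lists of strings the way csv.reader yields them. The `rows` list is a shortened, hypothetical stand-in for the file contents, not the real data.

```python
import numpy as np

# Hypothetical stand-in for the rows that csv.reader yields: lists of strings.
rows = [
    ["1", "1", "2", "2"],      # first row: labels (Y)
    ["47", "64", "37", "37"],  # remaining rows: measurements (X)
    ["47", "65", "37", "38"],
    ["49", "69", "37", "39"],
]

# Convert the strings to numbers and keep labels and data in separate arrays.
y = np.array(rows[0], dtype=int)     # shape (4,)
X = np.array(rows[1:], dtype=float)  # shape (3, 4): n_samples x n_features

print(X.shape, X.dtype)
```

An array built this way is numeric and two-dimensional, which is what kmean.fit(X) expects. If the labels in the first row describe the columns rather than the rows, which is only a guess about this file, then X.T (one sample per column) would be the matrix to pass instead.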

You should also consider using numpy.recfromcsv to read your data. It will make the loading code much simpler.
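
A minimal sketch of reading the file with NumPy directly follows. It uses numpy.loadtxt rather than numpy.recfromcsv, which is a substitution on my part: loadtxt returns a plain float array, and as far as I know recfromcsv is no longer available in the newest NumPy releases. The in-memory `text` is a hypothetical three-line excerpt; with the real file you would pass 'trainDataXY.txt' instead of the StringIO object.

```python
import io
import numpy as np

# Hypothetical three-line excerpt standing in for trainDataXY.txt.
text = "1,1,2,2\n47,64,37,37\n47,65,37,38\n"

# loadtxt parses comma-separated numbers into a 2-D float64 array.
data = np.loadtxt(io.StringIO(text), delimiter=",")

y = data[0]   # first row: labels
X = data[1:]  # remaining rows: data matrix

print(X.shape, X.dtype)
```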
