I'm new to Python and am trying to normalize each element of a list using `preprocessing.normalize`. However, it raises `ValueError: setting an array element with a sequence.`
Then I found the cause: the Series I collect have different lengths, so `np.array` cannot build a regular 2-D array from them.
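A minimal reproduction of the error, assuming two short hypothetical lists stand in for two `sensor[2]` columns of different lengths. Asking NumPy for a regular float array from rows of unequal length raises the same `ValueError`:

```python
import numpy as np

# Hypothetical stand-ins for two sensor columns of different lengths.
short_sensor = [0.1, 0.2, 0.3]
long_sensor = [0.1, 0.2, 0.3, 0.4]

try:
    # A 2-D float array needs every row to have the same length.
    np.array([short_sensor, long_sensor], dtype=float)
    msg = ""
except ValueError as exc:
    msg = str(exc)
    print("ValueError:", msg)
```

The exact wording after "setting an array element with a sequence" varies between NumPy versions, but the cause is the same ragged input.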
Here is my code:
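```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# target_url: list of URLs, each serving a whitespace-delimited sensor file
result = []
for url in target_url:
    sensor = pd.read_csv(url, header=None, delimiter=r"\s+")
    result.append(sensor[2])  # keep column 2 of each file
result = np.array(result)  # ragged: the Series have different lengths
# I want to resample here before it goes to normalize.
result = preprocessing.normalize(result, norm='l1')
```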
`target_url` holds the URLs I use to fetch sensor data from a web server. Column 2 of each file is appended to the `result` list, which is then converted to an array with `np.array`.
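The collection step can be tried without a web server. This sketch assumes two small hypothetical in-memory files in place of the URLs in `target_url`; `pd.read_csv` accepts any file-like object, so `io.StringIO` stands in for the download:

```python
import io

import pandas as pd

# Hypothetical whitespace-delimited sensor files (stand-ins for target_url).
fake_files = [
    "0 10 0.5\n1 11 1.5\n2 12 2.5\n",
    "0 10 4.5\n1 11 5.5\n",
]

result = []
for text in fake_files:
    # header=None: no header row, so the columns are labelled 0, 1, 2, ...
    # delimiter=r"\s+": split on any run of whitespace
    sensor = pd.read_csv(io.StringIO(text), header=None, delimiter=r"\s+")
    result.append(sensor[2])  # column 2 holds the reading

print([len(s) for s in result])  # [3, 2]: the Series differ in length
```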
For example, `len(result[0])` is 121598 and `len(result[1])` is 1215601. I want to make `result[0]` the same length as `result[1]`, using a resample that fills the missing values with NaN.
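If "resample" here means padding the shorter Series with NaN up to the longer one's length, a plain `reindex` does that. pandas' own `resample` is meant for time-based frequency conversion and needs a datetime-like index, which the default integer index from `read_csv` is not. A minimal sketch with hypothetical short Series:

```python
import numpy as np
import pandas as pd

# Hypothetical sensor columns with the default RangeIndex 0..n-1 that
# pd.read_csv gives each file.
short = pd.Series([1.0, 2.0, 3.0])
long = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Reindex the shorter Series onto the longer one's index; labels that do not
# exist in `short` are filled with NaN.
padded = short.reindex(long.index)
print(padded.tolist())  # [1.0, 2.0, 3.0, nan, nan]

# Now both rows have the same length, so np.array builds a regular 2-D array.
arr = np.array([padded, long])
print(arr.shape)  # (2, 5)
```

`preprocessing.normalize` rejects input that contains NaN, so padded values still have to be filled (or dropped) before normalizing.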
How can I do that?
Please help me out here.
Thanks in advance.
EDIT
After normalizing, I'm trying to compute the correlation using `corr()`.
Here is the code:
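```python
result = preprocessing.normalize(result, norm='l1')
ret = pd.DataFrame(result)  # rows = sensors, columns = time steps
corMat = ret.T.corr()  # corr() already returns a DataFrame; a bare DataFrame(...) is undefined without importing it
```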
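A small sketch of what `ret.T.corr()` computes, assuming random hypothetical data with 3 sensors as rows. To avoid depending on scikit-learn here, the row-wise L1 normalization is written out by hand. For rows with a nonzero norm, it divides each row by the sum of its absolute values, which is what `preprocessing.normalize(..., norm='l1')` does:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 3 sensors (rows) x 5 time steps (columns), the same
# orientation as `result` in the question.
rng = np.random.default_rng(0)
result = rng.random((3, 5)) + 0.1  # strictly positive, so every L1 norm is > 0

# Hand-written row-wise L1 normalization (stand-in for preprocessing.normalize).
normalized = result / np.abs(result).sum(axis=1, keepdims=True)

# .T puts the sensors in columns, so corr() gives a sensor-by-sensor matrix.
cor_raw = pd.DataFrame(result).T.corr()
cor_norm = pd.DataFrame(normalized).T.corr()

print(cor_norm.shape)  # (3, 3)
print(np.allclose(cor_raw, cor_norm))  # True
```

Pearson correlation is unchanged by scaling a series by a positive constant, so under these assumptions the L1 step does not affect the `corr()` result.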
Answer
Since you are using `pandas` to read the CSV files, you are off to a good start. One way to do it is simply to use `pd.concat` to join the Series in the `result` list (I assume `sensor[2]` is a Series) into one `DataFrame`. Here is an example:
```python
import pandas as pd

a = [pd.Series([1, 2, 3]), pd.Series([1, 2]), pd.Series([1, 2, 3, 4])]
pd.concat(a, axis=1)
```
Which gives:

```
     0    1  2
0  1.0  1.0  1
1  2.0  2.0  2
2  3.0  NaN  3
3  NaN  NaN  4
```
In the example provided by OP, this should suffice:
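```python
import pandas as pd
from sklearn import preprocessing

result = []
for url in target_url:
    sensor = pd.read_csv(url, header=None, delimiter=r"\s+")
    result.append(sensor[2])
# concatenate the Series side by side (one column per sensor, shorter ones padded
# with NaN), back-fill then forward-fill the NaNs, and transpose so that each
# row is one sensor, as in the question
result = pd.concat(result, axis=1).bfill().ffill().T
# normalize() works row-wise by default, i.e. per sensor here
result = preprocessing.normalize(result, norm='l1')
# correlation between sensors
pd.DataFrame(result).T.corr()
```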
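The pipeline above can be checked end to end on hypothetical in-memory files standing in for `target_url`. This sketch again hand-writes the row-wise L1 normalization in place of `preprocessing.normalize(..., norm='l1')` so it only needs pandas and NumPy. The printed shapes show why the transpose matters: without it, `corr()` would correlate time steps rather than sensors, producing a matrix with one row per sample.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical in-memory files standing in for the URLs in target_url.
fake_files = [
    "0 0 1.0\n1 0 2.0\n2 0 3.0\n3 0 4.0\n",
    "0 0 2.0\n1 0 1.0\n",
    "0 0 5.0\n1 0 3.0\n2 0 4.0\n",
]
series = [pd.read_csv(io.StringIO(t), header=None, delimiter=r"\s+")[2] for t in fake_files]

# Columns = sensors, rows = time steps; shorter Series are padded with NaN at the end.
frame = pd.concat(series, axis=1)
print(frame.shape)  # (4, 3)

# Trailing NaNs have nothing to back-fill from, so the forward fill
# repeats each sensor's last reading.
filled = frame.bfill().ffill()
print(filled.iloc[:, 1].tolist())  # [2.0, 1.0, 1.0, 1.0]

# Transpose so each row is one sensor, then L1-normalize each row.
rows = filled.T.to_numpy()
normalized = rows / np.abs(rows).sum(axis=1, keepdims=True)

cor = pd.DataFrame(normalized).T.corr()
print(cor.shape)  # (3, 3): one row and one column per sensor
```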
Depending on what the Series indices look like and on your application, you can use different kinds of concatenation (for example `join='outer'` or `join='inner'`). See the `pd.concat` documentation for details.
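One example of the different kinds of concatenation is the `join` parameter of `pd.concat`. Reusing the three Series from the example above, the default outer join pads with NaN, while an inner join truncates everything to the labels shared by all Series:

```python
import pandas as pd

a = [pd.Series([1, 2, 3]), pd.Series([1, 2]), pd.Series([1, 2, 3, 4])]

# join='outer' (the default) keeps the union of the indices and pads with NaN.
outer = pd.concat(a, axis=1, join="outer")
# join='inner' keeps only index labels present in every Series, i.e. it
# truncates to the shortest Series instead of padding.
inner = pd.concat(a, axis=1, join="inner")

print(outer.shape)  # (4, 3)
print(inner.shape)  # (2, 3)
```

Truncating avoids inventing fill values, at the cost of discarding the extra readings from the longer sensors.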