I have a txt file, which can be shown as:
10 1:0.870137474304 2:0.722354071782 3:0.671913562758
11 1:0.764133072717 2:0.4893616821 3:0.332713609364
20 1:0.531732713984 2:0.0967819558321 3:0.169802773309
Then I want to read the file and build a matrix of the form:
[[10 0.870137474304 0.722354071782 0.671913562758 ]
[11 0.764133072717 0.4893616821 0.332713609364 ]
[20 0.531732713984 0.0967819558321 0.169802773309]]
I know how to split every element except the first column. How do I handle the first column?
matrix = []
lines = open("test.txt").read().split("\n")  # read all lines into an array
for line in lines:
    array[0] = line.split(" ")[0]
    # Split the line based on spaces and the sub-part on the colon
    array = [float(s.split(":")[1]) for s in line.split(" ")]
    matrix.append(array)
print(matrix)
3 answers
#1
0
You can use regex:
import re
# list(...) is needed on Python 3, where map() returns a lazy iterator
with open('filename.txt') as f:
    data = [list(map(float, re.findall(r'(?<=:)[\d.]+|^\d+', line))) for line in f]
Output:
[[10.0, 0.870137474304, 0.722354071782, 0.671913562758], [11.0, 0.764133072717, 0.4893616821, 0.332713609364], [20.0, 0.531732713984, 0.0967819558321, 0.169802773309]]
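To see what the pattern actually picks up, it may help to run it on a single line. The following is only a minimal sketch: the sample line is hard-coded from the question, and the names `line`, `pattern`, `tokens` and `row` are illustrative, not part of the original answer.

```python
import re

# One line from the question's file, hard-coded so the sketch runs on its own.
line = "10 1:0.870137474304 2:0.722354071782 3:0.671913562758"

# (?<=:)[\d.]+  matches digits and dots that directly follow a colon (the values)
# ^\d+          matches the integer at the very start of the line (the first column)
pattern = re.compile(r"(?<=:)[\d.]+|^\d+")

tokens = pattern.findall(line)        # matched substrings, still strings
row = [float(t) for t in tokens]      # convert each one to a float
print(tokens)
print(row)
```

Note that this pattern only covers plain non-negative decimals; values with a minus sign or in scientific notation (such as `1e-05`) would need a wider character class.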
Edit: to create a numpy array with data:
import numpy as np
import re
# list(...) is needed on Python 3; np.array cannot build a 2-D array from map objects
with open('filename.txt') as f:
    data = [list(map(float, re.findall(r'(?<=:)[\d.]+|^\d+', line))) for line in f]
new_data = np.array(data)
Output:
array([[ 10. , 0.87013747, 0.72235407, 0.67191356],
[ 11. , 0.76413307, 0.48936168, 0.33271361],
[ 20. , 0.53173271, 0.09678196, 0.16980277]])
#2
0
Here's one way to extract your data as a numpy array, using pandas:
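import pandas as pd
# The file is space-delimited, so pass sep=' ' (read_csv defaults to a comma)
df = pd.read_csv('myfile.csv', header=None, sep=' ')
for col in range(1, 4):
    # keep only the part after the colon and convert it to float
    df[col] = df[col].apply(lambda x: float(x.split(':')[1]))
res = df.values
# [[ 10.          0.87013747  0.72235407  0.67191356]
#  [ 11.          0.76413307  0.48936168  0.33271361]
#  [ 20.          0.53173271  0.09678196  0.16980277]]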
#3
0
For beginners in Python:
Expressive version:
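import csv
matrix = []
with open('data.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ')
    for row in reader:
        # split(':')[-1] keeps the text after the colon, or the whole
        # field when there is no colon (the first column)
        cleaned_row = [float(col.split(':')[-1]) for col in row if col]
        matrix.append(cleaned_row)
print(matrix)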
Using a list comprehension:
with open('csvfile.csv') as f:
    rows = f.read().splitlines()
# skip blank lines; split(':')[-1] also handles the colon-free first column
matrix = [[float(col.split(':')[-1]) for col in row.split()] for row in rows if row.strip()]
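Since the question is specifically about the first column, another option is to treat it separately from the `index:value` fields. This is a rough sketch only: `parse_line` is a hypothetical helper name, and the data is inlined in `sample` (copied from the question) so the snippet runs without a file.

```python
def parse_line(line):
    """Turn '10 1:0.87 2:0.72' into [10.0, 0.87, 0.72]."""
    fields = line.split()                  # split on any run of whitespace
    first = float(fields[0])               # first column has no "index:" prefix
    rest = [float(f.split(":")[1]) for f in fields[1:]]  # value after the colon
    return [first] + rest


# Sample data copied from the question, inlined instead of read from test.txt.
sample = """10 1:0.870137474304 2:0.722354071782 3:0.671913562758
11 1:0.764133072717 2:0.4893616821 3:0.332713609364
20 1:0.531732713984 2:0.0967819558321 3:0.169802773309
"""

# Skip blank lines so a trailing newline does not produce an empty row.
matrix = [parse_line(l) for l in sample.splitlines() if l.strip()]
print(matrix)
```

To read from a real file instead, the same comprehension can iterate over an open file object in place of `sample.splitlines()`.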