Remove redundant rows from a CSV where a specific column's values are the same, in Python

Time: 2021-12-27 18:13:12

I have a CSV file that looks like this:

**Number,Timestamp,Value1,value2,Value3,Value4**

7680.0,2015-05-06 13:53:07,4.695,7.929,,
7680.0,2015-05-06 13:53:07,,,4.4118,7.8514
7681.0,2015-05-06 21:25:11,4.259,7.924,,
7681.0,2015-05-06 21:25:11,,,4.477,7.6178

I need to convert this file to the following format, with the rows that share the same Number and Timestamp merged into one:

**Number,Timestamp,Value1,value2,Value3,Value4**

7680.0,2015-05-06 13:53:07,4.695,7.929,4.4118,7.8514
7681.0,2015-05-06 21:25:11,4.259,7.924,4.477,7.6178

I am new to Python 2.

3 solutions

#1


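import pandas as pd

df = pd.read_csv('filename.csv')
# Rows sharing the same Number and Timestamp collapse into one row;
# as_index=False keeps Number and Timestamp as ordinary columns.
df_group = df.groupby(['Number', 'Timestamp'], as_index=False).sum()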

groupby() groups the dataset by Number and Timestamp, and sum() then adds up each numeric column within a group. Empty cells are read as NaN and skipped by sum(), so each pair of half-filled rows collapses into one complete row. I hope this is what you're looking for.
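As a variation on the grouping idea above, here is a minimal sketch that uses first() instead of sum(). It assumes Python 3 with a recent pandas, and it reads the sample rows from the question out of an in-memory string rather than a file. first() returns the first non-null value of each column within a group, which states the intent ("fill the gaps") a little more directly than adding numbers up.

```python
import io
import pandas as pd

# Sample rows copied from the question; empty fields are read as NaN.
raw = """Number,Timestamp,Value1,value2,Value3,Value4
7680.0,2015-05-06 13:53:07,4.695,7.929,,
7680.0,2015-05-06 13:53:07,,,4.4118,7.8514
7681.0,2015-05-06 21:25:11,4.259,7.924,,
7681.0,2015-05-06 21:25:11,,,4.477,7.6178
"""
df = pd.read_csv(io.StringIO(raw))

# first() keeps the first non-null value of each column within a group,
# so two half-filled rows collapse into one complete row.
merged = df.groupby(["Number", "Timestamp"], as_index=False).first()

# Write the merged table as CSV text (use merged.to_csv("out.csv", index=False) for a file).
print(merged.to_csv(index=False))
```

Unlike sum(), this also works when the value columns hold text, and it does not add values together if a key happens to have two non-empty entries in the same column.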

#2


Probably not the best solution, but this will get it done:

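merged = {}  # (Number, Timestamp) -> list of the four value fields
order = []   # keys in the order they first appear

with open('messed_up.csv', 'r') as r:
    header = r.readline()
    for line in r:
        fields = line.rstrip('\r\n').split(',')
        if len(fields) != 6:
            if line.strip():
                print("[-] Skipping malformed line: " + line.strip())
            continue
        key = (fields[0], fields[1])
        if key not in merged:
            merged[key] = [''] * 4
            order.append(key)
        # Copy every non-empty value of this row into the merged row
        for i, value in enumerate(fields[2:]):
            if value:
                merged[key][i] = value

with open('new.csv', 'w') as f:
    f.write(header)
    for key in order:
        f.write(','.join(list(key) + merged[key]) + '\n')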

#3


This can be easily handled by pandas:

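import pandas as pd

# Use Number and Timestamp as a two-level index
df = pd.read_csv("file1.csv", header=0, index_col=["Number", "Timestamp"])
# Group on both index levels; sum() skips the empty (NaN) cells
dfnew = df.groupby(level=["Number", "Timestamp"]).sum()
dfnew.to_csv("file2.csv")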
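One caveat with the sum() approach is worth illustrating. If a key has only one of its two half-rows, the missing columns come out as 0.0 rather than empty, because the sum of an all-NaN group defaults to zero. The sketch below assumes Python 3 with a recent pandas; the 7682.0 row is a hypothetical addition for illustration and is not in the original data. Passing min_count=1 to sum() keeps such cells as NaN.

```python
import io
import pandas as pd

# The 7682.0 row is hypothetical: it has no matching second half-row.
raw = """Number,Timestamp,Value1,value2,Value3,Value4
7680.0,2015-05-06 13:53:07,4.695,7.929,,
7680.0,2015-05-06 13:53:07,,,4.4118,7.8514
7682.0,2015-05-07 08:00:00,4.1,7.9,,
"""
df = pd.read_csv(io.StringIO(raw), index_col=["Number", "Timestamp"])
grouped = df.groupby(level=["Number", "Timestamp"])

# Default: a group whose values are all NaN sums to 0.0.
default_sum = grouped.sum().reset_index()

# min_count=1: at least one real value is required, otherwise the result stays NaN.
strict_sum = grouped.sum(min_count=1).reset_index()

print(default_sum)
print(strict_sum)
```

With min_count=1, to_csv() writes the still-missing cells as empty fields, so an incomplete key is not mistaken for a measured value of zero.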
