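The text format is shown above.
The goal is to sort the rows by coverage and to drop the rows whose coverage value is empty.
1. First clean up the text and write the result to a txt file: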
# -*- coding:utf-8 -*-
import re

# Matches a signed decimal such as 0.75 or -3.5 (a decimal point is required)
value = re.compile(r'^[-+]?[0-9]+\.[0-9]+$')
count = 0  # running total of the coverage values

# The with block closes both files, so the output is flushed to disk
with open('month5.txt', encoding='utf-8') as file, \
        open('test5.txt', 'w', encoding='utf-8') as outdata:
    for line in file:
        # Split each line into fields on '|'
        line_context_split = line.strip().split('|')
        # Skip field 0; keep only fields 1 and 2
        if len(line_context_split) > 3:
            keyword = line_context_split[1].strip()
            coverage = line_context_split[2].strip()
            if value.match(coverage) and keyword != '':
                count += float(coverage)
                outdata.write(keyword + '\t' + coverage + '\n')
print(count)
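Key point: use a regular expression to check whether a string is a decimal number. Note that the pattern requires a decimal point, so an integer such as 12 is filtered out as well.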
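To see which rows the pattern keeps, here is a minimal sketch that applies the same regular expression to a few in-memory lines. The sample rows and the helper name `parse_row` are assumptions made for illustration; the real layout of month5.txt may differ.

```python
import re

# Same pattern as in the script: optional sign, digits, a dot, digits.
DECIMAL = re.compile(r'^[-+]?[0-9]+\.[0-9]+$')

def parse_row(line):
    """Return (keyword, coverage) for a valid row, or None to skip it.

    parse_row is a hypothetical helper, not part of the original script.
    """
    fields = [f.strip() for f in line.strip().split('|')]
    # Need more than 3 fields, a non-empty keyword, and a decimal coverage.
    if len(fields) > 3 and fields[1] and DECIMAL.match(fields[2]):
        return fields[1], float(fields[2])
    return None

# Hypothetical sample rows (assumed '|'-delimited layout).
sample = [
    '| apple | 0.75 |',   # kept
    '| pear |  |',        # empty coverage: skipped
    '| plum | 12 |',      # integer, no decimal point: skipped
    '| fig | -3.5 |',     # kept
    '+------+------+',    # border line without '|': skipped
]

rows = [r for r in (parse_row(l) for l in sample) if r is not None]
print(rows)
```

If integer coverage values should be kept too, the fractional part of the pattern can be made optional, for example `r'^[-+]?[0-9]+(\.[0-9]+)?$'`.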
2. Sort the processed text with pandas' sort_values and write the result to a CSV file:
import pandas as pd

# test5.txt has no header row, so the column names are supplied here
df = pd.read_table('test5.txt', encoding='utf-8', sep='\t', header=None,
                   names=['keywords', 'average'])
# Sort by coverage, largest first
df01 = df.sort_values('average', ascending=False)
print(df01)
# The row index is written as the first column; pass index=False to omit it
df01.to_csv('test5.csv')
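The sorting step can be tried without any files. The sketch below reads a small in-memory string in place of test5.txt; the sample values are made up for illustration, and `index=False` is an optional choice that the script above does not make.

```python
import io
import pandas as pd

# Hypothetical tab-separated content standing in for test5.txt.
sample = "apple\t0.75\nfig\t-3.5\nkiwi\t2.25\n"

df = pd.read_csv(io.StringIO(sample), sep='\t', header=None,
                 names=['keywords', 'average'])

# Sort by the 'average' column, largest first, and renumber the rows.
ranked = df.sort_values('average', ascending=False).reset_index(drop=True)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = ranked.to_csv(index=False)
print(csv_text)
```

Dropping the index keeps the CSV to the two data columns, which is usually easier to open in a spreadsheet.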