How to sort text file records by date and time

A text file, "input_msg.txt", contains the following records:

Jan 1 02:32:40 other strings but may or may not unique in all those lines
Jan 1 02:32:40 other strings but may or may not unique in all those lines
Mar 31 23:31:55 other strings but may or may not unique in all those lines
Mar 31 23:31:55 other strings but may or may not unique in all those lines
Mar 31 23:31:55 other strings but may or may not unique in all those lines
Mar 31 23:31:56 other strings but may or may not unique in all those lines
Mar 31 23:31:56 other strings but may or may not unique in all those lines
Mar 31 23:31:56 other strings but may or may not unique in all those lines
Mar 31 23:31:57 other strings but may or may not unique in all those lines
Mar 31 23:31:57 other strings but may or may not unique in all those lines
Mar 31 23:31:57 other strings but may or may not unique in all those lines
Mar 31 23:31:57 other strings but may or may not unique in all those lines
Feb 1 03:52:26 other strings but may or may not unique in all those lines
Feb 1 03:52:26 other strings but may or may not unique in all those lines
Jan 1 02:46:40 other strings but may or may not unique in all those lines
Jan 1 02:44:40 other strings but may or may not unique in all those lines
Jan 1 02:40:40 other strings but may or may not unique in all those lines
Feb 10 03:52:26 other strings but may or may not unique in all those lines

I have tried the following program.

def sort_file_based_timestap():
    with open(r"D:\Python34\test_msg.txt", "r") as f:
        xs = f.readlines()
    xs.sort()
    print(xs)

This program sorts the lines as plain strings, so the records do not come out in date order.
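
To make the problem concrete, here is a small sketch using made-up sample lines (the trailing words and the "Feb 2" entry are placeholders, not records from the real file). Plain string comparison puts the month names in alphabetical order and compares the day numbers character by character:

```python
# Hypothetical sample lines, only to illustrate string ordering.
lines = [
    "Jan 1 02:32:40 first",
    "Feb 10 03:52:26 second",
    "Feb 2 03:52:26 third",
    "Mar 31 23:31:55 fourth",
]

# sorted() without a key compares the raw text of each line.
by_string = sorted(lines)
for line in by_string:
    print(line)
```

Here "Feb" sorts before "Jan" because "F" comes before "J", and "Feb 10" sorts before "Feb 2" because the character "1" comes before "2".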

I need output like the following:

Jan 1 02:32:40 other strings but may or may not unique in all those lines
Jan 1 02:32:40 other strings but may or may not unique in all those lines
Jan 1 02:40:40 other strings but may or may not unique in all those lines
Jan 1 02:44:40 other strings but may or may not unique in all those lines
Jan 1 02:46:40 other strings but may or may not unique in all those lines
Feb 1 03:52:26 other strings but may or may not unique in all those lines
Feb 1 03:52:26 other strings but may or may not unique in all those lines
Feb 10 03:52:26 other strings but may or may not unique in all those lines
Mar 31 23:31:55 other strings but may or may not unique in all those lines
Mar 31 23:31:55 other strings but may or may not unique in all those lines
Mar 31 23:31:55 other strings but may or may not unique in all those lines
Mar 31 23:31:56 other strings but may or may not unique in all those lines
Mar 31 23:31:56 other strings but may or may not unique in all those lines
Mar 31 23:31:56 other strings but may or may not unique in all those lines
Mar 31 23:31:57 other strings but may or may not unique in all those lines
Mar 31 23:31:57 other strings but may or may not unique in all those lines
Mar 31 23:31:57 other strings but may or may not unique in all those lines
Mar 31 23:31:57 other strings but may or may not unique in all those lines

Your help would be appreciated!!!

3 solutions

#1

The trick is to first annotate each line with a Python-readable timestamp and then sort the list of annotated lines.

I have put some sample code below:

import time
import re

def parse_line(line):
    """
    Split a line into its parsed timestamp and the full line text.
    Returns None if the line does not start with a timestamp.
    """
    line = line.rstrip()
    m = re.match(r"(\w{3}\s+\d+\s+[0-9:]+)\s+(.*)", line)
    if m:
        timestamp = time.strptime(m.group(1), "%b %d %H:%M:%S")
        return (timestamp, line)
    return None


def main():
    lines = []
    with open('input_msg.txt', 'r') as f:
        for line in f:
            parsed = parse_line(line)
            if parsed:
                lines.append(parsed)
    # sort the list on the first element of each tuple,
    # which is the parsed time
    sorted_lines = sorted(lines, key=lambda annotated_line: annotated_line[0])
    for l in sorted_lines:
        print(l[1])

if __name__ == "__main__":
    main()
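
For anyone who wants to try the annotate-then-sort idea without a file, here is a minimal sketch that works on a list of lines in memory. The function name sort_by_timestamp and the placeholder year 2000 are my own assumptions: the records carry no year, so the sketch assumes they all belong to the same year and prepends a fixed one before parsing.

```python
import re
from datetime import datetime

# Same pattern as the answer above: month, day, time, then the rest.
TIMESTAMP_RE = re.compile(r"(\w{3}\s+\d+\s+[0-9:]+)\s+(.*)")

# Assumption: every record is from the same year, so any fixed year works.
PLACEHOLDER_YEAR = "2000"


def sort_by_timestamp(lines):
    """Return the lines that start with a timestamp, in chronological order."""
    annotated = []
    for line in lines:
        line = line.rstrip()
        m = TIMESTAMP_RE.match(line)
        if m:
            # Collapse runs of whitespace to single spaces before parsing.
            stamp = " ".join(m.group(1).split())
            parsed = datetime.strptime(PLACEHOLDER_YEAR + " " + stamp,
                                       "%Y %b %d %H:%M:%S")
            annotated.append((parsed, line))
    # Sort on the parsed timestamp only; ties keep their original order.
    annotated.sort(key=lambda pair: pair[0])
    return [line for _, line in annotated]


sample = [
    "Mar 31 23:31:55 c\n",
    "Jan 1 02:46:40 b\n",
    "not a log line\n",
    "Jan 1 02:32:40 a\n",
    "Feb 10 03:52:26 d\n",
]
for line in sort_by_timestamp(sample):
    print(line)
```

Lines that do not match the pattern are dropped, as in the code above. If the file can span a year boundary, the year would have to come from somewhere else, since the records themselves do not contain it.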

#2

Use a (month, day, rest) triple as the sorting key, with the month and day properly parsed so that they compare correctly.

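import time

def dater(line):
    # split off the month and the day; the rest begins with the
    # zero-padded time, so it already compares correctly as a string
    month, day, rest = line.split(' ', 2)
    return (time.strptime(month, '%b'), int(day), rest)

with open('input_msg.txt') as file:
    for line in sorted(file, key=dater):
        print(line, end='')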
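
One thing to watch: line.split(' ', 2) assumes exactly one space between the fields. If the file pads single-digit days with an extra space ("Jan  1", as some log formats do), the day field comes back empty and int(day) fails. Here is a hedged sketch of a key that tolerates that, assuming English month abbreviations; the names MONTHS and dater_ws are made up for this example:

```python
# Month abbreviations mapped to their position in the year.
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def dater_ws(line):
    """Sort key (month number, day, time string), tolerant of extra spaces."""
    # split() with no argument treats any run of whitespace as one separator.
    month, day, clock = line.split()[:3]
    return (MONTHS[month], int(day), clock)


# Hypothetical sample lines with mixed padding.
sample = [
    "Feb 10 03:52:26 padded with one space",
    "Jan  1 02:46:40 padded with two spaces",
    "Feb  1 03:52:26 padded with two spaces",
]
ordered = sorted(sample, key=dater_ws)
for line in ordered:
    print(line)
```

Unlike dater, this key does not include the rest of the line, so records with the same timestamp stay in their original order instead of being sorted by their text.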

#3

How about this?

You first take the text and convert it into a list of lines using splitlines(). Each entry of this list is a string, and we can't sort these strings chronologically as they stand. So next, you split each string into a list of fields using split(). Now your log file has been converted into a list of lists, which you can sort using a custom key function.

Here is the code to do so:

# log text
log = """Jan 1 02:32:40 other strings but may or may not unique in all those lines
    Jan 1 02:32:40 other strings but may or may not unique in all those lines
    Mar 31 23:31:55 other strings but may or may not unique in all those lines
    Mar 31 23:31:55 other strings but may or may not unique in all those lines
    Mar 31 23:31:55 other strings but may or may not unique in all those lines
    Mar 31 23:31:56 other strings but may or may not unique in all those lines
    Mar 31 23:31:56 other strings but may or may not unique in all those lines
    Mar 31 23:31:56 other strings but may or may not unique in all those lines
    Mar 31 23:31:57 other strings but may or may not unique in all those lines
    Mar 31 23:31:57 other strings but may or may not unique in all those lines
    Mar 31 23:31:57 other strings but may or may not unique in all those lines
    Mar 31 23:31:57 other strings but may or may not unique in all those lines
    Feb 1 03:52:26 other strings but may or may not unique in all those lines
    Feb 1 03:52:26 other strings but may or may not unique in all those lines
    Jan 1 02:46:40 other strings but may or may not unique in all those lines
    Jan 1 02:44:40 other strings but may or may not unique in all those lines
    Jan 1 02:40:40 other strings but may or may not unique in all those lines
    Feb 10 03:52:26 other strings but may or may not unique in all those lines"""

# convert the log into a list of strings
lines = log.splitlines()
# temp_list stores the log as a "list of lists", one list of fields per entry
temp_list = []
for data in lines:
    temp_list.append(data.split())


# the function that will be used as the sorting key
def convert_time(logline):
    # extract hour, minute and second from each log entry
    # (note: the month and day in logline[0] and logline[1] are not compared)
    h, m, s = map(int, logline[2].split(':'))
    time_in_seconds = h * 3600 + m * 60 + s
    return time_in_seconds


sorted_log_list = sorted(temp_list, key=convert_time)

# sorted_log_list is a "list of lists"; each inner list represents one log
# entry, so join its fields with spaces to print a readable line
for fields in sorted_log_list:
    print(" ".join(fields))

Here is a leaner version of the above code. We don't need to create temp_list; we simply write a key function that works directly on the strings produced by splitlines():

# convert the log into a list of strings
lines = log.splitlines()

# the function that will be used as the sorting key
def convert_time(logline):
    # extract hour, minute and second from each log entry
    # (note: the month and day at the start of the line are not compared)
    h, m, s = map(int, logline.split()[2].split(':'))
    time_in_seconds = h * 3600 + m * 60 + s
    return time_in_seconds


sorted_log_list = sorted(lines, key=convert_time)

# sorted_log_list is a list of strings, one per log entry,
# so each entry can be printed as it is
for line in sorted_log_list:
    print(line)
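
Note that convert_time only looks at the time of day. That happens to give the requested order for the sample log, but an entry early in the day on a later date would sort ahead of a later time on an earlier date. Here is a sketch of a key that also compares month and day, assuming English month abbreviations; MONTH_ORDER, convert_datetime and the two sample entries are names and data I made up for the illustration:

```python
# Month abbreviations in calendar order; the list index is the sort rank.
MONTH_ORDER = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()


def convert_datetime(logline):
    """Sort key: (month rank, day of month, seconds since midnight)."""
    month, day, clock = logline.split()[:3]
    h, m, s = map(int, clock.split(':'))
    return (MONTH_ORDER.index(month), int(day), h * 3600 + m * 60 + s)


# Hypothetical entries where the date and the time of day disagree.
entries = [
    "Feb 1 01:00:00 early in the day, later in the year",
    "Jan 1 02:00:00 later in the day, earlier in the year",
]

# Comparing only the time field puts the February entry first.
by_time_only = sorted(entries, key=lambda entry: entry.split()[2])
# Comparing month, day and time puts the January entry first.
by_full_key = sorted(entries, key=convert_datetime)

for line in by_full_key:
    print(line)
```

Tuples compare element by element, so the month decides first, then the day, and the seconds only break ties within the same date.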
