I am writing a script that goes onto one of our machines and parses a trace file, which is a plain text file. I grep for a particular value (in this example, "RP") and create a data frame from the matching lines. Now I have all these rows, but they are not split into columns, and I would like to split them. Here is what a row looks like after the grep:
1 2016-03-14 09:52:38> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0004
What I want is:

Date                 Pressure
2016-03-14 09:52:38  rp+0004
# Suppress warnings (for example, readLines() warns about a missing final newline)
options(warn = -1)
# Select the source directory and change \\ to / (choose.dir() is Windows-only)
copyfrom <- gsub("\\\\", "/", choose.dir(default = "", caption = "Select folder you wish to copy files from"))
# File names in the source directory
listfiles <- list.files(copyfrom)
# Select the destination directory and change \\ to /
copyto <- gsub("\\\\", "/", choose.dir(default = "", caption = "Select folder you wish to copy files to"))
# Loop through all files in the source directory
for (i in seq_along(listfiles))
{
  # Read every line of the current file; readLines() opens and closes the file itself
  lines <- readLines(file.path(copyfrom, listfiles[i]))
  # Keep only the lines that contain "RP"
  searched_entries <- grep("RP", lines, value = TRUE)
  # Write the matches, dropping the .trc extension and appending _parsed
  writeLines(searched_entries, con = file.path(copyto, paste0(sub("\\.trc$", "", listfiles[i]), "_parsed.txt")))
  # Print the number of files parsed so far
  print(i)
}
Here is the data frame:
2016-03-14 09:52:38> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0004
2016-03-14 09:52:39> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:39> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:39> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
etc.
I would like to end up with two columns: one with the date (2016-03-14 09:52:38) and the other with the RP number (rp+0000). Let me know if you would like me to clarify further.
Here is the trace file. You can copy and paste it into Notepad and save it as a .txt file. Name of file: StarLineDailyMaintenance_8715f3804819481aae1cae3a479556aa_Trace.trc
2016-03-14 09:52:38> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0004
2016-03-14 09:52:39> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:39> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:39> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:40> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:40> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:40> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:41> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:41> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:41> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:42> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:42> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:42> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:43> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:43> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:43> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:44> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:44> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:45> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:45> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:45> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:46> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:46> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:46> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:47> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:47> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:47> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:48> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:48> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:48> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
2016-03-14 09:52:49> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+4067
2016-03-14 09:52:50> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+4057
2016-03-14 09:52:50> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+4028
2016-03-14 09:52:51> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+4057
2016-03-14 09:52:52> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+4082
2016-03-14 09:52:52> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+4125
2016-03-14 09:52:53> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+4082
1 Answer
You can use str_match from stringr with a regex to parse the lines:
> library(stringr)
> df
# V1
# 1 2016-03-14 09:52:38> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0004
# 2 2016-03-14 09:52:39> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
# 3 2016-03-14 09:52:39> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
# 4 2016-03-14 09:52:39> Microlab® STAR : Firmware Command (Single Step) - complete; > RP: rp+0000
> f <- as.data.frame(str_match(df$V1, '^([^>]*)>[^>]*> RP: ([^ ]*) *$')[,-1])
> colnames(f) = c('Date', 'Pressure')
> f
# Date Pressure
# 1 2016-03-14 09:52:38 rp+0004
# 2 2016-03-14 09:52:39 rp+0000
# 3 2016-03-14 09:52:39 rp+0000
# 4 2016-03-14 09:52:39 rp+0000
The somewhat complex-looking regex captures everything up to the first > into column 1, and the run of non-space characters after "> RP: " into column 2.
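To see what the two capture groups pick up without installing stringr, here is a minimal base R sketch using regexec and regmatches. The sample lines are copied from the question; the variable names are only illustrative, and the sketch assumes every input line matches the pattern.

```r
# Sample lines copied from the question's trace file ("\u00ae" is the (R) sign)
lines <- c(
  "2016-03-14 09:52:38> Microlab\u00ae STAR : Firmware Command (Single Step) - complete; > RP: rp+0004",
  "2016-03-14 09:52:49> Microlab\u00ae STAR : Firmware Command (Single Step) - complete; > RP: rp+4067"
)

# Group 1: everything before the first ">"
# Group 2: the run of non-space characters after "> RP: "
pattern <- "^([^>]*)>[^>]*> RP: ([^ ]*) *$"

# regexec() locates the full match and each group;
# regmatches() extracts them as one character vector per input line
m <- regmatches(lines, regexec(pattern, lines))

# Drop the full match (element 1) and keep the two groups
# (assumption: every line matched, so each vector has length 3)
parts <- do.call(rbind, lapply(m, function(x) x[-1]))
f <- data.frame(Date = parts[, 1], Pressure = parts[, 2], stringsAsFactors = FALSE)
print(f)
```

Lines that do not match would come back as empty vectors, so filtering with grep first, as the question's script already does, keeps the rbind step safe.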
Assuming the file has one record per line in the format given, you could also parse the data straight from the file using read.pattern from gsubfn with the same regex:
> library(gsubfn)
> f = read.pattern('Test/test.txt', '^([^>]*)>[^>]*> RP: ([^ ]*) *$')
> colnames(f) = c('Date', 'Pressure')
> f
# Date Pressure
# 1 2016-03-14 09:52:38 rp+0004
# 2 2016-03-14 09:52:39 rp+0000
# 3 2016-03-14 09:52:39 rp+0000
# 4 2016-03-14 09:52:39 rp+0000
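If the columns are needed for plotting or arithmetic, they could be converted to proper types after parsing. The sketch below is one possible way to do that and is not part of the answer above: it writes sample lines to a temporary file, parses them with base R, and converts Date to POSIXct and the rp value to an integer. The helper name parse_trace, the UTC time zone, and the assumption that every value starts with "rp+" are all hypothetical choices.

```r
# Hypothetical helper: parse one trace file into a typed data frame
parse_trace <- function(path) {
  lines <- readLines(path, warn = FALSE, encoding = "UTF-8")
  pattern <- "^([^>]*)>[^>]*> RP: ([^ ]*) *$"
  # Keep only the lines that match the RP pattern
  lines <- grep(pattern, lines, value = TRUE)
  data.frame(
    # \\1 is the timestamp before the first ">" (time zone is an assumption)
    Date = as.POSIXct(sub(pattern, "\\1", lines),
                      format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
    # \\2 is the token after "> RP: ", e.g. "rp+0004"; strip the assumed "rp+" prefix
    Pressure = as.integer(sub("^rp\\+", "", sub(pattern, "\\2", lines))),
    stringsAsFactors = FALSE
  )
}

# Two lines copied from the question, plus one invented non-RP line
# to show that non-matching lines are filtered out
tmp <- tempfile(fileext = ".trc")
writeLines(c(
  "2016-03-14 09:52:38> Microlab\u00ae STAR : Firmware Command (Single Step) - complete; > RP: rp+0004",
  "Some unrelated line",
  "2016-03-14 09:52:49> Microlab\u00ae STAR : Firmware Command (Single Step) - complete; > RP: rp+4067"
), tmp, useBytes = TRUE)

f <- parse_trace(tmp)
print(f)
```

Calling parse_trace on each path inside the question's loop, in place of the grep and writeLines steps, would give one data frame per trace file.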