Slightly bizarre request, I know, but bear with me.
I have an Excel spreadsheet with some logging data taken from a highly parallelised bit of server-side code. I'm trying to analyse it for where there may be gaps in the logs, indicating tasks that should be logged but aren't; but because it's a serial, timestamp-order list of a dozen or so parallel threads it's quite hard to read. So I had the unorthodox idea of using a Gantt chart to visualise the overlapping tasks. Excel is terrible at this, so I started looking at alternative tools, and I thought of trying R.
Each task in the log has a start timestamp, an end timestamp, and a duration, so I have the data that I need. I read this SO post and mutilated the example into this R script:
library(reshape2)  # melt()
library(ggplot2)   # ggplot(), geom_line()

tasks <- c("Task1", "Task2")
dfr <- data.frame(
  name = factor(tasks, levels = tasks),
  start.date = c("07/08/2013 09:03:25.815", "07/08/2013 09:03:25.956"),
  end.date = c("07/08/2013 09:03:28.300", "07/08/2013 09:03:30.409"),
  is.critical = c(TRUE, TRUE)
)
mdfr <- melt(dfr, measure.vars = c("start.date", "end.date"))
ggplot(mdfr, aes(as.Date(value, "%d/%m/%Y %H:%M:%OS"), name, colour = is.critical)) +
  geom_line(size = 6) +
  xlab("") + ylab("") +
  theme_bw()
This doesn't work, though -- it doesn't plot any data, and the time axis is all messed up. I suspect (unsurprisingly) that plotting sub-second Gantt charts is a weird thing to do. I'm a complete R newbie (although I've been looking for an excuse to try it out for ages) -- is there any simple way to make this work?
1 Answer
First, your times should be of class POSIXct rather than Date, because they also contain hours, minutes, and seconds. as.Date() keeps only the calendar day, so the start and end of every task collapse onto the same x value and nothing is drawn. You can add a new column with the correct class to your melted data frame.
mdfr$time <- as.POSIXct(strptime(mdfr$value, "%d/%m/%Y %H:%M:%OS"))
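To see why the class matters, here is a minimal base-R sketch that parses two of the timestamps from the question both ways. The variable names and the `tz = "UTC"` argument are my own assumptions for illustration (the time zone is fixed only so the result does not depend on the local machine); they are not part of the original code.

```r
# Two sample timestamps copied from the question's data.
stamps <- c("07/08/2013 09:03:25.815", "07/08/2013 09:03:28.300")
fmt <- "%d/%m/%Y %H:%M:%OS"

# as.Date() keeps only the calendar day, so both values become identical.
as_dates <- as.Date(stamps, format = fmt)

# POSIXct keeps the time of day, including the fractional seconds read by %OS.
# tz = "UTC" is an assumption, used to avoid depending on the local time zone.
as_times <- as.POSIXct(strptime(stamps, fmt, tz = "UTC"))

# Elapsed time between the two stamps in seconds (roughly 2.485).
elapsed <- as.numeric(difftime(as_times[2], as_times[1], units = "secs"))

# The default print method hides fractions; %OS3 asks for milliseconds.
print(format(as_times, "%H:%M:%OS3"))
```

With Date values the two endpoints of a task are the same point, which is consistent with the empty plot described in the question.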
Then, with scale_x_datetime(), you can control where the breaks fall on the axis. For the x values, use the new column with the correct format.
library(scales)
ggplot(mdfr, aes(time, name, colour = is.critical)) +
  geom_line(size = 6) +
  xlab("") + ylab("") +
  theme_bw() +
  scale_x_datetime(breaks = date_breaks("2 sec"))
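For convenience, the steps above can be assembled into one script. This is a sketch, assuming the reshape2, ggplot2, and scales packages are installed; the `tz = "UTC"` argument and the object name `p` are my additions, and `linewidth` is used in place of `size` on the assumption of ggplot2 3.4 or later (older versions expect `size = 6`).

```r
library(reshape2)  # melt()
library(ggplot2)   # ggplot(), geom_line(), scale_x_datetime()
library(scales)    # date_breaks()

# Sample data from the question.
tasks <- c("Task1", "Task2")
dfr <- data.frame(
  name = factor(tasks, levels = tasks),
  start.date = c("07/08/2013 09:03:25.815", "07/08/2013 09:03:25.956"),
  end.date = c("07/08/2013 09:03:28.300", "07/08/2013 09:03:30.409"),
  is.critical = c(TRUE, TRUE)
)

# One row per task endpoint: columns name, is.critical, variable, value.
mdfr <- melt(dfr, measure.vars = c("start.date", "end.date"))

# Parse to POSIXct so the time of day and fractional seconds survive.
# tz = "UTC" is an assumption to keep the result machine-independent.
mdfr$time <- as.POSIXct(strptime(mdfr$value, "%d/%m/%Y %H:%M:%OS", tz = "UTC"))

# Each task becomes a thick horizontal line from its start to its end.
p <- ggplot(mdfr, aes(time, name, colour = is.critical)) +
  geom_line(linewidth = 6) +  # assumes ggplot2 >= 3.4; use size = 6 on older versions
  xlab("") + ylab("") +
  theme_bw() +
  scale_x_datetime(breaks = date_breaks("2 sec"))

print(p)
```

For a real log, the data frame would be read from the exported spreadsheet instead of typed in, and the break width would be chosen to suit the time span being plotted.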