In my binary to text decoding application (.NET 2.0) I found that the line:
logEntryTime.ToString("dd.MM.yy HH:mm:ss:fff")
takes 33% of total processing time. Does anyone have any ideas on how to make it faster?
EDIT: This app is used to process some binary logs and it currently takes 15 hours to run. So 1/3 of this will be 5 hours.
EDIT: I am using NProf for profiling. App is processing around 17 GBytes of binary logs.
4 Solutions
#1
It's unfortunate that .NET doesn't have a sort of "formatter" type which can parse a pattern and remember it.
If you're always using the same format, you might want to hand-craft a formatter to do exactly that. Something along the lines of:
public static string FormatDateTime(DateTime dt)
{
    // "dd.MM.yy HH:mm:ss:fff" is always 21 characters long.
    char[] chars = new char[21];
    Write2Chars(chars, 0, dt.Day);
    chars[2] = '.';
    Write2Chars(chars, 3, dt.Month);
    chars[5] = '.';
    Write2Chars(chars, 6, dt.Year % 100);
    chars[8] = ' ';
    Write2Chars(chars, 9, dt.Hour);
    chars[11] = ':';
    Write2Chars(chars, 12, dt.Minute);
    chars[14] = ':';
    Write2Chars(chars, 15, dt.Second);
    chars[17] = ':';
    // Milliseconds take three digits: the first two, then the last one.
    Write2Chars(chars, 18, dt.Millisecond / 10);
    chars[20] = Digit(dt.Millisecond % 10);
    return new string(chars);
}

// Writes a value in the range 0..99 as two zero-padded digits.
private static void Write2Chars(char[] chars, int offset, int value)
{
    chars[offset] = Digit(value / 10);
    chars[offset + 1] = Digit(value % 10);
}

// Converts a value in the range 0..9 to its character.
private static char Digit(int value)
{
    return (char) (value + '0');
}
This is pretty ugly, but it's probably a lot more efficient... benchmark it, of course!
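Before benchmarking, it is worth checking that a hand-written formatter really produces the same text as the format string. The following is a minimal sketch of such a check, not part of the original answer: the class name `DateTimeFormatCheck` and the `Matches` helper are made up for illustration, and the comparison assumes `CultureInfo.InvariantCulture`, because `:` in a custom format string is the culture's time separator and may not be a colon in every culture.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class, for illustration only.
public static class DateTimeFormatCheck
{
    private const string Pattern = "dd.MM.yy HH:mm:ss:fff";

    // Hand-written equivalent of dt.ToString(Pattern, CultureInfo.InvariantCulture).
    public static string FormatDateTime(DateTime dt)
    {
        char[] chars = new char[21];
        Write2Chars(chars, 0, dt.Day);
        chars[2] = '.';
        Write2Chars(chars, 3, dt.Month);
        chars[5] = '.';
        Write2Chars(chars, 6, dt.Year % 100);
        chars[8] = ' ';
        Write2Chars(chars, 9, dt.Hour);
        chars[11] = ':';
        Write2Chars(chars, 12, dt.Minute);
        chars[14] = ':';
        Write2Chars(chars, 15, dt.Second);
        chars[17] = ':';
        Write2Chars(chars, 18, dt.Millisecond / 10);
        chars[20] = (char)('0' + dt.Millisecond % 10);
        return new string(chars);
    }

    // Writes a value in the range 0..99 as two zero-padded digits.
    private static void Write2Chars(char[] chars, int offset, int value)
    {
        chars[offset] = (char)('0' + value / 10);
        chars[offset + 1] = (char)('0' + value % 10);
    }

    // Returns true when the hand-written and built-in formatters agree.
    public static bool Matches(DateTime dt)
    {
        return FormatDateTime(dt) == dt.ToString(Pattern, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        DateTime[] samples = {
            new DateTime(2009, 1, 2, 3, 4, 5, 6),
            new DateTime(1999, 12, 31, 23, 59, 59, 999),
            new DateTime(2000, 2, 29, 0, 0, 0, 0)
        };
        foreach (DateTime dt in samples)
        {
            Console.WriteLine("{0} -> {1}", FormatDateTime(dt), Matches(dt) ? "match" : "MISMATCH");
        }
    }
}
```

Running a check like this over a sample of real log timestamps catches separator or padding mistakes before any time is spent on benchmarking.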
#2
Are you sure it takes 33% of the time? How have you measured that? It sounds more than a little suspicious to me...
This makes things a little bit quicker:
Basic: 2342ms
Custom: 1319ms
Or if we cut out the IO (Stream.Null):
Basic: 2275ms
Custom: 839ms
using System.Diagnostics;
using System;
using System.IO;

static class Program
{
    static void Main()
    {
        DateTime when = DateTime.Now;
        const int LOOP = 1000000;

        Stopwatch basic = Stopwatch.StartNew();
        using (TextWriter tw = new StreamWriter("basic.txt"))
        {
            for (int i = 0; i < LOOP; i++)
            {
                tw.Write(when.ToString("dd.MM.yy HH:mm:ss:fff"));
            }
        }
        basic.Stop();
        Console.WriteLine("Basic: " + basic.ElapsedMilliseconds + "ms");

        char[] buffer = new char[100];
        Stopwatch custom = Stopwatch.StartNew();
        using (TextWriter tw = new StreamWriter("custom.txt"))
        {
            for (int i = 0; i < LOOP; i++)
            {
                WriteDateTime(tw, when, buffer);
            }
        }
        custom.Stop();
        Console.WriteLine("Custom: " + custom.ElapsedMilliseconds + "ms");
    }

    static void WriteDateTime(TextWriter output, DateTime when, char[] buffer)
    {
        buffer[2] = buffer[5] = '.';
        buffer[8] = ' ';
        buffer[11] = buffer[14] = buffer[17] = ':';
        Write2(buffer, when.Day, 0);
        Write2(buffer, when.Month, 3);
        Write2(buffer, when.Year % 100, 6);
        Write2(buffer, when.Hour, 9);
        Write2(buffer, when.Minute, 12);
        Write2(buffer, when.Second, 15);
        Write3(buffer, when.Millisecond, 18);
        output.Write(buffer, 0, 21);
    }

    static void Write2(char[] buffer, int value, int offset)
    {
        buffer[offset++] = (char)('0' + (value / 10));
        buffer[offset] = (char)('0' + (value % 10));
    }

    static void Write3(char[] buffer, int value, int offset)
    {
        buffer[offset++] = (char)('0' + (value / 100));
        buffer[offset++] = (char)('0' + ((value / 10) % 10));
        buffer[offset] = (char)('0' + (value % 10));
    }
}
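The no-IO figures above come from pointing the writer at Stream.Null instead of a file. The answer does not show that variant, so the following is only a sketch of what it presumably looks like; the class name `NoIoBenchmark` and the method `TimeBasic` are invented for this example.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical benchmark helper, for illustration only.
public static class NoIoBenchmark
{
    // Formats `when` `iterations` times into a writer over Stream.Null and
    // returns the elapsed milliseconds. Stream.Null discards everything
    // written to it, so no disk access is included in the measurement.
    public static long TimeBasic(DateTime when, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        using (TextWriter tw = new StreamWriter(Stream.Null))
        {
            for (int i = 0; i < iterations; i++)
            {
                tw.Write(when.ToString("dd.MM.yy HH:mm:ss:fff"));
            }
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        Console.WriteLine("Basic (no IO): " + TimeBasic(DateTime.Now, 1000000) + "ms");
    }
}
```

The same substitution applies to the custom loop: only the StreamWriter constructor argument changes, so the two timings stay comparable.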
#3
This is not an answer in itself, but rather an addendum to Jon Skeet's excellent answer, offering a variant for the "s" (ISO) format:
/// <summary>
/// Implements a fast method to write a DateTime value to string, in the ISO "s" format.
/// </summary>
/// <param name="dateTime">The date time.</param>
/// <returns>The formatted value, or an empty string if <paramref name="dateTime"/> is null.</returns>
/// <devdoc>
/// This implementation exists just for performance reasons; it is semantically identical to
/// <code>
/// text = dateTime.HasValue ? dateTime.Value.ToString("s") : string.Empty;
/// </code>
/// However, it runs about 3 times as fast. (Measured using the VS2015 performance profiler)
/// </devdoc>
public static string ToIsoStringFast(DateTime? dateTime) {
    if (!dateTime.HasValue) {
        return string.Empty;
    }
    DateTime dt = dateTime.Value;
    // "yyyy-MM-ddTHH:mm:ss" is always 19 characters long.
    char[] chars = new char[19];
    Write4Chars(chars, 0, dt.Year);
    chars[4] = '-';
    Write2Chars(chars, 5, dt.Month);
    chars[7] = '-';
    Write2Chars(chars, 8, dt.Day);
    chars[10] = 'T';
    Write2Chars(chars, 11, dt.Hour);
    chars[13] = ':';
    Write2Chars(chars, 14, dt.Minute);
    chars[16] = ':';
    Write2Chars(chars, 17, dt.Second);
    return new string(chars);
}
With the 4 digit serializer as:
private static void Write4Chars(char[] chars, int offset, int value) {
    chars[offset] = Digit(value / 1000);
    chars[offset + 1] = Digit(value / 100 % 10);
    chars[offset + 2] = Digit(value / 10 % 10);
    chars[offset + 3] = Digit(value % 10);
}
This runs about 3 times as fast. (Measured using the VS2015 performance profiler)
#4
Do you know how big each record in the binary and text logs is going to be? If so, you can split the processing of the log file across a number of threads, which would make better use of a multi-core or multi-processor PC. If you don't mind the results being in separate files, it would be a good idea to have one hard disk per core; that way you reduce the amount the disk heads have to move.
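As a rough sketch of the splitting step, assuming every binary record has the same known size: the file can be divided into byte ranges that start and end on record boundaries, one per worker thread. The names `RecordChunker`, `Chunk` and `Split` are hypothetical, and the sketch only computes the ranges; reading, decoding and writing each range is left to the worker.

```csharp
using System;

// Hypothetical helper, for illustration only; assumes fixed-size records.
public static class RecordChunker
{
    // Half-open byte range [Start, Start + Length) within the input file.
    public struct Chunk
    {
        public long Start;
        public long Length;
    }

    // Splits fileLength bytes into exactly `workers` chunks (some may be empty)
    // whose boundaries fall on multiples of recordSize, so that no record
    // straddles two chunks. A trailing partial record is ignored.
    public static Chunk[] Split(long fileLength, int recordSize, int workers)
    {
        if (recordSize <= 0) throw new ArgumentOutOfRangeException("recordSize");
        if (workers <= 0) throw new ArgumentOutOfRangeException("workers");

        long totalRecords = fileLength / recordSize;
        long perWorker = totalRecords / workers;
        long remainder = totalRecords % workers;

        Chunk[] chunks = new Chunk[workers];
        long nextRecord = 0;
        for (int i = 0; i < workers; i++)
        {
            // The first `remainder` workers take one extra record each.
            long count = perWorker + (i < remainder ? 1 : 0);
            chunks[i].Start = nextRecord * recordSize;
            chunks[i].Length = count * recordSize;
            nextRecord += count;
        }
        return chunks;
    }

    public static void Main()
    {
        foreach (Chunk c in Split(1000, 100, 3))
        {
            Console.WriteLine("start={0} length={1}", c.Start, c.Length);
        }
    }
}
```

Each worker would then open its own FileStream, seek to `Start`, and write to its own output file, which matches the suggestion of keeping the results in separate files.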