How can I improve/replace sprintf, which I've measured to be a performance hotspot?

Time: 2021-03-05 03:22:52

Through profiling I've discovered that the sprintf here takes a long time. Is there a better performing alternative that still handles the leading zeros in the y/m/d h/m/s fields?

SYSTEMTIME sysTime;
GetLocalTime( &sysTime );
char buf[80];
for (int i = 0; i < 100000; i++)
{

    sprintf(buf, "%4d-%02d-%02d %02d:%02d:%02d",
        sysTime.wYear, sysTime.wMonth, sysTime.wDay, 
        sysTime.wHour, sysTime.wMinute, sysTime.wSecond);

}

Note: The OP explains in the comments that this is a stripped-down example. The "real" loop contains additional code that uses varying time values from a database. Profiling has pinpointed sprintf() as the offender.

12 answers

#1


19  

If you were writing your own function to do the job, a lookup table of the string values of 0 .. 61 would avoid having to do any arithmetic for everything apart from the year.

edit: Note that to cope with leap seconds (and to match strftime()) you should be able to print seconds values of 60 and 61.

const char LeadingZeroIntegerValues[62][3] = { "00", "01", "02", ... "59", "60", "61" };

Alternatively, how about strftime()? I've no idea how the performance compares (it could well just be calling sprintf()), but it's worth looking at (and it could be doing the above lookup itself).
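
A minimal sketch of the lookup-table idea, assuming the table is built once at startup rather than typed out by hand. The names two_digits, init_two_digits, and format_hms are hypothetical, and plain unsigned values stand in for the SYSTEMTIME fields so the sketch stays portable:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical table: entry i holds the zero-padded text of i, for 0..61. */
static char two_digits[62][3];

/* Fills the table once; call this before the first format_hms(). */
static void init_two_digits(void)
{
    for (int i = 0; i < 62; i++) {
        two_digits[i][0] = (char)('0' + i / 10);
        two_digits[i][1] = (char)('0' + i % 10);
        two_digits[i][2] = '\0';
    }
}

/* Writes "HH:MM:SS" plus a terminating NUL into out (at least 9 bytes).
   No division or modulo is needed here, only table lookups and copies. */
static void format_hms(char *out, unsigned hour, unsigned minute, unsigned second)
{
    assert(hour < 62 && minute < 62 && second < 62);
    memcpy(out, two_digits[hour], 2);
    out[2] = ':';
    memcpy(out + 3, two_digits[minute], 2);
    out[5] = ':';
    memcpy(out + 6, two_digits[second], 2);
    out[8] = '\0';
}
```

After a single call to init_two_digits(), format_hms(buf, 3, 22, 52) leaves "03:22:52" in buf. Whether this actually beats sprintf() on a given platform still has to be measured.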

#2


6  

You could try filling each char in the output in turn.

buf[0] = (sysTime.wYear / 1000) % 10 + '0' ;
buf[1] = (sysTime.wYear / 100) % 10 + '0';
buf[2] = (sysTime.wYear / 10) % 10 + '0';
buf[3] = sysTime.wYear % 10 + '0';
buf[4] = '-';

... etc...

Not pretty, but you get the picture. If nothing else, it may help explain why sprintf isn't going to be that fast.

OTOH, maybe you could cache the last result. That way you'd only need to generate one every second.

#3


6  

Printf needs to deal with a lot of different formats. You certainly could grab the source for printf and use it as a basis to roll your own version that deals specifically with the sysTime structure. That way you pass in one argument, and it does just exactly the work that needs to be done and nothing more.

#4


3  

What do you mean by a "long" time? Since sprintf() is the only statement in your loop, and the "plumbing" of the loop (increment, comparison) is negligible, sprintf() has to consume the most time.

Remember the old joke about the man who lost his wedding ring on 3rd Street one night, but looked for it on 5th because the light was brighter there? You've built an example that's designed to "prove" your assumption that sprintf() is inefficient.

Your results will be more accurate if you profile "actual" code that contains sprintf() in addition to all the other functions and algorithms you use. Alternatively, try writing your own version that addresses the specific zero-padded numeric conversion that you require.

You may be surprised at the results.

#5


3  

Looks like Jaywalker is suggesting a very similar method (beat me by less than an hour).

In addition to the already suggested lookup table method (n2s[] array below), how about generating your format buffer so that the usual sprintf is less intensive? The code below will only have to fill in the minute and second every time through the loop unless the year/month/day/hour have changed. Obviously, if any of those have changed you do take another sprintf hit but overall it may not be more than what you are currently witnessing (when combined with the array lookup).

// n2s[] is the two-digit string lookup table ("00", "01", ...) suggested above.
static char fbuf[80];
static SYSTEMTIME lastSysTime = {0};  // initialize to all zeros.

for (int i = 0; i < 100000; i++)
{
    if ((lastSysTime.wHour != sysTime.wHour)
    ||  (lastSysTime.wDay != sysTime.wDay)
    ||  (lastSysTime.wMonth != sysTime.wMonth)
    ||  (lastSysTime.wYear != sysTime.wYear))
    {
        sprintf(fbuf, "%4d-%02s-%02s %02s:%%02s:%%02s",
                sysTime.wYear, n2s[sysTime.wMonth],
                n2s[sysTime.wDay], n2s[sysTime.wHour]);

        lastSysTime.wHour = sysTime.wHour;
        lastSysTime.wDay = sysTime.wDay;
        lastSysTime.wMonth = sysTime.wMonth;
        lastSysTime.wYear = sysTime.wYear;
    }

    sprintf(buf, fbuf, n2s[sysTime.wMinute], n2s[sysTime.wSecond]);

}

#6


2  

How about caching the results? Isn't that a possibility? Considering that this particular sprintf() call is made too often in your code, I'm assuming that between most of these consecutive calls, the year, month and day do not change.

Thus, we can implement something like the following. Declare an old and a current SYSTEMTIME structure:

SYSTEMTIME sysTime, oldSysTime;

Also, declare separate parts to hold the date and the time:

char datePart[80];
char timePart[80];

The first time through, you'll have to fill in sysTime and oldSysTime as well as datePart and timePart. Subsequent calls can then be made considerably faster, as shown below:

// the leading space separates the date part from the time part
sprintf (timePart, " %02d:%02d:%02d", sysTime.wHour, sysTime.wMinute, sysTime.wSecond);
if (oldSysTime.wYear == sysTime.wYear &&
    oldSysTime.wMonth == sysTime.wMonth &&
    oldSysTime.wDay == sysTime.wDay)
{
    // we can reuse the date part
    strcpy (buf, datePart);
    strcat (buf, timePart);
}
else
{
    // we need to regenerate the date part as well
    sprintf (datePart, "%4d-%02d-%02d", sysTime.wYear, sysTime.wMonth, sysTime.wDay);
    strcpy (buf, datePart);
    strcat (buf, timePart);
}

memcpy (&oldSysTime, &sysTime, sizeof (SYSTEMTIME));

The code above has some redundancy to make it easier to understand; you can easily factor it out. You can speed things up further if you know that even the hour and minute change less often than the routine is called.
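
For readers without a Windows toolchain, here is a portable sketch of the same date-caching idea. It assumes a hypothetical struct stamp in place of SYSTEMTIME, and the names format_cached and date_rebuilds are made up for illustration. It also zero-pads the year with %04u, which differs slightly from the %4d used above:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the SYSTEMTIME fields used in the question. */
struct stamp {
    unsigned year, month, day, hour, minute, second;
};

static char cached_date[12];   /* "YYYY-MM-DD " plus NUL */
static unsigned cached_year, cached_month, cached_day;  /* all zero: nothing cached yet */
static unsigned date_rebuilds; /* counts how often the date part was regenerated */

/* Writes "YYYY-MM-DD HH:MM:SS" plus NUL into buf (at least 20 bytes). */
static void format_cached(char *buf, const struct stamp *t)
{
    assert(t->year < 10000 && t->month >= 1 && t->month <= 12);
    assert(t->day >= 1 && t->day <= 31);
    assert(t->hour < 24 && t->minute < 60 && t->second < 62);

    /* Regenerate the date prefix only when the calendar day has changed. */
    if (t->year != cached_year || t->month != cached_month || t->day != cached_day) {
        snprintf(cached_date, sizeof cached_date, "%04u-%02u-%02u ",
                 t->year, t->month, t->day);
        cached_year = t->year;
        cached_month = t->month;
        cached_day = t->day;
        date_rebuilds++;
    }

    /* Copy the 11-character cached prefix, then format only the time part. */
    memcpy(buf, cached_date, 11);
    snprintf(buf + 11, 9, "%02u:%02u:%02u", t->hour, t->minute, t->second);
}
```

Formatting many stamps that fall on the same day should increment date_rebuilds only once. If the database rows jump between dates on every call, the cache buys nothing, so this only pays off when consecutive values share a date.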

#7


2  

I would do a few things...

  • cache the current time so you don't have to regenerate the timestamp every time
  • do the time conversion manually. The slowest part of the printf-family functions is the format-string parsing, and it's silly to be devoting cycles to that parsing on every loop execution.
  • try using 2-byte lookup tables for all conversions ({ "00", "01", "02", ..., "99" }). This is because you want to avoid modular arithmetic, and a 2-byte table means you only have to use one modulo, for the year.
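
The manual-conversion and 2-byte-table points above can be sketched together as follows. This is an illustration under assumptions, not a tuned implementation: digits2, put2, and format_timestamp are hypothetical names, the table is packed into one 200-byte string, and the fields are passed as plain unsigned values instead of a SYSTEMTIME:

```c
#include <assert.h>
#include <string.h>

/* Packed table: the two digits of i live at offsets 2*i and 2*i + 1. */
static const char digits2[] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

/* Copies the two digits of v (0..99) to p and returns p + 2. */
static char *put2(char *p, unsigned v)
{
    assert(v < 100);
    memcpy(p, digits2 + 2 * v, 2);
    return p + 2;
}

/* Writes "YYYY-MM-DD HH:MM:SS" plus NUL into buf (at least 20 bytes). */
static void format_timestamp(char *buf, unsigned year, unsigned month,
                             unsigned day, unsigned hour, unsigned minute,
                             unsigned second)
{
    assert(year < 10000);
    char *p = buf;
    p = put2(p, year / 100);  /* the year is the only field that needs */
    p = put2(p, year % 100);  /* a division and a modulo               */
    *p++ = '-';
    p = put2(p, month);
    *p++ = '-';
    p = put2(p, day);
    *p++ = ' ';
    p = put2(p, hour);
    *p++ = ':';
    p = put2(p, minute);
    *p++ = ':';
    p = put2(p, second);
    *p = '\0';
}
```

Unlike the "%4d" in the question, this zero-pads years below 1000, which is harmless for current dates but worth knowing if byte-for-byte identical output matters.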

#8


1  

You would probably get a perf increase by hand-rolling a routine that lays out the digits in the return buf, since you could avoid repeatedly parsing a format string and would not have to deal with a lot of the more complex cases sprintf handles. I am loath to actually recommend doing that, though.

I would recommend trying to figure out whether you can somehow reduce how often you need to generate these strings: are they optional sometimes, can they be cached, etc.

#9


1  

I'm working on a similar problem at the moment.

I need to log debug statements with timestamp, filename, line number etc on an embedded system. We already have a logger in place but when I turn the knob to 'full logging', it eats all our proc cycles and puts our system in dire states, states that no computing device should ever have to experience.

Someone did say "You cannot measure/observe something without changing that which you are measuring/observing."

So I'm changing things to improve performance. The current state of things is that I'm 2x faster than the original function call (the bottleneck in that logging system is not in the function call but in the log reader, which is a separate executable that I can discard if I write my own logging stack).

The interface I need to provide is something like void log(int channel, char *filename, int lineno, format, ...). I need to append the channel name (which currently does a linear search within a list! For every single debug statement!) and a timestamp including a millisecond counter. Here are some of the things I'm doing to make this faster:

  • Stringify the channel name so I can strcpy rather than search the list: define the macro LOG(channel, ...etc) as log(#channel, ...etc). You can use memcpy if you fix the length of the string by defining LOG(channel, ...) as log("...."#channel + sizeof("...."#channel) - 11), which points at the last 10 bytes of the padded name and so gives fixed 10-byte channel names.
  • Generate timestamp string a couple of times a second. You can use asctime or something. Then memcpy the fixed length string to every debug statement.
  • If you want to generate the timestamp string in real time then a look up table with assignment (not memcpy!) is perfect. But that works only for 2 digit numbers and maybe for the year.
  • What about three digits (milliseconds) and five digits (lineno)? I don't like itoa and I don't like the custom itoa (digit = ((value /= 10) % 10)) either, because divs and mods are slow. I wrote the functions below and later discovered that something similar is in the AMD optimization manual (in assembly), which gives me confidence that these are about the fastest C implementations.

    void itoa03(char *string, unsigned int value)
    {
       *string++ = '0' + ((value = value * 2684355) >> 28);
       *string++ = '0' + ((value = ((value & 0x0FFFFFFF)) * 10) >> 28);
       *string++ = '0' + ((value = ((value & 0x0FFFFFFF)) * 10) >> 28);
       *string++ = ' ';/* null terminate here if thats what you need */
    }
    

    Similarly, for the line numbers,

    void itoa05(char *string, unsigned int value)
    {
       *string++ = ' ';
       *string++ = '0' + ((value = value * 26844 + 12) >> 28);
       *string++ = '0' + ((value = ((value & 0x0FFFFFFF)) * 10) >> 28);
       *string++ = '0' + ((value = ((value & 0x0FFFFFFF)) * 10) >> 28);
       *string++ = '0' + ((value = ((value & 0x0FFFFFFF)) * 10) >> 28);
       *string++ = '0' + ((value = ((value & 0x0FFFFFFF)) * 10) >> 28);
       *string++ = ' ';/* null terminate here if thats what you need */
    }
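
Multiply-and-shift tricks like these depend on the rounding error staying small, so it seems worth checking them against snprintf() over their whole input range before relying on them. The sketch below repeats the two routines as posted (with a cast added to avoid conversion warnings) and adds two hypothetical checkers, first_mismatch03 and first_mismatch05. It assumes unsigned int is at least 32 bits wide:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The two routines from the answer: fixed-point multiply, then peel digits. */
static void itoa03(char *string, unsigned int value)
{
    *string++ = (char)('0' + ((value = value * 2684355) >> 28));
    *string++ = (char)('0' + ((value = ((value & 0x0FFFFFFF)) * 10) >> 28));
    *string++ = (char)('0' + ((value = ((value & 0x0FFFFFFF)) * 10) >> 28));
    *string++ = ' ';
}

static void itoa05(char *string, unsigned int value)
{
    *string++ = ' ';
    *string++ = (char)('0' + ((value = value * 26844 + 12) >> 28));
    *string++ = (char)('0' + ((value = ((value & 0x0FFFFFFF)) * 10) >> 28));
    *string++ = (char)('0' + ((value = ((value & 0x0FFFFFFF)) * 10) >> 28));
    *string++ = (char)('0' + ((value = ((value & 0x0FFFFFFF)) * 10) >> 28));
    *string++ = (char)('0' + ((value = ((value & 0x0FFFFFFF)) * 10) >> 28));
    *string++ = ' ';
}

/* Returns the first v in [0, limit) where itoa03 disagrees with "%03u",
   or limit if every value matches. */
static unsigned first_mismatch03(unsigned limit)
{
    char fast[4], ref[16];
    assert(limit <= 1000);
    for (unsigned v = 0; v < limit; v++) {
        itoa03(fast, v);
        snprintf(ref, sizeof ref, "%03u", v);
        if (memcmp(fast, ref, 3) != 0)
            return v;
    }
    return limit;
}

/* Same check for itoa05 against "%05u"; its digits start at fast + 1. */
static unsigned first_mismatch05(unsigned limit)
{
    char fast[8], ref[16];
    assert(limit <= 100000);
    for (unsigned v = 0; v < limit; v++) {
        itoa05(fast, v);
        snprintf(ref, sizeof ref, "%05u", v);
        if (memcmp(fast + 1, ref, 5) != 0)
            return v;
    }
    return limit;
}
```

Working the arithmetic through for 32-bit unsigned int suggests that first_mismatch03(1000) returns 1000 (no mismatch), while first_mismatch05(100000) returns 59049, where itoa05 produces 59050. If that holds on your platform, the five-digit routine is only safe for line numbers up to 59048.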
    

Overall, my code is pretty fast now. The vsnprintf() I need to use takes about 91% of the time and the rest of my code takes only 9% (whereas the rest of the code, i.e. everything except vsprintf(), used to take 54% earlier).

#10


1  

The two fast formatters I've tested are FastFormat and Karma::generate (part of Boost Spirit).

You might also find it useful to benchmark it or at least look for existing benchmarks.

For example this one (though it's missing FastFormat):

#11


0  

StringStream is the suggestion that I got from Google.

http://bytes.com/forum/thread132583.html

#12


0  

It's hard to imagine that you're going to beat sprintf at formatting integers. Are you sure sprintf is your problem?
