What happens when you call QueryPerformanceCounter?

Date: 2021-10-25 15:18:27

I'm looking into the exact implications of using QueryPerformanceCounter in our system and am trying to understand its impact on the application. On my single-CPU, 4-core machine a call takes around 230ns. On a 24-core, 4-CPU Xeon it takes around 1.4ms. More interestingly, on my machine calls made from multiple threads don't affect each other, but on the multi-CPU machine the threads interact in some way that makes them block each other. Is there some shared resource on the bus that they all query? What exactly happens when I call QueryPerformanceCounter, and what does it really measure?
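For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of how the per-call cost could be timed, first on one thread and then on several at once. The helper names (`read_counter`, `ns_per_call`, `ns_per_call_parallel`) are made up for this post, and the non-Windows fallback to `std::chrono::steady_clock` is only there so the code compiles elsewhere; it is not how the numbers above were obtained.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Read the counter once. On Windows this calls QueryPerformanceCounter;
// elsewhere (assumption, for portability only) it reads steady_clock.
inline std::int64_t read_counter() {
#ifdef _WIN32
    LARGE_INTEGER value;
    QueryPerformanceCounter(&value);
    return value.QuadPart;
#else
    return std::chrono::steady_clock::now().time_since_epoch().count();
#endif
}

// Average cost of one read_counter() call, in nanoseconds, on the calling thread.
double ns_per_call(int iterations) {
    assert(iterations > 0);
    std::int64_t sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        sink ^= read_counter();
    }
    const auto stop = std::chrono::steady_clock::now();
    volatile std::int64_t keep = sink;  // keep the loop result observable
    (void)keep;
    return std::chrono::duration<double, std::nano>(stop - start).count() / iterations;
}

// Run the same loop on `threads` threads concurrently; one average per thread.
std::vector<double> ns_per_call_parallel(int threads, int iterations) {
    std::vector<double> results(threads, 0.0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&results, t, iterations] {
            results[t] = ns_per_call(iterations);
        });
    }
    for (auto& worker : pool) {
        worker.join();
    }
    return results;
}
```

Comparing `ns_per_call(1000000)` with the values returned by `ns_per_call_parallel(n, 1000000)`, where `n` is the number of logical processors, should show whether concurrent callers slow each other down on a given machine.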

4 answers

#1


10  

Windows QueryPerformanceCounter() has logic to determine the number of processors and invoke synchronization logic if necessary. It attempts to use the TSC register, but on multiprocessor systems this register is not guaranteed to be synchronized between processors (and, more importantly, its rate can vary greatly due to intelligent downclocking and sleep states).

MSDN says that it doesn't matter which processor the call is made on, so the overhead you are seeing may come from the extra synchronization code needed for that guarantee. Also remember that the call can involve a bus transfer, so you may be seeing bus contention delays.

Try using SetThreadAffinityMask(), if possible, to bind the calling thread to a specific processor. Otherwise you might just have to live with the delay, or you could try a different timer (for example, take a look at http://en.wikipedia.org/wiki/High_Precision_Event_Timer).
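A rough sketch of the pinning step, in case it helps. `pin_current_thread` is a name I made up; `SetThreadAffinityMask` and `GetCurrentThread` are the real Win32 calls. The mask only addresses processors inside the thread's current processor group, so an index of 64 or more is rejected here. On non-Windows builds the function is a stub that returns false, purely so the snippet compiles.

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Pin the calling thread to one logical processor (0-based index within the
// thread's current processor group). Returns true on success.
bool pin_current_thread(unsigned cpu_index) {
#ifdef _WIN32
    if (cpu_index >= sizeof(DWORD_PTR) * 8) {
        return false;  // the affinity mask has no bit for this index
    }
    const DWORD_PTR mask = static_cast<DWORD_PTR>(1) << cpu_index;
    // SetThreadAffinityMask returns the previous mask, or 0 on failure.
    return SetThreadAffinityMask(GetCurrentThread(), mask) != 0;
#else
    (void)cpu_index;
    return false;  // stub: this sketch does not implement pinning off Windows
#endif
}
```

The idea would be to call `pin_current_thread(0)` at the start of the timing thread and take every QueryPerformanceCounter() reading from that thread only.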

#2


4  

I know that this thread is a bit old, but I would like to add more info. First, I do agree that QueryPerformanceCounter can take more time on certain machines, but I am not sure that Ron's answer explains it in every case. While researching this issue, I found various web pages that talk about how QueryPerformanceCounter is implemented. For instance, "Precision is not the same as accuracy" says that Windows (the HAL, to be more specific) may use different timing devices to obtain the value. This means that if Windows ends up using a slower timing device such as the PIT, it will take more time to obtain the time value. Using the PIT might require a PCI transaction, so that would be one reason.
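One way to get a hint about which device is behind the counter is to look at what QueryPerformanceFrequency() reports. The sketch below is only a heuristic: the three constants are the nominal rates of the PIT, the ACPI PM timer, and a commonly seen HPET rate, and mapping a frequency back to a device is my assumption, not something Windows documents. `counter_frequency_hz` and `guess_timer_source` are hypothetical helper names, and the non-Windows branch just returns 0.

```cpp
#include <cstdint>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Frequency of the performance counter in ticks per second.
// Returns 0 on non-Windows builds, where this sketch has nothing to query.
std::int64_t counter_frequency_hz() {
#ifdef _WIN32
    LARGE_INTEGER frequency;
    QueryPerformanceFrequency(&frequency);
    return frequency.QuadPart;
#else
    return 0;
#endif
}

// Heuristic guess at the timing device from its nominal tick rate.
std::string guess_timer_source(std::int64_t hz) {
    if (hz <= 0)        return "unavailable";
    if (hz == 1193182)  return "PIT (8254)";     // 1.193182 MHz
    if (hz == 3579545)  return "ACPI PM timer";  // 3.579545 MHz
    if (hz == 14318180) return "HPET";           // 14.31818 MHz, a common HPET rate
    return "other (possibly TSC-derived)";
}
```

Printing `guess_timer_source(counter_frequency_hz())` on both the fast and the slow machine would at least show whether they report the same counter frequency.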

I also found another article, "How It Works: Timer Outputs in SQL Server 2008 R2 - Invariant TSC", which gives a similar description. In fact, this article describes how SQL Server goes about timing transactions in the best way.

Then I found more information on the VMware site, because I had to deal with customers who use VMs, and I found that there are other issues with time measurement in VMs. For those who are interested, please refer to the VMware paper "Timekeeping in VMware Virtual Machines". The paper also talks about how some versions of Windows synchronize the TSCs. Thus, it would be safe to use QueryPerformanceCounter() in certain situations, and I think we should try something like what "How It Works: Timer Outputs in SQL Server 2008 R2" suggests to find out what might happen when we call QueryPerformanceCounter().

#3


3  

I was under the impression that on x86 QueryPerformanceCounter() just called rdtsc under the covers. I'm surprised that it shows any slowdown on multi-core machines (I've never noticed it on my 4-core CPU).
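For comparison, this is roughly what reading the TSC directly looks like with the `__rdtsc` compiler intrinsic. It is a sketch for experimenting, not a claim about what QueryPerformanceCounter() does internally. `read_tsc` and `tsc_delta` are names I picked, the value is in raw TSC ticks rather than time units, and on non-x86 targets the function simply returns 0.

```cpp
#include <cstdint>
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#define HAVE_RDTSC 1
#elif defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#define HAVE_RDTSC 1
#else
#define HAVE_RDTSC 0
#endif

// Read the time-stamp counter of whichever core the thread is running on.
// Returns 0 when the target is not x86.
inline std::uint64_t read_tsc() {
#if HAVE_RDTSC
    return __rdtsc();
#else
    return 0;
#endif
}

// Raw TSC ticks between two back-to-back reads on the calling thread.
// If the thread migrates between the reads and the cores' TSCs are not
// synchronized, the difference can be meaningless.
inline std::uint64_t tsc_delta() {
    const std::uint64_t before = read_tsc();
    const std::uint64_t after = read_tsc();
    return after - before;
}
```

A handful of cycles per read is cheap next to the numbers in the question, which suggests that a slow QueryPerformanceCounter() is doing more than a bare rdtsc on that machine.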

#4


2  

It's been a long time since I used this much, but if memory serves, there isn't a single implementation of this function, as the guts are provided by the various hardware manufacturers.

Here is a small article from MSDN: http://msdn.microsoft.com/ja-jp/library/cc399059.aspx

Also, if you're querying performance across multiple CPUs (as opposed to multiple cores on one CPU), it's going to have to communicate across the bus, which is both slower and could be where you are seeing some blocking.

However, like I said before it's been quite a while.

Mike
