Best practices for performance testing when doing TDD?

Date: 2021-07-07 03:25:19

I'm working on a project which is in serious need of some performance tuning.


How do I write a test that fails if my optimizations do not improve the speed of the program?


To elaborate a bit:


The problem is not discovering which parts to optimize. I can use various profiling and benchmarking tools for that.


The problem is using automated tests to document that a specific optimization did indeed have the intended effect. It would also be highly desirable if I could use the test suite to discover possible performance regressions later on.


I suppose I could just run my profiling tools to get some values and then assert that my optimized code produces better values. The obvious problem with that, however, is that benchmarking values are not hard values. They vary with the local environment.


So, is the answer to always use the same machine to do this kind of integration testing? If so, you would still have to allow for some fuzziness in the results, since even on the same hardware benchmarking results can vary. How then to take this into account?


Or maybe the answer is to keep older versions of the program and compare results before and after? This would be my preferred method, since it's mostly environment agnostic. Does anyone have experience with this approach? I imagine it would only be necessary to keep one older version if all the tests can be made to pass if the performance of the latest version is at least as good as the former version.


9 Answers

#1


I suspect that applying TDD to drive performance is a mistake. By all means, use it to get to good design and working code, and use the tests written in the course of TDD to ensure continued correctness - but once you have well-factored code and a solid suite of tests, you are in good shape to tune, and different (from TDD) techniques and tools apply.


TDD gives you good design, reliable code, and a test coverage safety net. That puts you into a good place for tuning, but I think that because of the problems you and others have cited, it's simply not going to take you much further down the tuning road. I say that as a great fan, proponent, and practitioner of TDD.


#2


First you need to establish some criteria for acceptable performance, then you need to devise a test that fails those criteria when run against the existing code, then you need to tweak your code for performance until it passes the test. You will probably have more than one criterion for performance, and you should certainly have more than one test.

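As a rough sketch of how criteria can map one-to-one onto tests, assuming NUnit (whose MaxTime attribute fails a test that exceeds a millisecond budget); the operations and budgets below are placeholders for whatever you establish:

    using System.Threading;
    using NUnit.Framework;

    [TestFixture]
    public class AcceptablePerformanceTests
    {
        // Placeholders for the code being tuned; replace with the real calls.
        private static void LoadDashboard() => Thread.Sleep(100);
        private static void RunNightlyImport() => Thread.Sleep(500);

        [Test, MaxTime(200)]    // criterion: the dashboard loads in under 200 ms
        public void Dashboard_loads_within_budget() => LoadDashboard();

        [Test, MaxTime(5000)]   // criterion: the import finishes in under 5 s
        public void Import_finishes_within_budget() => RunNightlyImport();
    }

Each agreed criterion gets its own test, so a regression against any one of them fails on its own.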

#3


In many server applications (which might not be your case) performance problems manifest only under concurrent access and under load. Measuring the absolute time a routine takes to execute and trying to improve it is therefore not very helpful. There are problems with this method even in single-threaded applications. Measuring absolute routine time relies on the clock the platform provides, and these clocks are not always very precise; you are better off relying on the average time a routine takes.

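As a rough illustration of relying on an average rather than a single reading (plain C#, no test framework assumed; DoWork is just a placeholder for the routine being measured):

    using System;
    using System.Diagnostics;
    using System.Threading;

    static class AverageTiming
    {
        // Placeholder for the routine being measured.
        static void DoWork() => Thread.Sleep(10);

        static void Main()
        {
            const int runs = 50;

            var watch = Stopwatch.StartNew();
            for (int i = 0; i < runs; i++)
                DoWork();
            watch.Stop();

            // An average over many runs is less sensitive to clock resolution
            // and one-off hiccups than a single measurement.
            double averageMs = watch.Elapsed.TotalMilliseconds / runs;
            Console.WriteLine($"average: {averageMs:F2} ms over {runs} runs");
        }
    }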

My advice is:


  • Use profiling to identify routines that execute the most times and take most time.

  • Use a tool like JMeter or Grinder to build representative test cases, simulate concurrent access, put your application under stress, and measure (more importantly) throughput and average response time. This will give you a better idea of how your application behaves as seen from the outside.

While you could use unit tests to establish some non-functional aspects of your application, I think that the approach given above will give better results during the optimization process. When placing time-related assertions in your unit tests you will have to choose some very approximate values: time can vary depending on the environment you are using to run your unit tests. You don't want tests to fail only because some of your colleagues are using inferior hardware.


Tuning is all about finding the right things to tune. You already have functioning code, so placing performance-related assertions a posteriori, without first establishing the critical sections of the code, might lead you to waste a lot of time optimizing non-essential pieces of your application.


#4


Record the running time of the current code.


if (newCode.RunningTime >= oldCode.RunningTime) Fail
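
One way to approximate that comparison without keeping the old build around is to persist the previous measurement and allow some slack for noise. A sketch, assuming NUnit; the file name, placeholder operation, and 20% tolerance are all arbitrary choices:

    using System.Diagnostics;
    using System.Globalization;
    using System.IO;
    using System.Threading;
    using NUnit.Framework;

    [TestFixture]
    public class RegressionTimingTests
    {
        // Arbitrary file name for the stored baseline.
        private const string BaselineFile = "perf-baseline.txt";

        // Placeholder for the code being tuned.
        private static void OperationUnderTest() => Thread.Sleep(20);

        [Test]
        public void Not_slower_than_recorded_baseline()
        {
            var watch = Stopwatch.StartNew();
            OperationUnderTest();
            watch.Stop();
            double currentMs = watch.Elapsed.TotalMilliseconds;

            if (File.Exists(BaselineFile))
            {
                double baselineMs = double.Parse(
                    File.ReadAllText(BaselineFile), CultureInfo.InvariantCulture);

                // Allow 20% slack for normal run-to-run noise.
                Assert.LessOrEqual(currentMs, baselineMs * 1.2);
            }

            // Record this run as the baseline for the next comparison.
            File.WriteAllText(BaselineFile, currentMs.ToString(CultureInfo.InvariantCulture));
        }
    }

Whether you overwrite the baseline on every run or only when it improves is a policy choice; overwriting keeps the comparison anchored to the most recent accepted build.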

#5


Run the tests + profiling on the CI server. You can also run load tests periodically.


You are concerned about differences (as you mentioned), so it's not about defining an absolute value. Have an extra step that compares the performance measures of this run with those of the last build, and report the differences as a percentage. You can raise a red flag for significant variations in time.

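A sketch of what that extra comparison step might boil down to (plain C#; the numbers stand in for measurements read from the last build's report and the current run, and the 10% threshold is an arbitrary choice):

    using System;

    static class BuildComparison
    {
        static void Main()
        {
            // Stand-ins for measurements read from the last build's report and this run.
            double lastBuildMs = 480.0;
            double thisBuildMs = 530.0;

            double changePercent = (thisBuildMs - lastBuildMs) / lastBuildMs * 100.0;
            Console.WriteLine($"response time changed by {changePercent:F1}%");

            // Red-flag only significant variations, not normal noise.
            const double redFlagThresholdPercent = 10.0;
            if (changePercent > redFlagThresholdPercent)
                Environment.Exit(1);   // fail the CI step
        }
    }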

If you are concerned about performance, you should have clear goals you want to meet and assert them. You should measure those with tests on the full system. Even if your application logic is fast, you might have issues with the view that cause you to miss the goal. You can also combine this with the differences approach, but for these goals you would have less tolerance for time variations.


Note that you can run the same process on your dev computer, just using only the previous runs on that machine rather than a baseline shared between developers.


#6


For the tuning itself, you can compare the old code and new code directly. But don't keep both copies around. This sounds like a nightmare to manage. Also, you're only ever comparing one version with another version. It's possible that a change in functionality will slow down your function, and that is acceptable to the users.


Personally, I've never seen performance criteria of the type 'must be faster than the last version', because it is so hard to measure.


You say 'in serious need of performance tuning'. Where? Which queries? Which functions? Who says, the business, the users? What is acceptable performance? 3 seconds? 2 seconds? 50 milliseconds?


The starting point for any performance analysis is to define the pass/fail criteria. Once you have this, you CAN automate the performance tests.


For reliability, you can use a (simple) statistical approach. For example, run the same query under the same conditions 100 times. If 95% of them return in under n seconds, that is a pass.

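A sketch of that check as an automated test (NUnit assumed; RunQuery is a placeholder for the query under test, and the budget mirrors the "n seconds" in the example):

    using System.Diagnostics;
    using System.Linq;
    using System.Threading;
    using NUnit.Framework;

    [TestFixture]
    public class StatisticalPerformanceTests
    {
        // Placeholder for the query under test.
        private static void RunQuery() => Thread.Sleep(30);

        [Test]
        public void At_least_95_percent_of_runs_meet_the_budget()
        {
            const int runs = 100;
            const double budgetMs = 50.0;   // the "n seconds" from the example, in milliseconds

            var timings = new double[runs];
            for (int i = 0; i < runs; i++)
            {
                var watch = Stopwatch.StartNew();
                RunQuery();
                watch.Stop();
                timings[i] = watch.Elapsed.TotalMilliseconds;
            }

            // Pass if at least 95 of the 100 runs came in under the budget.
            int withinBudget = timings.Count(t => t <= budgetMs);
            Assert.GreaterOrEqual(withinBudget, 95);
        }
    }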

Personally, I would do this at integration time, from either a standard machine or the integration server itself. Record the values for each test somewhere (CruiseControl has some nice features for this sort of thing). If you do this, you can see how performance progresses over time, and with each build. You can even make a graph. Managers like graphs.

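Recording the values can be as simple as appending one line per test run to a file the build server archives; a minimal sketch (the file name is a placeholder):

    using System;
    using System.IO;

    static class ResultRecorder
    {
        // Arbitrary file name; point it at something the build server archives.
        private const string ResultsFile = "perf-history.csv";

        // Call this from each performance test with its measured time.
        public static void Record(string testName, double elapsedMs)
        {
            // One line per run: timestamp, test name, measurement. Easy to graph later.
            string line = $"{DateTime.UtcNow:O},{testName},{elapsedMs:F2}";
            File.AppendAllText(ResultsFile, line + Environment.NewLine);
        }
    }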

Having a stable environment is always hard when doing performance testing, whether or not you're automating the tests. You'll have that particular problem no matter how you develop (TDD, Waterfall, etc.).


#7


I haven't faced this situation yet ;) however if I did, here's how I'd go about it. (I think I picked this up from Dave Astels' book)


Step #1: Come up with a spec for 'acceptable performance'. For example, this could mean 'The user needs to be able to do Y in N secs (or millisecs)'.
Step #2: Now write a failing test. Use your friendly timer class (e.g. .NET has the Stopwatch class) and Assert.Less(actualTime, MySpec).
Step #3: If the test already passes, you're done. If red, you need to optimize and make it green. As soon as the test goes green, the performance is now 'acceptable'.

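Following those steps literally, the failing test from Step #2 might look roughly like this (NUnit assumed; DoY stands in for 'the user needs to be able to do Y', and the spec constant is whatever N you settle on):

    using System.Diagnostics;
    using System.Threading;
    using NUnit.Framework;

    [TestFixture]
    public class AcceptancePerformanceSpec
    {
        // The 'N secs' from Step #1, expressed in milliseconds.
        private const long SpecMilliseconds = 1000;

        // Placeholder for 'doing Y'; currently too slow, so the test starts red.
        private static void DoY() => Thread.Sleep(1500);

        [Test]
        public void User_can_do_Y_within_the_spec()
        {
            var watch = Stopwatch.StartNew();   // Step #2: the friendly timer class
            DoY();
            watch.Stop();

            Assert.Less(watch.ElapsedMilliseconds, SpecMilliseconds);   // red until optimized (Step #3)
        }
    }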

#8


Kent Beck and his team automated all the tests in TDD.


Here, for performance testing, we can also automate the tests in TDD.


The criterion in performance testing is that we should test yes-or-no cases.


If we know the specifications well, we can automate them in TDD as well.


#9


Whilst I broadly agree with Carl Manaster's answer, with modern tools it's possible to get some of the advantages that TDD offers for functional testing into performance testing.


With most modern performance testing frameworks (most of my experience is with Gatling, but I believe the same is true of newer versions of most performance test frameworks), it's possible to integrate automated performance tests into the continuous integration build and configure them so that the CI build fails if the performance requirements aren't met.


So provided it's possible to agree beforehand on what your performance requirements are (which for some applications may be driven by SLAs agreed with users or clients), this can give you rapid feedback when a change has created a performance issue and help identify areas that need performance improvement.


Good performance requirements are along the lines of "when there are 5000 orders per hour, 95% of user journeys should include no more than 10 seconds of waiting, and no screen transition taking more than 1 second".


This also relies on having deployment to a production-like test environment in your CI pipeline.


However, it's probably still not a good idea to use performance requirements to drive your development in the same way that you could with functional requirements. With functional requirements, you generally have some insight into whether your application will pass the test before you run it, and it's sensible to try to write code that you think will pass. With performance, trying to optimize code whose performance hasn't been measured is a dubious practice. You can use performance results to drive your application development to some extent, just not performance requirements.
