What's the most ridiculous pessimization you've seen?

We all know that premature optimization is the root of all evil because it leads to unreadable/unmaintainable code. Even worse is pessimization, when someone implements an "optimization" because they think it will be faster, but it ends up being slower, as well as being buggy, unmaintainable, etc. What is the most ridiculous example of this that you've seen?

42 answers

#1

On an old project we inherited some (otherwise excellent) embedded systems programmers who had massive Z-8000 experience.

Our new environment was 32-bit Sparc Solaris.

One of the guys went and changed all ints to shorts to speed up our code, since grabbing 16 bits from RAM was quicker than grabbing 32 bits.

I had to write a demo program to show that grabbing 32-bit values on a 32-bit system was faster than grabbing 16-bit values, and explain that to grab a 16-bit value the CPU had to make a 32-bit wide memory access and then mask out or shift the bits not needed for the 16-bit value.

#2

I think the phrase "premature optimization is the root of all evil" is way, way overused. For many projects, it has become an excuse not to take performance into account until late in a project.

This phrase is often a crutch for people to avoid work. I see this phrase used when people should really say "Gee, we really didn't think of that up front and don't have time to deal with it now".

I've seen many more "ridiculous" examples of dumb performance problems than examples of problems introduced by "pessimization":

Reading the same registry key thousands (or tens of thousands) of times during program launch.

Loading the same DLL hundreds or thousands of times.

Wasting megabytes of memory by needlessly keeping full paths to files.

Not organizing data structures, so they take up way more memory than they need.

Sizing all strings that store file names or paths to MAX_PATH.

Gratuitous polling for things that have events, callbacks or other notification mechanisms.

What I think is a better statement is this: "optimization without measuring and understanding isn't optimization at all - it's just random change".

Good performance work is time consuming - often more so than the development of the feature or component itself.

#3

Databases are pessimization playland.

Favorites include:

Split a table into multiples (by date range, alphabetic range, etc.) because it's "too big".

Create an archive table for retired records, but continue to UNION it with the production table.

Duplicate entire databases by (division/customer/product/etc.)

Resist adding columns to an index because it makes it too big.

Create lots of summary tables because recalculating from raw data is too slow.

Create columns with subfields to save space.

Denormalize into fields-as-an-array.

That's off the top of my head.

#4

I think there is no absolute rule: some things are best optimized upfront, and some are not.

For example, I worked in a company where we received data packets from satellites. Each packet cost a lot of money, so all the data was highly optimized (ie. packed). For example, latitude/longitude was not sent as absolute values (floats), but as offsets relative to the "north-west" corner of a "current" zone. We had to unpack all the data before it could be used. But I think this is not pessimization, it is intelligent optimization to reduce communication costs.

On the other hand, our software architects decided that the unpacked data should be formatted into a very readable XML document, and stored in our database as such (as opposed to having each field stored in a corresponding column). Their idea was that "XML is the future", "disk space is cheap", and "processor is cheap", so there was no need to optimize anything. The result was that our 16-byte packets were turned into 2kB documents stored in one column, and for even simple queries we had to load megabytes of XML documents into memory! We received over 50 packets per second, so you can imagine how horrible the performance became (BTW, the company went bankrupt).
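
To put rough numbers on that, here is a minimal Java sketch. The field layout (device id, timestamp, two 16-bit offsets) and the names PackedPosition, pack, toXml and bytesPerDay are made up for illustration and are not the real packet format; only the 16-byte size, the 2 kB size and the 50 packets per second come from the story.

```java
import java.nio.ByteBuffer;

/** Hypothetical 16-byte position record; the field layout is an assumption. */
final class PackedPosition {
    static final int SIZE_BYTES = 16;

    /** Packs one reading into exactly 16 bytes: 4 + 8 + 2 + 2. */
    static byte[] pack(int deviceId, long timestampMillis, short latOffset, short lonOffset) {
        return ByteBuffer.allocate(SIZE_BYTES)
                .putInt(deviceId)
                .putLong(timestampMillis)
                .putShort(latOffset)
                .putShort(lonOffset)
                .array();
    }

    /** The same reading as a small XML document (much terser than a 2 kB one). */
    static String toXml(int deviceId, long timestampMillis, short latOffset, short lonOffset) {
        return "<position><deviceId>" + deviceId + "</deviceId>"
                + "<timestamp>" + timestampMillis + "</timestamp>"
                + "<latOffset>" + latOffset + "</latOffset>"
                + "<lonOffset>" + lonOffset + "</lonOffset></position>";
    }

    /** Bytes written per day at a given packet rate and record size. */
    static long bytesPerDay(int packetsPerSecond, int bytesPerRecord) {
        return 86_400L * packetsPerSecond * bytesPerRecord;
    }

    public static void main(String[] args) {
        byte[] packed = pack(7, 1_700_000_000_000L, (short) 1234, (short) -567);
        String xml = toXml(7, 1_700_000_000_000L, (short) 1234, (short) -567);
        System.out.println("packed bytes: " + packed.length);
        System.out.println("xml chars:    " + xml.length());
        System.out.println("per day, 16 B records: " + bytesPerDay(50, 16));
        System.out.println("per day, 2 kB records: " + bytesPerDay(50, 2_000));
    }
}
```

At 50 packets per second, 16-byte records add up to roughly 69 MB a day, while 2 kB documents come to roughly 8.6 GB a day, and that is before any parsing cost on the query side.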

So again, there is no absolute rule. Yes, sometimes optimization too early is a mistake. But sometimes the "cpu/disk space/memory is cheap" motto is the real root of all evil.

#5

Oh good Lord, I think I have seen them all. More often than not it is an effort to fix performance problems by someone that is too darn lazy to troubleshoot their way down to the CAUSE of those performance problems or even researching whether there actually IS a performance problem. In many of these cases I wonder if it isn't just a case of that person wanting to try a particular technology and desperately looking for a nail that fits their shiny new hammer.

Here's a recent example:

Data architect comes to me with an elaborate proposal to vertically partition a key table in a fairly large and complex application. He wants to know what type of development effort would be necessary to adjust for the change. The conversation went like this:

Me: Why are you considering this? What is the problem you are trying to solve?

Him: Table X is too wide, we are partitioning it for performance reasons.

Me: What makes you think it is too wide?

Him: The consultant said that is way too many columns to have in one table.

Me: And this is affecting performance?

Him: Yes, users have reported intermittent slowdowns in the XYZ module of the application.

Me: How do you know the width of the table is the source of the problem?

Him: That is the key table used by the XYZ module, and it is like 200 columns. It must be the problem.

Me (Explaining): But module XYZ in particular uses most of the columns in that table, and the columns it uses are unpredictable because the user configures the app to show the data they want to display from that table. It is likely that 95% of the time we'd wind up joining all the tables back together anyway which would hurt performance.

Him: The consultant said it is too wide and we need to change it.

Me: Who is this consultant? I didn't know we hired a consultant, nor did they talk to the development team at all.

Him: Well, we haven't hired them yet. This is part of a proposal they offered, but they insisted we needed to re-architect this database.

Me: Uh huh. So the consultant who sells database re-design services thinks we need a database re-design....

The conversation went on and on like this. Afterward, I took another look at the table in question and determined that it probably could be narrowed with some simple normalization, with no need for exotic partitioning strategies. This, of course, turned out to be a moot point once I investigated the performance problems (previously unreported) and tracked them down to two factors:

Missing indexes on a few key columns.

A few rogue data analysts who were periodically locking key tables (including the "too-wide" one) by querying the production database directly with MSAccess.

Of course the architect is still pushing for a vertical partitioning of the table hanging on to the "too wide" meta-problem. He even bolstered his case by getting a proposal from another database consultant who was able to determine we needed major design changes to the database without looking at the app or running any performance analysis.

#6

I have seen people using alphadrive-7 to totally incubate CHX-LT. This is an uncommon practice. The more common practice is to initialize the ZT transformer so that bufferication is reduced (due to greater net overload resistance) and create java style bytegraphications.

Totally pessimistic!

#7

Nothing Earth-shattering, I admit, but I've caught people using StringBuffer to concatenate Strings outside of a loop in Java. It was something simple like turning

String msg = "Count = " + count + " of " + total + ".";

into

StringBuffer sb = new StringBuffer("Count = ");
sb.append(count);
sb.append(" of ");
sb.append(total);
sb.append(".");
String msg = sb.toString();

It used to be quite common practice to use the technique in a loop, because it was measurably faster. The thing is, StringBuffer is synchronized, so there's actually extra overhead if you're only concatenating a few Strings. (Not to mention that the difference is absolutely trivial on this scale.) Two other points about this practice:

StringBuilder is unsynchronized, so should be preferred over StringBuffer in cases where your code can't be called from multiple threads.

Modern Java compilers will turn readable String concatenation into optimized bytecode for you when it's appropriate anyway.
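
As a minimal sketch of where the line falls (the class and method names ConcatDemo, message and joinWithCommas are made up for illustration): a one-off concatenation can stay in its readable form, while building a string across loop iterations is where an explicit StringBuilder still tends to pay off.

```java
final class ConcatDemo {
    /**
     * One-off concatenation: keep the readable form. Depending on the Java
     * version, javac compiles this to StringBuilder calls or to an
     * invokedynamic-based concatenation.
     */
    static String message(int count, int total) {
        return "Count = " + count + " of " + total + ".";
    }

    /**
     * Concatenation across loop iterations: one unsynchronized StringBuilder
     * avoids creating a new intermediate String on every pass.
     */
    static String joinWithCommas(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(message(3, 10));
        System.out.println(joinWithCommas(new String[] {"a", "b", "c"}));
    }
}
```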

#8

I once saw an MSSQL database that used a 'Root' table. The Root table had four columns: GUID (uniqueidentifier), ID (int), LastModDate (datetime), and CreateDate (datetime). All tables in the database were Foreign Key'd to the Root table. Whenever a new row was created in any table in the db, you had to use a couple of stored procedures to insert an entry in the Root table before you could get to the actual table you cared about (rather than the database doing the job for you with a few simple triggers).

This created a mess of useless overhead and headaches, required anything written on top of it to use sprocs (eliminating my hopes of introducing LINQ to the company; it was possible, but just not worth the headache), and to top it off didn't even accomplish what it was supposed to do.

The developer that chose this path defended it under the assumption that this saved tons of space because we weren't using Guids on the tables themselves (but...isn't a GUID generated in the Root table for every row we make?), improved performance somehow, and made it "easy" to audit changes to the database.

Oh, and the database diagram looked like a mutant spider from hell.

#9

How about POBI -- pessimization obviously by intent?

A colleague of mine in the 90s was tired of getting kicked in the ass by the CEO, just because the CEO spent the first day of every release of our (custom) ERP software hunting for performance issues in the new functionality. Even if the new functionality crunched gigabytes and made the impossible possible, he always found some detail, or even a seemingly major issue, to whine about. He believed he knew a lot about programming and got his kicks by kicking programmers' asses.

Due to the incompetent nature of the criticism (he was a CEO, not an IT guy), my colleague never managed to get it right. If you do not have a performance problem, you cannot eliminate it...

Until, for one release, he put a lot of Delay(200) function calls (it was Delphi) into the new code. It took just 20 minutes after go-live before he was ordered to appear in the CEO's office to fetch his overdue insults in person.

The only unusual thing so far was my colleague's mood when he returned: smiling, joking, going out for a Big Mac or two, when he normally would kick tables, flame about the CEO and the company, and spend the rest of the day utterly dejected.

Naturally, my colleague now rested for one or two days at his desk, improving his aiming skills in Quake -- then on the second or third day he deleted the Delay calls, rebuilt, and released an "emergency patch", spreading the word that he had spent 2 days and 1 night fixing the performance holes.

This was the first (and only) time that evil CEO said "great job!" to him. That's all that counts, right?

This was real POBI.

But it also is a kind of social process optimization, so it's 100% ok.

I think.

#10

"Database Independence". This meant no stored procs, triggers, etc - not even any foreign keys.

#11

var stringBuilder = new StringBuilder();
stringBuilder.Append(myObj.a + myObj.b + myObj.c + myObj.d);
string cat = stringBuilder.ToString();

Best use of a StringBuilder I've ever seen.

#12

Using a regex to split a string when a simple string.split suffices

#13

No one seems to have mentioned sorting, so I will.

Several different times, I've discovered that someone had hand-crafted a bubble sort, because the situation "didn't require" a call to the "too fancy" quicksort algorithm that already existed. The developer was satisfied when their handcrafted bubble sort worked well enough on the ten rows of data they were using for testing. It didn't go over quite as well after the customer had added a couple of thousand rows.
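
A small Java sketch of why ten rows hide the problem (SortDemo and bubbleSort are hypothetical names, and this is a plain textbook bubble sort, not the code from the story): it counts comparisons, which grow as n(n-1)/2, and uses the library's Arrays.sort as the "already existing" alternative.

```java
import java.util.Arrays;

final class SortDemo {
    /** Textbook bubble sort; returns the number of comparisons performed. */
    static long bubbleSort(int[] a) {
        long comparisons = 0;
        for (int end = a.length - 1; end > 0; end--) {
            for (int i = 0; i < end; i++) {
                comparisons++;
                if (a[i] > a[i + 1]) {
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                }
            }
        }
        return comparisons;
    }

    /** Builds a worst-case (descending) input of the given size. */
    static int[] descending(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = n - i;
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println("10 rows:   " + bubbleSort(descending(10)) + " comparisons");
        System.out.println("2000 rows: " + bubbleSort(descending(2000)) + " comparisons");

        int[] rows = descending(2000);
        Arrays.sort(rows); // the library call that was "too fancy"
        System.out.println("first after Arrays.sort: " + rows[0]);
    }
}
```

Ten rows need 45 comparisons; two thousand rows need 1,999,000, which is the jump the customer ran into.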

#14

Very late to this thread I know, but I saw this recently:

bool isFinished = GetIsFinished();

switch (isFinished)
{
    case true:
        DoFinish();
        break;

    case false:
        DoNextStep();
        break;

    default:
        DoNextStep();
}

Y'know, just in case a boolean had some extra values...

#15

Worst example I can think of is an internal database at my company containing information on all employees. It gets a nightly update from HR and has an ASP.NET web service on top. Many other apps use the web service to populate things like search/dropdown fields.

The pessimization is that the developer thought repeated calls to the web service would be too slow if each one made its own SQL queries. So what did he do? The application start event reads in the entire database and converts it all to objects in memory, stored indefinitely until the app pool is recycled. This code was so slow that it took 15 minutes to load fewer than 2000 employees. If you inadvertently recycled the app pool during the day, it could take 30 minutes or more, because each web service request would start multiple concurrent reloads. For this reason, new hires wouldn't appear in the database on the first day, when their account was created, and therefore couldn't access most internal apps for their first couple of days, leaving them twiddling their thumbs.

The second level of pessimism is that the development manager doesn't want to touch it for fear of breaking dependent applications, but yet we continue to have sporadic company-wide outages of critical applications due to poor design of such a simple component.

#16

I once worked on an app that was full of code like this:

 1 tuple *FindTuple( DataSet *set, int target ) {
 2     tuple *found = null;
 3     tuple *curr = GetFirstTupleOfSet(set);
 4     while (curr) {
 5         if (curr->id == target)
 6             found = curr;
 7         curr = GetNextTuple(curr);
 8     }
 9     return found;
10 }

Simply removing found, returning null at the end, and changing the sixth line to:

            return curr;

Doubled the app performance.
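
For reference, here is the early-exit version spelled out in full as a Java rendering. The Tuple class and the names FindTupleDemo and findTuple are stand-ins for the original C types, so treat it as a sketch. Note one assumption: the original loop returns the last match and this one returns the first, which is the same thing only if ids are unique.

```java
final class FindTupleDemo {
    /** Hypothetical singly linked tuple, standing in for the C struct. */
    static final class Tuple {
        final int id;
        final Tuple next;

        Tuple(int id, Tuple next) {
            this.id = id;
            this.next = next;
        }
    }

    /** Returns the first tuple with the given id, or null if there is none. */
    static Tuple findTuple(Tuple first, int target) {
        for (Tuple curr = first; curr != null; curr = curr.next) {
            if (curr.id == target) {
                return curr; // stop here instead of walking the rest of the list
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Tuple list = new Tuple(1, new Tuple(2, new Tuple(3, null)));
        System.out.println(findTuple(list, 2).id);
        System.out.println(findTuple(list, 9));
    }
}
```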

#17

I once had to attempt to modify code that included these gems in the Constants class

public static String COMMA_DELIMINATOR=",";
public static String COMMA_SPACE_DELIMINATOR=", ";
public static String COLIN_DELIMINATOR=":";

Each of these were used multiple times in the rest of the application for different purposes. COMMA_DELIMINATOR littered the code with over 200 uses in 8 different packages.

#18

The big all time number one which I run into time and time again in inhouse software:

Not using the features of the DBMS for "portability" reasons because "we might want to switch to another vendor later".

Read my lips. For any inhouse work: IT WILL NOT HAPPEN!

#19

I had a co-worker who was trying to outwit our C compiler's optimizer and routinely rewrote code in ways that only he could read. One of his favorite tricks was changing a readable method like (making up some code):

int some_method(int input1, int input2) {
    int x;
    if (input1 == -1) {
        return 0;
    }
    if (input1 == input2) {
        return input1;
    }
    ... a long expression here ...
    return x;
}

into this:

int some_method(int input1, int input2) {
    return (input1 == -1) ? 0 : (input1 == input2) ? input1 :
           ... a long expression ...
           ... a long expression ...
           ... a long expression ...
}

That is, the first line of a once-readable method would become "return" and all other logic would be replaced by deeply nested ternary expressions. When you tried to argue about how this was unmaintainable, he would point to the fact that the assembly output of his method was three or four assembly instructions shorter. It wasn't necessarily any faster but it was always a tiny bit shorter. This was an embedded system where memory usage occasionally did matter, but there were far easier optimizations that could have been made, ones that would have left the code readable.

Then, after this, for some reason he decided that ptr->structElement was too unreadable, so he started changing all of these into (*ptr).structElement on the theory that it was more readable and faster as well.

Turning readable code into unreadable code for at the most a 1% improvement, and sometimes actually slower code.

#20

In one of my first jobs as a full-fledged developer, I took over a project for a program that was suffering scaling issues. It would work reasonably well on small data sets, but would completely crash when given large quantities of data.

As I dug in, I found that the original programmer sought to speed things up by parallelizing the analysis - launching a new thread for each additional data source. However, he'd made a mistake in that all threads required a shared resource, on which they were deadlocking. Of course, all benefits of concurrency disappeared. Moreover it crashed most systems to launch 100+ threads only to have all but one of them lock. My beefy dev machine was an exception in that it churned through a 150-source dataset in around 6 hours.

So to fix it, I removed the multi-threading components and cleaned up the I/O. With no other changes, execution time on the 150-source dataset dropped below 10 minutes on my machine, and from infinity to under half an hour on the average company machine.

#21

I suppose I could offer this gem:

unsigned long isqrt(unsigned long value)
{
    unsigned long tmp = 1, root = 0;
    #define ISQRT_INNER(shift) \
    { \
        if (value >= (tmp = ((root << 1) + (1 << (shift))) << (shift))) \
        { \
            root += 1 << shift; \
            value -= tmp; \
        } \
    }

    // Find out how many bytes our value uses
    // so we don't do any unneeded work.
    if (value & 0xffff0000)
    {
        if ((value & 0xff000000) == 0)
            tmp = 3;
        else
            tmp = 4;
    }
    else if (value & 0x0000ff00)
        tmp = 2;

    switch (tmp)
    {
        case 4:
            ISQRT_INNER(15);
            ISQRT_INNER(14);
            ISQRT_INNER(13);
            ISQRT_INNER(12);
        case 3:
            ISQRT_INNER(11);
            ISQRT_INNER(10);
            ISQRT_INNER( 9);
            ISQRT_INNER( 8);
        case 2:
            ISQRT_INNER( 7);
            ISQRT_INNER( 6);
            ISQRT_INNER( 5);
            ISQRT_INNER( 4);
        case 1:
            ISQRT_INNER( 3);
            ISQRT_INNER( 2);
            ISQRT_INNER( 1);
            ISQRT_INNER( 0);
    }
#undef ISQRT_INNER
    return root;
}

Since the square-root was calculated at a very sensitive place, I got the task of looking into a way to make it faster. This small refactoring reduced the execution time by a third (for the combination of hardware and compiler used, YMMV):

unsigned long isqrt(unsigned long value)
{
    unsigned long tmp = 1, root = 0;
    #define ISQRT_INNER(shift) \
    { \
        if (value >= (tmp = ((root << 1) + (1 << (shift))) << (shift))) \
        { \
            root += 1 << shift; \
            value -= tmp; \
        } \
    }

    ISQRT_INNER (15);
    ISQRT_INNER (14);
    ISQRT_INNER (13);
    ISQRT_INNER (12);
    ISQRT_INNER (11);
    ISQRT_INNER (10);
    ISQRT_INNER ( 9);
    ISQRT_INNER ( 8);
    ISQRT_INNER ( 7);
    ISQRT_INNER ( 6);
    ISQRT_INNER ( 5);
    ISQRT_INNER ( 4);
    ISQRT_INNER ( 3);
    ISQRT_INNER ( 2);
    ISQRT_INNER ( 1);
    ISQRT_INNER ( 0);

#undef ISQRT_INNER
    return root;
}

Of course there are both faster AND better ways to do this, but I think it's a pretty neat example of a pessimization.

Edit: Come to think of it, the unrolled loop was actually also a neat pessimization. Digging through the version control, I can present the second stage of refactoring as well, which performed even better than the above:

unsigned long isqrt(unsigned long value)
{
    unsigned long tmp = 1 << 30, root = 0;

    while (tmp != 0)
    {
        if (value >= root + tmp) {
            value -= root + tmp;
            root += tmp << 1;
        }
        root >>= 1;
        tmp >>= 2;
    }

    return root;
}

This is exactly the same algorithm, albeit a slightly different implementation, so I suppose it qualifies.

#22

This might be at a higher level than what you were after, but fixing it (if you're allowed) also involves a higher level of pain:

Insisting on hand rolling an Object Relationship Manager / Data Access Layer instead of using one of the established, tested, mature libraries out there (even after they've been pointed out to you).

#23

All foreign-key constraints were removed from a database, because otherwise there would be so many errors.

#24

This doesn't exactly fit the question, but I'll mention it anyway as a cautionary tale. I was working on a distributed app that was running slowly, and flew down to DC to sit in on a meeting primarily aimed at solving the problem. The project lead started to outline a re-architecture aimed at resolving the delay. I volunteered that I had taken some measurements over the weekend that isolated the bottleneck to a single method. It turned out there was a missing record on a local lookup, causing the application to have to go to a remote server on every transaction. By adding the record back to the local store, the delay was eliminated - problem solved. Note the re-architecture wouldn't have fixed the problem.

#25

Checking before EVERY javascript operation whether the object you are operating upon exists.

if (myObj) { //or its evil cousin, if (myObj != null) {
    label.text = myObj.value; 
    // we know label exists because it has already been 
    // checked in a big if block somewhere at the top
}

My problem with this type of code is that nobody seems to care what happens if it doesn't exist. Just do nothing? Don't give any feedback to the user?

I agree that the Object expected errors are annoying, but this is not the best solution for that.

#26

How about YAGNI extremism? It is a form of premature pessimization. It seems like any time you apply YAGNI, you end up needing it anyway, resulting in 10 times the effort to add it than if you had added it in the beginning. If you create a successful program then odds are YOU ARE GOING TO NEED IT. If you are used to creating programs whose life runs out quickly then continue to practice YAGNI because then I suppose YAGNI.

#27

Not exactly premature optimisation - but certainly misguided - this was read on the BBC website, from an article discussing Windows 7.

Mr Curran said that the Microsoft Windows team had been poring over every aspect of the operating system to make improvements. "We were able to shave 400 milliseconds off the shutdown time by slightly trimming the WAV file shutdown music."

Now, I haven't tried Windows 7 yet, so I might be wrong, but I'm willing to bet that there are other issues in there that are more important than how long it takes to shut-down. After all, once I see the 'Shutting down Windows' message, the monitor is turned off and I'm walking away - how does that 400 milliseconds benefit me?

#28

Someone in my department once wrote a string class. An interface like CString, but without the Windows dependence.

One "optimization" they did was to not allocate any more memory than necessary. Apparently not realizing that the reason classes like std::string do allocate excess memory is so that a sequence of += operations can run in O(n) time.

Instead, every single += call forced a reallocation, which turned repeated appends into an O(n²) Schlemiel the Painter's algorithm.
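
The difference is easy to see by counting how many characters get copied during reallocation. This is a minimal Java model of the two growth policies, not the actual string class; AppendCostDemo, copiesExactFit and copiesDoubling are hypothetical names, and doubling is just one common growth factor.

```java
final class AppendCostDemo {
    /** Characters copied when n one-char appends each reallocate to an exact fit. */
    static long copiesExactFit(int n) {
        long copied = 0;
        int capacity = 0;
        for (int length = 0; length < n; length++) {
            if (length == capacity) {   // buffer is full, so reallocate
                copied += length;       // existing contents move to the new buffer
                capacity = length + 1;  // room for exactly one more character
            }
        }
        return copied;
    }

    /** Characters copied when the buffer doubles whenever it fills up. */
    static long copiesDoubling(int n) {
        long copied = 0;
        int capacity = 0;
        for (int length = 0; length < n; length++) {
            if (length == capacity) {
                copied += length;
                capacity = Math.max(1, capacity * 2);
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        System.out.println("exact fit, 1000 appends: " + copiesExactFit(1000));
        System.out.println("doubling,  1000 appends: " + copiesDoubling(1000));
    }
}
```

For 1000 appends, the exact-fit policy copies 499,500 characters (n(n-1)/2), while doubling copies 1,023, which stays proportional to n.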

#29

An ex-coworker of mine (a s.o.a.b., actually) was assigned to build a new module for our Java ERP that should have collected and analyzed customers' data (retail industry). He decided to split EVERY Calendar/Datetime field in its components (seconds, minutes, hours, day, month, year, day of week, bimester, trimester (!)) because "how else would I query for 'every monday'?"
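
For what it's worth, "every Monday" can be derived from a plain date without storing the parts separately. A minimal Java sketch using java.time (MondayDemo and onDay are hypothetical names; most SQL databases offer a comparable date-part function, so the same idea applies on the query side):

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class MondayDemo {
    /** Keeps only the dates that fall on the given day of the week. */
    static List<LocalDate> onDay(List<LocalDate> dates, DayOfWeek day) {
        return dates.stream()
                .filter(d -> d.getDayOfWeek() == day)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<LocalDate> sales = Arrays.asList(
                LocalDate.of(2024, 1, 1),   // a Monday
                LocalDate.of(2024, 1, 2),   // a Tuesday
                LocalDate.of(2024, 1, 8));  // the following Monday
        System.out.println(onDay(sales, DayOfWeek.MONDAY));
    }
}
```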

#30

No offense to anyone, but I just graded an assignment (java) that had this

import java.lang.*;
