Performance of measuring text width in AppKit

Time: 2022-07-25 04:02:33

Is there a way in AppKit to measure the width of a large number of NSString objects (say, a million) really fast? I have tried three different ways to do this:

  • [NSString sizeWithAttributes:]
  • [NSAttributedString size]
  • NSLayoutManager (get text width instead of height)

    Here are some performance metrics (times in seconds):

    Count\Mechanism    sizeWithAttributes    NSAttributedString    NSLayoutManager
    1000               0.057                 0.031                 0.007
    10000              0.329                 0.325                 0.064
    100000             3.06                  3.14                  0.689
    1000000            29.5                  31.3                  7.06



    NSLayoutManager is clearly the way to go, but it has problems:

  • High memory footprint (more than 1 GB according to the profiler) because of the creation of heavyweight NSTextStorage objects.

  • High creation time. All of the time taken is during creation of the above strings, which is a dealbreaker in itself. (Subsequently measuring NSTextStorage objects whose glyphs have already been created and laid out takes only about 0.0002 seconds.)

  • 7 seconds is still too slow for what I am trying to do. Is there a faster way, one that can measure a million strings in about a second?

    In case you want to play around, here is the GitHub project.

  • 1 answer

    #1


    Here are some ideas I haven't tried.

    1. Use Core Text directly. The other APIs are built on top of it.

    2. Parallelize. All modern Macs (and even all modern iOS devices) have multiple cores. Divide up the string array into several subarrays. For each subarray, submit a block to a global GCD queue. In the block, create the necessary Core Text or NSLayoutManager objects and measure the strings in the subarray. Both APIs can be used safely this way. (Core Text) (NSLayoutManager)

    3. Regarding “High memory footprint”: Use Local Autorelease Pool Blocks to Reduce Peak Memory Footprint.

    4. Regarding “All of the time taken is during creation of the above strings, which is a dealbreaker in itself”: Are you saying all the time is spent in these lines:

      double random = (double)arc4random_uniform(1000) / 1000;
      NSString *randomNumber = [NSString stringWithFormat:@"%f", random];
      

      Formatting a floating-point number is expensive. Is this your real use case? If you just want to format a random rational of the form n/1000 for 0 ≤ n < 1000, there are faster ways. Also, in many fonts, all digits have the same width, so that it's easy to typeset columns of numbers. If you pick such a font, you can avoid measuring the strings in the first place.
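
A minimal sketch of the two shortcuts mentioned above, written in plain C (which also compiles as Objective-C). It assumes the strings really are n/1000 printed with `%f`, that the font has fixed-width digits, and that no kerning applies between the digits and the period. The names `format_thousandths` and `fixed_width_estimate` are hypothetical and not part of the original project; the two advance values would have to be measured once from the real font.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: writes n/1000 (0 <= n < 1000) in the form that "%f"
   produces for these values, e.g. 123 -> "0.123000", without calling a
   general-purpose formatter. `out` must hold at least 9 bytes. */
static void format_thousandths(unsigned n, char out[9]) {
    assert(n < 1000);
    memcpy(out, "0.000000", 9);            /* template, including the NUL */
    out[2] = (char)('0' + n / 100);        /* tenths digit */
    out[3] = (char)('0' + (n / 10) % 10);  /* hundredths digit */
    out[4] = (char)('0' + n % 10);         /* thousandths digit */
}

/* Hypothetical helper: estimates the width of a string made only of digits
   and periods. ASSUMPTION: every digit has the same advance and there is no
   kerning, so the width is a plain sum of per-character advances. */
static double fixed_width_estimate(const char *s,
                                   double digit_advance,
                                   double period_advance) {
    double width = 0.0;
    for (; *s != '\0'; ++s) {
        assert((*s >= '0' && *s <= '9') || *s == '.');
        width += (*s == '.') ? period_advance : digit_advance;
    }
    return width;
}
```

Under these assumptions every test string has seven digits and one period, so all of them share one width and a single estimate per font could stand in for a million measurements. Whether that holds for a given font has to be verified against a real Core Text measurement.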

    UPDATE

    Here's the fastest code I've come up with using Core Text. The dispatched version is almost twice as fast as the single-threaded version on my Core i7 MacBook Pro. My fork of your project is here.

    // Lays out the strings in `range` as one newline-separated block and returns the width of the widest line.
    // Assumes `font` is a file-scope CTFontRef defined elsewhere in the project (it is not shown here).
    static CGFloat maxWidthOfStringsUsingCTFramesetter(NSArray *strings, NSRange range) {
        NSString *bigString = [[strings subarrayWithRange:range] componentsJoinedByString:@"\n"];
        NSAttributedString *richText = [[NSAttributedString alloc] initWithString:bigString attributes:@{ NSFontAttributeName: (__bridge NSFont *)font }];
        // An effectively unbounded rectangle, so each string stays on its own line.
        CGPathRef path = CGPathCreateWithRect(CGRectMake(0, 0, CGFLOAT_MAX, CGFLOAT_MAX), NULL);
        CGFloat width = 0.0;
        CTFramesetterRef setter = CTFramesetterCreateWithAttributedString((__bridge CFAttributedStringRef)richText);
        CTFrameRef frame = CTFramesetterCreateFrame(setter, CFRangeMake(0, bigString.length), path, NULL);
        // CTFrameGetLines follows the Get rule: the frame owns the array, so it is not released here.
        NSArray *lines = (__bridge NSArray *)CTFrameGetLines(frame);
        for (id item in lines) {
            CTLineRef line = (__bridge CTLineRef)item;
            width = MAX(width, CTLineGetTypographicBounds(line, NULL, NULL, NULL));
        }
        // Release everything obtained from a Create function.
        CFRelease(frame);
        CFRelease(setter);
        CFRelease(path);
        return width;
    }
    
    static void test_CTFramesetter() {
        runTest(__func__, ^{
            return maxWidthOfStringsUsingCTFramesetter(testStrings, NSMakeRange(0, testStrings.count));
        });
    }
    
    static void test_CTFramesetter_dispatched() {
        runTest(__func__, ^{
            dispatch_queue_t gatherQueue = dispatch_queue_create("test_CTFramesetter_dispatched result-gathering queue", nil);
            dispatch_queue_t runQueue = dispatch_get_global_queue(QOS_CLASS_UTILITY, 0);
            dispatch_group_t group = dispatch_group_create();
    
            __block CGFloat gatheredWidth = 0.0;
    
            const size_t Parallelism = 16;
            const size_t totalCount = testStrings.count;
            // Force unsigned long to get 64-bit math to avoid overflow for large totalCounts.
            for (unsigned long i = 0; i < Parallelism; ++i) {
                NSUInteger start = (totalCount * i) / Parallelism;
                NSUInteger end = (totalCount * (i + 1)) / Parallelism;
                NSRange range = NSMakeRange(start, end - start);
                dispatch_group_async(group, runQueue, ^{
                    double width = maxWidthOfStringsUsingCTFramesetter(testStrings, range);
                    dispatch_sync(gatherQueue, ^{
                        gatheredWidth = MAX(gatheredWidth, width);
                    });
                });
            }
    
            dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    
            return gatheredWidth;
        });
    }
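
The chunk boundaries used by the dispatched test can be checked in isolation. The sketch below repeats the same integer arithmetic in plain C; `chunk_t` and `chunk_for_index` are hypothetical names introduced here for illustration, and the code assumes that `total * parts` fits in a `size_t`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: a half-open index range [start, start + length). */
typedef struct {
    size_t start;
    size_t length;
} chunk_t;

/* Returns the i-th of `parts` chunks of an array with `total` elements,
   using the same arithmetic as the loop in the dispatched test:
   start = total * i / parts, end = total * (i + 1) / parts.
   ASSUMPTION: total * parts does not overflow size_t. */
static chunk_t chunk_for_index(size_t total, size_t parts, size_t i) {
    assert(parts > 0 && i < parts);
    size_t start = (total * i) / parts;
    size_t end = (total * (i + 1)) / parts;
    chunk_t c = { start, end - start };
    return c;
}
```

Because each chunk ends exactly where the next one starts, the chunks cover every index once even when the count is not a multiple of the chunk count. When there are fewer strings than chunks, some chunks come out empty, so the measuring function should be prepared to receive an empty range.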
    
