Getting cache info using C/C++ with inline assembly/intrinsics in osx

Time: 2022-10-31 15:13:36

I wrote the following program, using both GCC's __get_cpuid and inline assembly, to get the cache info of my laptop. However, I can't find the values it returns in the table (Encoding of Cache and TLB Descriptors) that I found online.

#include <stdio.h>
#include <stdint.h>
#include <cpuid.h>

/* Execute CPUID. EAX selects the leaf; ECX selects the sub-leaf for
   leaves such as 4, so it is an input as well as an output. */
static inline void cpuid(uint32_t *eax, uint32_t *ebx,
                         uint32_t *ecx, uint32_t *edx)
{
    __asm__ __volatile__ ("cpuid"
                          : "=a" (*eax),
                            "=b" (*ebx),
                            "=c" (*ecx),
                            "=d" (*edx)
                          : "0" (*eax), "2" (*ecx));
}

int main(void) {
    uint32_t a, b, c, d;
    uint32_t eax, ebx, ecx, edx;
    unsigned int command = 2;

    eax = 2;   /* leaf 2: cache and TLB descriptor bytes */
    ecx = 0;   /* leaf 2 ignores ECX, but give the input a defined value */
    cpuid(&eax, &ebx, &ecx, &edx);

    if (!__get_cpuid(command, &a, &b, &c, &d)) {
        fprintf(stderr, "CPUID leaf %u is not supported\n", command);
        return 1;
    }

    printf("eax: %08x\n", eax);
    printf("ebx: %08x\n", ebx);
    printf("ecx: %08x\n", ecx);
    printf("edx: %08x\n", edx);

    printf("a: %08x\n", a);
    printf("b: %08x\n", b);
    printf("c: %08x\n", c);
    printf("d: %08x\n", d);

    return 0;
}

My output:

eax: 76036301
ebx: 00f0b5ff
ecx: 00000000
edx: 00c10000
a: 76036301
b: 00f0b5ff
c: 00000000
d: 00c10000

I found the descriptor table here.


Running sysctl hw.cachesize shows:

L1 cache: 32KB
L2 cache: 256KB
L3 cache: 6MB

My Environment:


system: OS X 10.10.1
compiler: clang-602.0.53
CPU: Intel Core i7-4850HQ, 2.3 GHz

What's wrong with my program? It seems to work, since both methods give the same result, so I'm confused. Thank you!


EDIT: I tried what Mats suggested and got the following output:


gcc intrinsic
a: 76036301
b: 00f0b5ff
c: 00000000
d: 00c10000
eax: 2
eax: 76036301
ebx: 00f0b5ff
ecx: 00000000
edx: 00c10000
eax: 4, ecx: 0
eax: 1c004121
ebx: 01c0003f
ecx: 0000003f
edx: 00000000
eax: 4, ecx: 1
eax: 1c004122
ebx: 01c0003f
ecx: 0000003f
edx: 00000000
eax: 4, ecx: 2
eax: 1c004143
ebx: 01c0003f
ecx: 000001ff
edx: 00000000
eax: 4, ecx: 3
eax: 1c03c163
ebx: 02c0003f
ecx: 00001fff
edx: 00000006
eax: 4, ecx: 4
eax: 1c03c183
ebx: 03c0f03f
ecx: 00001fff
edx: 00000004
eax: 4, ecx: 5
eax: 00000000
ebx: 00000000
ecx: 00000000
edx: 00000000

I looked the values up in the table here:

static cpuid_cache_descriptor_t intel_cpuid_leaf2_descriptor_table[] = {

//  -------------------------------------------------------
//  value   type    level       ways    size    entries
//  -------------------------------------------------------
    { 0x00, _NULL_, NA,     NA, NA, NA  },
    { 0x01, TLB,    INST,       4,  SMALL,  32  },  
    { 0x02, TLB,    INST,       FULLY,  LARGE,  2   },  
    { 0x03, TLB,    DATA,       4,  SMALL,  64  },  
    { 0x04, TLB,    DATA,       4,  LARGE,  8   },  
    { 0x05, TLB,    DATA1,      4,  LARGE,  32  },  
    { 0x06, CACHE,  L1_INST,    4,  8*K,    32  },
    { 0x08, CACHE,  L1_INST,    4,  16*K,   32  },
    { 0x09, CACHE,  L1_INST,    4,  32*K,   64  },
    { 0x0A, CACHE,  L1_DATA,    2,  8*K,    32  },
    { 0x0B, TLB,    INST,       4,  LARGE,  4   },  
    { 0x0C, CACHE,  L1_DATA,    4,  16*K,   32  },
    { 0x0D, CACHE,  L1_DATA,    4,  16*K,   64  },
    { 0x0E, CACHE,  L1_DATA,    6,  24*K,   64  },
    { 0x21, CACHE,  L2,     8,  256*K,  64  },
    { 0x22, CACHE,  L3_2LINESECTOR, 4,  512*K,  64  },
    { 0x23, CACHE,  L3_2LINESECTOR, 8,  1*M,    64  },
    { 0x25, CACHE,  L3_2LINESECTOR, 8,  2*M,    64  },
    { 0x29, CACHE,  L3_2LINESECTOR, 8,  4*M,    64  },
    { 0x2C, CACHE,  L1_DATA,    8,  32*K,   64  },
    { 0x30, CACHE,  L1_INST,    8,  32*K,   64  },
    { 0x40, CACHE,  L2,     NA, 0,  NA  },
    { 0x41, CACHE,  L2,     4,  128*K,  32  },
    { 0x42, CACHE,  L2,     4,  256*K,  32  },
    { 0x43, CACHE,  L2,     4,  512*K,  32  },
    { 0x44, CACHE,  L2,     4,  1*M,    32  },
    { 0x45, CACHE,  L2,     4,  2*M,    32  },
    { 0x46, CACHE,  L3,     4,  4*M,    64  },
    { 0x47, CACHE,  L3,     8,  8*M,    64  },
    { 0x48, CACHE,  L2,     12,     3*M,    64  },
    { 0x49, CACHE,  L2,     16, 4*M,    64  },
    { 0x4A, CACHE,  L3,     12,     6*M,    64  },
    { 0x4B, CACHE,  L3,     16, 8*M,    64  },
    { 0x4C, CACHE,  L3,     12,     12*M,   64  },
    { 0x4D, CACHE,  L3,     16, 16*M,   64  },
    { 0x4E, CACHE,  L2,     24, 6*M,    64  },
    { 0x4F, TLB,    INST,       NA, SMALL,  32  },  
    { 0x50, TLB,    INST,       NA, BOTH,   64  },  
    { 0x51, TLB,    INST,       NA, BOTH,   128 },  
    { 0x52, TLB,    INST,       NA, BOTH,   256 },  
    { 0x55, TLB,    INST,       FULLY,  BOTH,   7   },  
    { 0x56, TLB,    DATA0,      4,  LARGE,  16  },  
    { 0x57, TLB,    DATA0,      4,  SMALL,  16  },  
    { 0x59, TLB,    DATA0,      FULLY,  SMALL,  16  },  
    { 0x5A, TLB,    DATA0,      4,  LARGE,  32  },  
    { 0x5B, TLB,    DATA,       NA, BOTH,   64  },  
    { 0x5C, TLB,    DATA,       NA, BOTH,   128 },  
    { 0x5D, TLB,    DATA,       NA, BOTH,   256 },  
    { 0x60, CACHE,  L1,     8,  16*K,   64  },
    { 0x61, CACHE,  L1,     4,  8*K,    64  },
    { 0x62, CACHE,  L1,     4,  16*K,   64  },
    { 0x63, CACHE,  L1,     4,  32*K,   64  },
    { 0x70, CACHE,  TRACE,      8,  12*K,   NA  },
    { 0x71, CACHE,  TRACE,      8,  16*K,   NA  },
    { 0x72, CACHE,  TRACE,      8,  32*K,   NA  },
    { 0x78, CACHE,  L2,     4,  1*M,    64  },
    { 0x79, CACHE,  L2_2LINESECTOR, 8,  128*K,  64  },
    { 0x7A, CACHE,  L2_2LINESECTOR, 8,  256*K,  64  },
    { 0x7B, CACHE,  L2_2LINESECTOR, 8,  512*K,  64  },
    { 0x7C, CACHE,  L2_2LINESECTOR, 8,  1*M,    64  },
    { 0x7D, CACHE,  L2,     8,  2*M,    64  },
    { 0x7F, CACHE,  L2,     2,  512*K,  64  },
    { 0x80, CACHE,  L2,     8,  512*K,  64  },
    { 0x82, CACHE,  L2,     8,  256*K,  32  },
    { 0x83, CACHE,  L2,     8,  512*K,  32  },
    { 0x84, CACHE,  L2,     8,  1*M,    32  },
    { 0x85, CACHE,  L2,     8,  2*M,    32  },
    { 0x86, CACHE,  L2,     4,  512*K,  64  },
    { 0x87, CACHE,  L2,     8,  1*M,    64  },
    { 0xB0, TLB,    INST,       4,  SMALL,  128 },  
    { 0xB1, TLB,    INST,       4,  LARGE,  8   },  
    { 0xB2, TLB,    INST,       4,  SMALL,  64  },  
    { 0xB3, TLB,    DATA,       4,  SMALL,  128 },  
    { 0xB4, TLB,    DATA1,      4,  SMALL,  256 },  
    { 0xBA, TLB,    DATA1,      4,  BOTH,   64  },  
    { 0xCA, STLB,   DATA1,      4,  BOTH,   512 },  
    { 0xD0, CACHE,  L3,     4,  512*K,  64  },  
    { 0xD1, CACHE,  L3,     4,  1*M,    64  },  
    { 0xD2, CACHE,  L3,     4,  2*M,    64  },  
    { 0xD3, CACHE,  L3,     4,  4*M,    64  },  
    { 0xD4, CACHE,  L3,     4,  8*M,    64  },  
    { 0xD6, CACHE,  L3,     8,  1*M,    64  },  
    { 0xD7, CACHE,  L3,     8,  2*M,    64  },  
    { 0xD8, CACHE,  L3,     8,  4*M,    64  },  
    { 0xD9, CACHE,  L3,     8,  8*M,    64  },  
    { 0xDA, CACHE,  L3,     8,  12*M,   64  },  
    { 0xDC, CACHE,  L3,     12,     1536*K, 64  },  
    { 0xDD, CACHE,  L3,     12,     3*M,    64  },  
    { 0xDE, CACHE,  L3,     12,     6*M,    64  },  
    { 0xDF, CACHE,  L3,     12, 12*M,   64  },  
    { 0xE0, CACHE,  L3,     12, 18*M,   64  },  
    { 0xE2, CACHE,  L3,     16, 2*M,    64  },  
    { 0xE3, CACHE,  L3,     16, 4*M,    64  },  
    { 0xE4, CACHE,  L3,     16, 8*M,    64  },  
    { 0xE5, CACHE,  L3,     16, 16*M,   64  },  
    { 0xE6, CACHE,  L3,     16, 24*M,   64  },  
    { 0xF0, PREFETCH, NA,       NA, 64, NA  },  
    { 0xF1, PREFETCH, NA,       NA, 128,    NA  }
};

The problem now is that I still can't get the correct size of my L3 cache (when ecx=1, I get 22, i.e. 512K, but the correct value is 6MB). There also seems to be a conflict over the size of my L2 cache: 43 (when ecx=2) versus 21 (when ecx=0).

2 answers



So, your data seems to be reasonably correct; you are just using an old reference. Unfortunately, Intel's website is either broken at the moment or doesn't like Firefox and/or Linux.




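76 is an instruction TLB for 2MB/4MB pages, fully associative, 8 entries.

03 means 4-way data TLB with 64 entries.

63 is shown as a 32KB L1 cache in the source here, but the current Intel SDM defines 63h as a data TLB: 2MB/4MB pages, 4-way, 32 entries, plus a separate 4-way, 4-entry array for 1GB pages.

01 is not a descriptor at all: the low byte of EAX from leaf 2 always reads 01 and should be ignored.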


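00f0b5ff gives


00 is the null descriptor ("nothing").


f0 means 64-byte prefetching.


b5 is not documented even on that link. [guessing small data TLB]


ff means CPUID leaf 2 does not report cache descriptor information; use CPUID leaf 4 to query cache parameters. This is why your L2 and L3 caches never show up in leaf 2.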

To get L2 and L3 cache sizes, you need to use CPUID with EAX=4 and set ECX to 0, 1, 2, ... for each cache. The linked code shows this, and Intel's docs explain which bits mean what.



Intel's Instruction Set Reference has all the relevant information you need (at around page 263), and unlike every other source I have found, it is actually up to date.


That reference also describes what is probably the best way to get the cache info.


When eax = 4 and ecx is the cache index (0, 1, 2, ...; the cache level itself is reported in EAX[7:5]), each field below is stored as its value minus one:

Ways = EBX[31:22]

Partitions = EBX[21:12]

LineSize = EBX[11:0]

Sets = ECX

Total Size = (Ways + 1) * (Partitions + 1) * (LineSize + 1) * (Sets + 1)

So when CPUID is called with eax = 4 and ecx = 3, you can get your L3 cache size from the computation above. Using the OP's posted data:

ebx: 02c0003f
ecx: 00001fff

Ways = 11
Partitions = 0
LineSize = 63
Sets = 8191

Total L3 cache size = (11 + 1) * (0 + 1) * (63 + 1) * (8191 + 1) = 6291456 bytes (6MB)

This matches the 6MB reported by sysctl.




