I've always looked for new PHP "good practices". For example: it's better (for performance) to check whether an array key exists than to search the array's values, and it also seems to be better for memory:
Assuming we have:
$array = array
(
'one' => 1,
'two' => 2,
'three' => 3,
'four' => 4,
);
this allocates 1040 bytes of memory,
and
$array = array
(
1 => 'one',
2 => 'two',
3 => 'three',
4 => 'four',
);
requires 1136 bytes
I understand that keys and values surely have different storage mechanisms, but can you point me to the principle of how it actually works?
Example 2 (for @teuneboon):
$array = array
(
'one' => '1',
'two' => '2',
'three' => '3',
'four' => '4',
);
1168 bytes
$array = array
(
'1' => 'one',
'2' => 'two',
'3' => 'three',
'4' => 'four',
);
1136 bytes
These two entries consume the same memory:
4 => 'four',
'4' => 'four',
#1
Note: the answer below applies to PHP versions prior to 7, since PHP 7 introduced major changes, including changes to the value structures.
TL;DR
Your question is not actually about "how memory works in PHP" (here, I assume, you meant "memory allocation"), but about "how arrays work in PHP" - and these two questions are different. To summarize what's written below:
- PHP arrays aren't "arrays" in the classical sense; they are hash-maps.
- The hash-map behind a PHP array has a specific structure and uses a lot of additional storage, such as internal linked-list pointers.
- Items of a PHP hash-map also use additional fields to store information. And yes, it matters not only whether keys are strings or integers, but also what the strings used as keys actually are.
- In your case, the string-keys option will "win" in terms of memory, because both options are hashed into a hash-map with ulong (unsigned long) keys, so the real difference is in the values: the string-keys option has integer (fixed-length) values, while the integer-keys option has string (length-dependent) values. But that may not always hold, due to possible collisions.
- "String-numeric" keys, such as '4', are treated as integer keys and translated into an integer hash result exactly as if they were integer keys. Thus, '4' => 'foo' and 4 => 'foo' are the same thing.
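A simplified C model of the string-numeric rule, for illustration only: it is not PHP's own code (PHP performs a comparable check inside the engine), and the function name numeric_string_key is hypothetical. The assumption is that only canonical decimal integers are converted: "4", "-6" and "0" become integer keys, while "04", "-0", "4.0", " 4" or "four" stay string keys.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: a simplified model of how a string key such as "4"
 * is normalized to an integer key. Returns true and writes the integer to
 * *out only for canonical decimal integers ("4", "-6", "0"); everything
 * else ("04", "-0", "4.0", " 4", "four") stays a string key. In this
 * sketch, digit strings whose magnitude exceeds LONG_MAX also stay strings. */
bool numeric_string_key(const char *key, long *out)
{
    assert(key != NULL && out != NULL);
    size_t len = strlen(key);
    size_t i = 0;
    bool negative = false;

    if (len == 0)
        return false;
    if (key[0] == '-') {
        negative = true;
        i = 1;
        if (len == 1)
            return false;              /* "-" alone is not a number */
    }
    /* Leading zeros are not canonical: "04" and "-0" stay strings. */
    if (key[i] == '0' && (negative || len - i > 1))
        return false;

    unsigned long acc = 0;
    for (; i < len; i++) {
        if (key[i] < '0' || key[i] > '9')
            return false;              /* any non-digit keeps it a string */
        unsigned long digit = (unsigned long)(key[i] - '0');
        if (acc > ((unsigned long)LONG_MAX - digit) / 10)
            return false;              /* magnitude would exceed LONG_MAX */
        acc = acc * 10 + digit;
    }
    *out = negative ? -(long)acc : (long)acc;
    return true;
}
```

Under this model, '4' and 4 end up as the same integer key 4, which is consistent with the observation in the question that 4 => 'four' and '4' => 'four' consume the same memory.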
Also, an important note: the graphics here are copyrighted by the PHP internals book.
Hash-map for PHP arrays
PHP arrays and C arrays
You should realize one very important thing: PHP is written in C, where such a thing as an "associative array" simply does not exist. In C, an "array" is exactly that: a contiguous area of memory that is accessed by consecutive offsets. Your "keys" can only be integers, consecutive, and starting from zero. You can't have, for instance, 3, -6 or 'foo' as your "keys" there.
So to implement PHP arrays, a hash-map is used: a hash function hashes your keys and transforms them into integers, which can then be used as C-array indexes. That function, however, can never create a bijection between string keys and their integer hash results. It's easy to see why: the set of strings is much, much larger than the set of integers. To illustrate, count all strings of length 10 that contain only alphanumeric symbols (0-9, a-z and A-Z, 62 in total): there are 62^10 of them, around 8.39E+17. Compare that with around 4.29E+9 values of an unsigned 32-bit integer and you get the idea: there will be collisions.
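Written out, the counting argument is a pigeonhole bound: with 62^10 candidate strings and only 2^32 possible 32-bit hash values, each hash value must, on average, be shared by roughly 195 million of those strings. On a 64-bit build ulong has 2^64 values, which exceeds 62^10, but the set of all strings has no length limit, so collisions remain unavoidable; and PHP 5 picks a slot using only the low bits of the hash (h & nTableMask), so slot collisions happen far sooner than full-hash collisions.

```latex
\begin{aligned}
62^{10} &= 839\,299\,365\,868\,340\,224 \approx 8.39 \times 10^{17} \\
2^{32} &= 4\,294\,967\,296 \approx 4.29 \times 10^{9} \\
\frac{62^{10}}{2^{32}} &\approx 1.95 \times 10^{8}
\end{aligned}
```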
PHP hash-map keys & collisions
Now, to resolve collisions, PHP places items that have the same hash-function result into one linked list. So the hash-map is not just a "list of hashed elements"; instead, it stores pointers to lists of elements (every element in a given list has the same hash-function key). And this is where you can see how it affects memory allocation: if your array has string keys that did not result in collisions, no additional pointers inside those lists are needed, so the amount of memory is reduced (it's actually a very small overhead, but since we're talking about precise memory allocation, it should be taken into account). In the same way, if your string keys result in many collisions, more additional pointers are created, so the total amount of memory will be a bit larger.
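To make the chaining idea concrete, here is a minimal C sketch of a hash table that resolves collisions with per-slot linked lists. It is a simplified model, not PHP's code: the names Table, Entry, djb_hash, table_set and MAX_SLOTS are hypothetical, keys are not copied, and the DJB-style "times 33" string hash is chosen for illustration (it belongs to the same family as the hash PHP 5 uses). The slot is picked with h & mask, so the table size must be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SLOTS 8                /* hypothetical upper bound for the sketch */

typedef struct entry {
    unsigned long h;               /* full hash of the key */
    const char *key;               /* key string (not copied in this sketch) */
    long value;
    struct entry *next;            /* next entry in the same slot's chain */
} Entry;

typedef struct {
    unsigned long mask;            /* table size - 1; size is a power of two */
    Entry *slots[MAX_SLOTS];       /* heads of the collision chains */
} Table;

/* DJB-style "times 33" string hash. */
unsigned long djb_hash(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

void table_init(Table *t, unsigned long size)
{
    assert(size > 0 && size <= MAX_SLOTS && (size & (size - 1)) == 0);
    t->mask = size - 1;
    for (size_t i = 0; i < MAX_SLOTS; i++)
        t->slots[i] = NULL;
}

Entry *table_find(const Table *t, const char *key)
{
    unsigned long h = djb_hash(key);
    /* Only entries in this one chain can match; compare the full hash first. */
    for (Entry *e = t->slots[h & t->mask]; e != NULL; e = e->next)
        if (e->h == h && strcmp(e->key, key) == 0)
            return e;
    return NULL;
}

/* Inserts or updates; returns 0 on success, -1 if allocation fails. */
int table_set(Table *t, const char *key, long value)
{
    Entry *e = table_find(t, key);
    if (e != NULL) {
        e->value = value;
        return 0;
    }
    e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->h = djb_hash(key);
    e->key = key;
    e->value = value;
    e->next = t->slots[e->h & t->mask];   /* prepend to the slot's chain */
    t->slots[e->h & t->mask] = e;
    return 0;
}

size_t chain_length(const Table *t, unsigned long slot)
{
    size_t n = 0;
    for (const Entry *e = t->slots[slot]; e != NULL; e = e->next)
        n++;
    return n;
}

void table_free(Table *t)
{
    for (size_t i = 0; i < MAX_SLOTS; i++) {
        Entry *e = t->slots[i];
        while (e != NULL) {
            Entry *next = e->next;
            free(e);
            e = next;
        }
        t->slots[i] = NULL;
    }
}
```

With four keys and only two slots, at least one chain must hold two or more entries (pigeonhole principle), yet lookups still succeed because the chain is walked and keys are compared.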
To illustrate the relations within those lists, here's a graphic:
Above is how PHP resolves collisions after applying the hash function. So one part of your question lies here: the pointers inside collision-resolution lists. Elements of these linked lists are usually called buckets, and the array that contains pointers to the heads of those lists is internally called arBuckets. As a structural optimization (to make operations such as element deletion faster), a real list element has two pointers, one to the previous element and one to the next; that only makes the memory difference between non-colliding and colliding arrays a little wider, but doesn't change the concept itself.
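The benefit of the second pointer can be shown in isolation: with a back pointer (called prev here, playing the role of pLast), an entry can be unlinked from its chain in O(1), without first walking the chain to find its predecessor. A minimal sketch; Node, chain_push and chain_unlink are hypothetical names, not PHP's functions.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    long value;
    struct node *next;   /* like pNext */
    struct node *prev;   /* like pLast */
} Node;

/* Pushes n onto the front of the chain whose head pointer is *head. */
void chain_push(Node **head, Node *n)
{
    assert(head != NULL && n != NULL);
    n->prev = NULL;
    n->next = *head;
    if (*head != NULL)
        (*head)->prev = n;
    *head = n;
}

/* Removes n from the chain in O(1): no walk to find the predecessor. */
void chain_unlink(Node **head, Node *n)
{
    assert(head != NULL && n != NULL);
    if (n->prev != NULL)
        n->prev->next = n->next;
    else
        *head = n->next;         /* n was the head of the chain */
    if (n->next != NULL)
        n->next->prev = n->prev;
    n->next = n->prev = NULL;
}
```

With only a next pointer, removing a middle element would require scanning from the head to find the element before it; the extra pointer per element trades a little memory for that speed.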
One more list: order
To fully support arrays as they exist in PHP, order must also be maintained, and that is achieved with another internal list. Every array element is a member of that list too. It makes no difference in terms of memory allocation, since this list has to be maintained in both options, but I mention it for the full picture. Here's the graphic:
In addition to pListLast and pListNext, pointers to the order list's head and tail are stored. Again, this is not directly related to your question, but below I'll dump the internal bucket structure, where these pointers are present.
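A minimal sketch of the order list, assuming each element sits in two lists at once: a collision chain (omitted here) and a global insertion-order list with head and tail pointers. Walking the order list yields elements in insertion order, regardless of which slot each key hashed to. OrderedMap, Item, om_append and om_collect_values are hypothetical names; list_next/list_prev stand in for pListNext/pListLast, and head/tail for pListHead/pListTail.

```c
#include <assert.h>
#include <stddef.h>

typedef struct item {
    const char *key;
    long value;
    struct item *list_next;   /* like pListNext: next in insertion order */
    struct item *list_prev;   /* like pListLast: previous in insertion order */
} Item;

typedef struct {
    Item *head;               /* like pListHead */
    Item *tail;               /* like pListTail */
} OrderedMap;

/* Appends it at the tail so that iteration follows insertion order. */
void om_append(OrderedMap *m, Item *it)
{
    assert(m != NULL && it != NULL);
    it->list_next = NULL;
    it->list_prev = m->tail;
    if (m->tail != NULL)
        m->tail->list_next = it;
    else
        m->head = it;         /* first element */
    m->tail = it;
}

/* Copies values into out[] in insertion order; returns how many were copied. */
size_t om_collect_values(const OrderedMap *m, long *out, size_t cap)
{
    size_t n = 0;
    for (const Item *it = m->head; it != NULL && n < cap; it = it->list_next)
        out[n++] = it->value;
    return n;
}
```

Because both option arrays in the question need this list, it adds the same per-element cost (two pointers) either way.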
Array element from the inside
Now we're ready to look at what an array element, i.e. a bucket, is:
typedef struct bucket {
    ulong h;
    uint nKeyLength;
    void *pData;
    void *pDataPtr;
    struct bucket *pListNext;
    struct bucket *pListLast;
    struct bucket *pNext;
    struct bucket *pLast;
    char *arKey;
} Bucket;
Here we are:
- h is the integer (ulong) value of the key; it's the result of the hash function. For integer keys it is simply the key itself (the hash function returns its input)
- pNext/pLast are pointers inside the collision-resolution linked list
- pListNext/pListLast are pointers inside the order-resolution linked list
- pData is a pointer to the stored value. Actually, the value isn't the same one inserted at array creation, it's a copy; but to avoid unnecessary overhead, PHP uses pDataPtr (so pData = &pDataPtr)
From this viewpoint, you can see where the difference is: since a string key is hashed (so h is always a ulong and therefore always has the same size), it comes down to what is stored in the values. For your string-keys array there are integer values, while for the integer-keys array there are string values, and that makes the difference. However, no, it isn't magic: you can't always "save memory" by storing string keys this way, because if your keys are large and there are many of them, they will cause collision overhead (with very high probability, though of course not guaranteed). It will "work" only for sufficiently short strings, which won't cause many collisions.
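One way to see the fixed per-element cost is to measure a struct with the same field layout as the PHP 5 Bucket shown above. This is a local mirror, not PHP's header: ulong and uint are assumed to be unsigned long and unsigned int, all pointer types are assumed to have the same size (true on common platforms), and MirrorBucket and the helper names are hypothetical. On a typical LP64 system (64-bit Linux or macOS) sizeof(MirrorBucket) comes out as 72 bytes, including 4 bytes of padding after nKeyLength; other ABIs differ. The point is that h has the same size whether the key was a string or an integer, so the key type does not change the size of the bucket struct itself; what varies is the data reached through its pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of the PHP 5 Bucket layout quoted above; ulong/uint are
 * assumed to be unsigned long/unsigned int. Not PHP's own header. */
typedef struct mirror_bucket {
    unsigned long h;
    unsigned int nKeyLength;
    void *pData;
    void *pDataPtr;
    struct mirror_bucket *pListNext;
    struct mirror_bucket *pListLast;
    struct mirror_bucket *pNext;
    struct mirror_bucket *pLast;
    char *arKey;
} MirrorBucket;

/* Sum of the field sizes before any alignment padding
 * (assumes all seven pointer fields have the size of void *). */
size_t bucket_field_bytes(void)
{
    return sizeof(unsigned long) + sizeof(unsigned int) + 7 * sizeof(void *);
}

/* Padding the compiler inserts to keep the pointer fields aligned. */
size_t bucket_padding_bytes(void)
{
    assert(sizeof(MirrorBucket) >= bucket_field_bytes());
    return sizeof(MirrorBucket) - bucket_field_bytes();
}
```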
Hash-table itself
We've already covered the elements (buckets) and their structure, but there's also the hash-table itself, which is, in fact, the array data structure. It's called _hashtable:
typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer; /* Used for element traversal */
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;
    dtor_func_t pDestructor;
    zend_bool persistent;
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;
I won't describe every field, since I've already provided a lot of information; I'll only briefly describe the parts of this structure that relate to the question:
- arBuckets is what was described above: the bucket storage
- pListHead/pListTail are pointers to the order-resolution list
- nTableSize determines the size of the hash-table, and it is directly related to memory allocation: nTableSize is always a power of 2. Thus it doesn't matter whether you have 13 or 14 elements in the array: the actual size will be 16. Take that into account when you want to estimate array size.
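The rounding rule can be sketched as follows. The starting size of 8 reflects my understanding of PHP 5's hash initialization and should be treated as an assumption; table_size_for and slot_index are hypothetical names. Keeping the size a power of two is what lets the mask (size - 1, stored as nTableMask) replace a modulo: h & mask keeps only the low bits of the hash.

```c
#include <assert.h>

/* Smallest power of two >= n, starting from 8 (assumed PHP 5 minimum).
 * Precondition: n does not exceed the largest power of two in an unsigned long. */
unsigned long table_size_for(unsigned long n)
{
    assert(n <= (~0UL >> 1) + 1);
    unsigned long size = 8;
    while (size < n)
        size <<= 1;                    /* double until it fits */
    return size;
}

/* With a power-of-two size, masking the hash gives the same slot as h % size. */
unsigned long slot_index(unsigned long h, unsigned long size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return h & (size - 1);
}
```

So 13 and 14 elements both map to a 16-slot table, and the slot array is sized for 16 either way.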
Conclusion
It's really difficult to predict whether one array will be larger than the other in your case. Yes, there are guidelines that follow from the internal structure, but if the string keys are comparable in length to the integer values (like 'four' and 'one' in your sample), the real difference will lie in things such as how many collisions occurred and how many bytes were allocated to store the values.
But choosing the proper structure should be a matter of meaning, not memory. If your intention is to build the corresponding indexed data, the choice will always be obvious. The post above has only one goal: to show how arrays actually work in PHP and where the difference in memory allocation in your sample comes from.
You may also check the article about arrays and hash-tables in PHP, "Hash-tables in PHP" from the PHP internals book; I've used some graphics from there. Also, to understand how values are allocated in PHP, check the "zval Structure" article; it may help you understand the differences between allocating strings and integers as the values of your arrays. I didn't include its explanations here, since the much more important point for me is to show the array data structure and how string keys and integer keys may differ in the context of your question.
#2
Although the two arrays are accessed in different ways (i.e. via string or integer keys), their memory pattern is mostly similar.
This is because the string allocation either happens as part of the zval creation or when a new array key needs to be allocated; the small difference being that numeric indices don't require a whole zval structure, because they're stored as an (unsigned) long.
The observed differences in memory allocation are so minimal that they can largely be attributed either to the inaccuracy of memory_get_usage() or to allocations due to additional bucket creation.
Conclusion
How you want to use your array must be the guiding principle in choosing how it should be indexed; memory should only become an exception to this rule when you run out of it.
#3
From the PHP manual, Garbage Collection: http://php.net/manual/en/features.gc.php
gc_enable(); // Enable the garbage collector
var_dump(gc_enabled()); // bool(true)
var_dump(gc_collect_cycles()); // number of collected cycles
gc_disable(); // Disable the garbage collector
PHP does not return released memory very well: its primary use online does not require it, effective garbage collection takes time away from producing the output, and when the script ends the memory is returned anyway.
Garbage collection happens:
- When you tell it to: int gc_collect_cycles ( void )
- When you leave a function
- When the script ends
For a better understanding of PHP's garbage collection, see this article from a web host (no affiliation): http://www.sitepoint.com/better-understanding-phps-garbage-collection/
If you are considering, byte by byte, how the data is laid out in memory: different ports (builds) are going to affect those values. 64-bit CPUs perform best when data starts on a 64-bit word boundary. For maximum performance, a specific binary may align the start of each block of memory to such a boundary, leaving up to 7 bytes unused. This CPU-specific behaviour depends on which compiler was used to compile PHP.exe. I can't offer any way to predict exact memory usage, given that different compilers will determine it differently.
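A minimal sketch of the rounding described above, assuming an 8-byte alignment requirement; align_up and padding_for are hypothetical helpers, and real compilers and allocators choose their own alignment rules. A block that must start on an 8-byte boundary wastes between 0 and 7 bytes of padding.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds n up to the next multiple of align (align must be a power of two). */
size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Bytes of padding needed so that the next block starts aligned. */
size_t padding_for(size_t n, size_t align)
{
    return align_up(n, align) - n;
}
```

For example, a 13-byte block followed by an 8-byte-aligned block occupies 16 bytes, 3 of them unused.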
Alma Do's post goes into the specifics of the source that is sent to the compiler: what the PHP source requests and what the compiler optimizes.
Looking at the specific examples you posted: when the key is an ASCII letter, each entry takes 4 bytes (64 bits) more... This suggests to me (assuming no garbage, memory holes, etc.) that the ASCII keys are larger than 64 bits, while the numeric keys fit in a 64-bit word. It also suggests that you're using a 64-bit computer and that your PHP.exe is compiled for 64-bit CPUs.
#4
Arrays in PHP are implemented as hashmaps. Hence the length of the value you use for the key has little impact on the data requirement. In older versions of PHP there was a significant performance degradation with large arrays, because the hash size was fixed at array creation: once collisions started occurring, increasing numbers of hash values would map to linked lists of values, which then had to be searched further (with an O(n) algorithm) instead of holding a single value. More recently, the hash appears either to use a much larger default size or to be resized dynamically (it just works; I can't really be bothered reading the source code).
Saving 4 bytes in your scripts is not going to cause Google any sleepless nights. If you are writing code that uses large arrays (where the savings may be more significant), you're probably doing it wrong: the time and resources taken to fill up the array could be better spent elsewhere (like indexed storage).