I have a CSV file with over 4M lines that I'm loading into an array.
csv: EURUSD,20010102,230100,0.9507,0.9507,0.9507,0.9507,4
With fixed-size char arrays inside the struct, this operation takes about 3.5 minutes.
...
typedef struct Rates_t
{
    char open[7];
    char high[7];
    char low[7];
    char close[7];
} Rates_t;

void Substr(char *src, char **dst, int start, int length)
{
    char *ptr1 = *dst;
    char *ptr2 = src+start;
    int i;
    for (i = 0; i < length; i++)
    {
        *(ptr1 + i) = *(ptr2 + i);
    }
    (*dst)[length] = '\0';
}

void FillRates(char *tmp, char *price)
{
    Substr(tmp, &price, 0, 6);
}

bool BacktestServer()
{
    ...
    Rates_t r = { {0}, {0}, {0}, {0} };
    Rates_t *rates = &r;
    rates = (Rates_t *) malloc(sizeof(Rates_t));
    FILE *f;
    if (!(f = fopen("EURUSD.txt", "r"))) {
        fprintf(stderr, "Unable to open 'EURUSD.txt' for reading.\n");
        exit(1);
    }
    ...
    while (fgets(line, 72, f))
    {
        tmp = line;
        for (skip = 0; skip < 3; skip++)
        {
            tmp = strchr(tmp, ',');
            tmp++;
        }
        sz += sizeof(Rates_t);
        rates = (Rates_t *) realloc(rates, sz);
        FillRates(tmp, rates[i].open);
        tmp = strchr(tmp, ',');
        tmp++;
        FillRates(tmp, rates[i].high);
        tmp = strchr(tmp, ',');
        tmp++;
        FillRates(tmp, rates[i].low);
        tmp = strchr(tmp, ',');
        tmp++;
        FillRates(tmp, rates[i].close);
        i++;
        free(line);
        line = NULL;
        line = (char *) malloc(72 * sizeof(char));
    }
    ...
}
With pointers in the struct and a separate malloc for each string, it takes about 1 minute.
...
typedef struct Rates_t
{
    char *open;
    char *high;
    char *low;
    char *close;
} Rates_t;

void Substr(char *src, char **dst, int start, int length)
{
    char *ptr1 = *dst;
    char *ptr2 = src+start;
    int i;
    for (i = 0; i < length; i++)
    {
        *(ptr1 + i) = *(ptr2 + i);
    }
    (*dst)[length] = '\0';
}

void FillRates(char *tmp, char *price)
{
    Substr(tmp, &price, 0, 6);
}

bool BacktestServer()
{
    ...
    Rates_t r = { NULL, NULL, NULL, NULL };
    Rates_t *rates = &r;
    rates = (Rates_t *) malloc(sizeof(Rates_t));
    FILE *f;
    if (!(f = fopen("EURUSD.txt", "r"))) {
        fprintf(stderr, "Unable to open 'EURUSD.txt' for reading.\n");
        exit(1);
    }
    ...
    while (fgets(line, 72, f))
    {
        tmp = line;
        for (skip = 0; skip < 3; skip++)
        {
            tmp = strchr(tmp, ',');
            tmp++;
        }
        sz += sizeof(Rates_t);
        rates = (Rates_t *) realloc(rates, sz);
        rates[i].open = (char *) malloc(7 * sizeof(char));
        FillRates(tmp, rates[i].open);
        tmp = strchr(tmp, ',');
        tmp++;
        rates[i].high = (char *) malloc(7 * sizeof(char));
        FillRates(tmp, rates[i].high);
        tmp = strchr(tmp, ',');
        tmp++;
        rates[i].low = (char *) malloc(7 * sizeof(char));
        FillRates(tmp, rates[i].low);
        tmp = strchr(tmp, ',');
        tmp++;
        rates[i].close = (char *) malloc(7 * sizeof(char));
        FillRates(tmp, rates[i].close);
        i++;
        free(line);
        line = NULL;
        line = (char *) malloc(72 * sizeof(char));
    }
    ...
}
If I use either memcpy or snprintf in Substr instead of the loop, the program takes a few seconds longer.
void Substr(char *src, char **dst, int start, int length)
{
    memcpy(*dst, src+start, length);
    (*dst)[length] = '\0';
}

void Substr(char *src, char **dst, int start, int length)
{
    snprintf(*dst, length + 1, "%s", src+start);
    (*dst)[length] = '\0';
}
The consensus online seems to be that a static array should be faster than a dynamic one. If anyone needs more information, I'll edit the post accordingly.
UPDATE:
I increased the allocation chunk, not to 2 as suggested but to 4096, and I'm still getting the same results for the dynamic array version: about a minute or less. The static array version has dropped to about 2.75 minutes.
The initial allocation:
int sz = 256 * sizeof(Rates_t);
rates = (Rates_t *) malloc(sz);
The reallocation:
if (realloc_count == 256)
{
    sz += 256 * sizeof(Rates_t);
    rates = (Rates_t *) realloc(rates, sz);
    realloc_count = 0;
}
realloc_count++;
I am on a 64-bit Windows machine, but I compile 32-bit programs with Cygwin gcc. On 64-bit Linux in a VM, on the other hand, the run times are significantly shorter, but the ordering is reversed: the dynamically allocated version takes longer than the static version. On Linux, dynamic memory = ~20-30 seconds, static = ~15 seconds. On Linux, with a chunk size of 1, 2, 256, 4096 or 524,288 there was little to no change in speed. When I increased the chunk size to 524,288 on Cygwin, I got ~6 seconds for static allocation and ~8 seconds for dynamic allocation.
2 Answers
#1
I'm surprised it would make this much of a difference, but since you realloc() the rates array for each line of data read, it's likely that the cost of copying that array so often is the culprit. If the target is a 32-bit machine, the Rates_t structure that contains the full arrays is probably about twice the size of a Rates_t structure that contains only pointers.
As JS1 mentioned in a comment, sizing the array appropriately up front (or reallocating in larger chunks if you don't know how big it needs to be) should make the run time difference disappear.
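As a rough illustration of "reallocating in larger chunks", here is a minimal sketch that doubles the capacity whenever the array is full. The `RatesVec` type and the `RatesVecPush` / `RatesVecFree` helpers are hypothetical names introduced for this example; they are not part of the original code, and the starting capacity of 256 is an arbitrary choice.

```c
#include <stdlib.h>
#include <string.h>

/* Same layout as the fixed-size Rates_t in the question. */
typedef struct Rates_t {
    char open[7];
    char high[7];
    char low[7];
    char close[7];
} Rates_t;

/* Hypothetical growable array; not part of the original code. */
typedef struct RatesVec {
    Rates_t *data;
    size_t   count;     /* elements in use */
    size_t   capacity;  /* elements allocated */
} RatesVec;

/* Append one record, doubling the capacity when the array is full.
 * Returns 0 on success, -1 if realloc fails (the array is left intact). */
int RatesVecPush(RatesVec *v, const Rates_t *r)
{
    if (v->count == v->capacity) {
        size_t new_cap = v->capacity ? v->capacity * 2 : 256;
        Rates_t *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
        v->capacity = new_cap;
    }
    v->data[v->count++] = *r;
    return 0;
}

/* Release the storage and reset the vector to its empty state. */
void RatesVecFree(RatesVec *v)
{
    free(v->data);
    v->data = NULL;
    v->count = v->capacity = 0;
}
```

Start from `RatesVec v = {0};` and call `RatesVecPush(&v, &record)` once per parsed line. With doubling, a file of a little over 4M records needs on the order of 15 reallocations rather than one per line, so the total amount of data copied stays proportional to the final array size.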
#2
One difference is the amount of data copied during a realloc. In the first case, the code reallocs an array of 28-byte structs, a size that is not on a RAM/cache-friendly boundary. In the second case, the code reallocs an array of 16-byte structs (with 32-bit pointers), which is on a RAM/cache-friendly boundary, but it then does 4 mallocs per record (these blocks never get realloc'ed).
I wonder whether changing the character array sizes in the first case from 7 to 8 would help.
I also wonder whether the difference would be similar if this were done in 64-bit mode (64-bit pointers).
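To check the sizes being discussed on a given compiler and target, a small sketch like the following can print them. The struct names `RatesInline7`, `RatesInline8` and `RatesPtr` and the function `PrintRatesSizes` are made up for this example; they mirror the two Rates_t layouts from the question plus the suggested 8-byte variant.

```c
#include <stdio.h>

/* Layout from the first version: four 7-byte char arrays. */
typedef struct { char open[7], high[7], low[7], close[7]; } RatesInline7;

/* Suggested variant: four 8-byte char arrays. */
typedef struct { char open[8], high[8], low[8], close[8]; } RatesInline8;

/* Layout from the second version: four pointers. */
typedef struct { char *open, *high, *low, *close; } RatesPtr;

/* Print the size of each layout for the current compiler and target. */
void PrintRatesSizes(void)
{
    printf("char[7] x 4: %u bytes\n", (unsigned) sizeof(RatesInline7));
    printf("char[8] x 4: %u bytes\n", (unsigned) sizeof(RatesInline8));
    printf("char *  x 4: %u bytes\n", (unsigned) sizeof(RatesPtr));
}
```

On mainstream compilers the two char-array layouts come out at 28 and 32 bytes regardless of pointer width, while the pointer layout is typically 16 bytes in a 32-bit build and 32 bytes in a 64-bit build. In a 64-bit build the pointer struct is therefore no longer the smaller one, which may be part of why the ordering changes on 64-bit Linux.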