I have a string function that accepts a pointer to a source string and returns a pointer to a destination string. This function currently works, but I'm worried I'm not following best practice regarding malloc, realloc, and free.
The thing that's different about my function is that the length of the destination string is not the same as the length of the source string, so realloc() has to be called inside my function. I know from looking at the docs...
http://www.cplusplus.com/reference/cstdlib/realloc/
that the memory address might change after the realloc. This means I can't "pass by reference" like a C programmer might for other functions; I have to return the new pointer.
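As a side note on the point above: a block that moves during realloc does not by itself rule out an output parameter, because a function can receive the address of the caller's pointer (a char **) and update it. The following is only a minimal sketch of that idea; resize_buffer is a hypothetical name that does not appear in the code in this thread, and it assumes new_size is greater than zero.

```c
#include <stdlib.h>

/* Hypothetical helper (not part of the original code): resize *buf to
 * new_size bytes, where new_size > 0.
 * Returns 0 on success. On failure it returns -1 and leaves *buf untouched,
 * so the caller still owns the old block and can still free() it. */
int resize_buffer(char **buf, size_t new_size)
{
    char *tmp = realloc(*buf, new_size);
    if (tmp == NULL)
        return -1;      /* realloc failed: the old block is still valid */
    *buf = tmp;         /* publish the (possibly moved) address to the caller */
    return 0;
}
```

Assigning the result of realloc to a temporary first avoids losing the only pointer to the old block when the call fails.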
So the prototype for my function is:
//decode a uri encoded string
char *net_uri_to_text(char *);
I don't like the way I'm doing it because I have to free the pointer after running the function:
char * chr_output = net_uri_to_text("testing123%5a%5b%5cabc");
printf("%s\n", chr_output); //testing123Z[\abc
free(chr_output);
Which means that malloc() and realloc() are called inside my function and free() is called outside my function.
I have a background in high-level languages (perl, plpgsql, bash), so my instinct is to encapsulate such things properly, but that might not be the best practice in C.
The question: Is my way best practice, or is there a better way I should follow?
full example
It compiles and runs with two warnings about the unused argc and argv arguments; you can safely ignore those two warnings.
example.c:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char *net_uri_to_text(char *);

int main(int argc, char ** argv) {
    char * chr_input = "testing123%5a%5b%5cabc";
    char * chr_output = net_uri_to_text(chr_input);
    printf("%s\n", chr_output);
    free(chr_output);
    return 0;
}

//decodes uri-encoded string
//send pointer to source string
//return pointer to destination string
//WARNING!! YOU MUST USE free(chr_result) AFTER YOU'RE DONE WITH IT OR YOU WILL GET A MEMORY LEAK!
char *net_uri_to_text(char * chr_input) {
    //define variables
    int int_length = strlen(chr_input);
    int int_new_length = int_length;
    char * chr_output = malloc(int_length);
    char * chr_output_working = chr_output;
    char * chr_input_working = chr_input;
    int int_output_working = 0;
    unsigned int uint_hex_working;
    //while not a null byte
    while(*chr_input_working != '\0') {
        //if %
        if (*chr_input_working == *"%") {
            //then put correct char in
            sscanf(chr_input_working + 1, "%02x", &uint_hex_working);
            *chr_output_working = (char)uint_hex_working;
            //printf("special char:%c, %c, %d<\n", *chr_output_working, (char)uint_hex_working, uint_hex_working);
            //realloc
            chr_input_working++;
            chr_input_working++;
            int_new_length -= 2;
            chr_output = realloc(chr_output, int_new_length);
            //output working must be the new pointer plus how many chars we've done
            chr_output_working = chr_output + int_output_working;
        } else {
            //put char in
            *chr_output_working = *chr_input_working;
        }
        //increment pointers and number of chars in output working
        chr_input_working++;
        chr_output_working++;
        int_output_working++;
    }
    //last null byte
    *chr_output_working = '\0';
    return chr_output;
}
6 Answers
#1
8
It's perfectly OK to return malloc'd buffers from functions in C, as long as you document the fact that they do. Lots of libraries do that, even though no function in the standard library does.
If you can cheaply compute (a not too pessimistic upper bound on) the number of characters that need to be written to the buffer, you can offer a function that does that and let the user call it.
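For URI decoding such a bound is easy to get, because each %XX escape shrinks to a single byte and nothing ever grows. A minimal sketch of the size function described above, assuming the hypothetical name net_uri_to_text_bound (it is not part of the question's code):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes that are always enough to hold the
 * decoded form of `encoded`, including the terminating NUL.
 * Decoding never lengthens the text, so strlen + 1 is a safe upper bound. */
size_t net_uri_to_text_bound(const char *encoded)
{
    return strlen(encoded) + 1;
}
```

The caller would allocate that many bytes (with malloc or on the stack) and hand the buffer to the decoder. The bound overshoots by two bytes for every escape in the input.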
It's also possible, but much less convenient, to accept a buffer to be filled in; I've seen quite a few libraries that do that like so:
/*
 * Decodes the uri-encoded string encoded into buf of length len (including NUL).
 * Returns the number of characters needed for the full result (including NUL).
 * If that number is greater than len, the result did not fit and you should
 * try again with a larger buffer.
 */
size_t net_uri_to_text(char const *encoded, char *buf, size_t len)
{
    size_t space_needed = 0;

    while (decoding_needs_to_be_done()) {
        // decode characters, but only write them to buf
        // if it wouldn't overflow;
        // increment space_needed regardless
    }
    return space_needed;
}
Now the caller is responsible for the allocation, and would do something like
size_t len = SOME_VALUE_THAT_IS_USUALLY_LONG_ENOUGH;
char *result = xmalloc(len);
len = net_uri_to_text(input, result, len);
if (len > SOME_VALUE_THAT_IS_USUALLY_LONG_ENOUGH) {
    // try again: grow the buffer to the reported size and decode once more
    result = xrealloc(result, len);
    net_uri_to_text(input, result, len);
}
(Here, xmalloc and xrealloc are "safe" allocating functions that I made up to skip NULL checks.)
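To make the skeleton above concrete, here is one way the fill-a-buffer variant could be written out with plain malloc and realloc. It is only a sketch under a few assumptions: the return value is the number of bytes the full result needs including the NUL, a '%' that is not followed by two hex digits is copied through unchanged, and net_uri_to_text_alloc and the initial size of 16 are made up for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Decodes `encoded` into buf, which has room for len bytes (including NUL).
 * Returns the number of bytes the full result needs, including the NUL.
 * If the return value is greater than len, the output was truncated. */
size_t net_uri_to_text(const char *encoded, char *buf, size_t len)
{
    size_t needed = 0;

    while (*encoded) {
        char c = *encoded;
        if (c == '%' && isxdigit((unsigned char)encoded[1])
                     && isxdigit((unsigned char)encoded[2])) {
            char hex[3] = { encoded[1], encoded[2], '\0' };
            c = (char)strtoul(hex, NULL, 16);
            encoded += 3;
        } else {
            encoded++;          /* ordinary byte, or a malformed '%' kept as-is */
        }
        if (needed + 1 < len)   /* always keep one byte free for the NUL */
            buf[needed] = c;
        needed++;
    }
    if (len > 0)
        buf[needed < len ? needed : len - 1] = '\0';
    return needed + 1;
}

/* Hypothetical caller: guess a size, then grow once if the guess was too small.
 * Returns a malloc'd string the caller must free(), or NULL on failure. */
char *net_uri_to_text_alloc(const char *encoded)
{
    size_t len = 16;                        /* arbitrary initial guess */
    char *result = malloc(len);
    if (result == NULL)
        return NULL;

    size_t needed = net_uri_to_text(encoded, result, len);
    if (needed > len) {                     /* too small: grow and decode again */
        char *tmp = realloc(result, needed);
        if (tmp == NULL) {
            free(result);
            return NULL;
        }
        result = tmp;
        net_uri_to_text(encoded, result, needed);
    }
    return result;
}
```

Calling the decoder with len set to 0 writes nothing and just reports the required size, which gives the caller a "measure first, then allocate" option as well.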
#2
2
The thing is that C is low-level enough to force the programmer to get her memory management right. In particular, there's nothing wrong with returning a malloc()ated string. It's a common idiom to return mallocated objects and have the caller free() them.
And anyways, if you don't like this approach, you can always take a pointer to the string and modify it from inside the function (after the last use, it will still need to be free()d, though).
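One reading of this suggestion is an in-place decoder: since the decoded text is never longer than the encoded text, the function can overwrite the caller's buffer and allocate nothing. A minimal sketch, assuming the made-up name unescape_in_place and assuming that a '%' without two hex digits after it is left as it is:

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical in-place variant: decodes s over itself.
 * s must point to writable memory (not a string literal). */
void unescape_in_place(char *s)
{
    char *out = s;      /* write position; never gets ahead of the read position */

    while (*s) {
        if (*s == '%' && isxdigit((unsigned char)s[1])
                      && isxdigit((unsigned char)s[2])) {
            char hex[3] = { s[1], s[2], '\0' };
            *out++ = (char)strtol(hex, NULL, 16);
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}
```

The trade-off is that the original encoded text is destroyed, and whoever allocated the buffer remains responsible for releasing it.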
One thing, however, that I don't think is necessary is explicitly shrinking the string. If the new string is shorter than the old one, there's obviously enough room for it in the memory chunk of the old string, so you don't need to realloc().
(Apart from the fact that you forgot to allocate one extra byte for the terminating NUL character, of course...)
And, as always, you can just return a different pointer each time the function is called, and you don't even need to call realloc() at all.
If you accept one last piece of good advice: it's advisable to const-qualify your input strings, so the caller can be sure that you don't modify them. Using this approach, you can safely call the function on string literals, for example.
All in all, I'd rewrite your function like this:
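#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd decoded copy of s, or NULL if allocation fails.
 * The caller must free() the result. */
char *unescape(const char *s)
{
    size_t l = strlen(s);
    char *p = malloc(l + 1), *r = p;    /* the result is never longer than s */

    if (p == NULL)
        return NULL;
    while (*s) {
        /* decode only a well-formed %XX; a stray '%' is copied unchanged */
        if (*s == '%' && isxdigit((unsigned char)s[1]) && isxdigit((unsigned char)s[2])) {
            char buf[3] = { s[1], s[2], 0 };
            *p++ = (char)strtol(buf, NULL, 16); // yes, I prefer this over scanf()
            s += 3;
        } else {
            *p++ = *s++;
        }
    }
    *p = 0;
    return r;
}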
And call it as follows:
#include <stdio.h>

int main(void)
{
    const char *in = "testing123%5a%5b%5cabc";
    char *out = unescape(in);
    if (out == NULL)
        return 1;
    printf("%s\n", out);
    free(out);
    return 0;
}
#3
2
It's perfectly OK to return newly-malloc-ed (and possibly internally realloc-ed) values from functions; you just need to document that you are doing so (as you do here).
Other obvious items:
- Instead of int int_length you might want to use size_t. This is "an unsigned type" (usually unsigned int or unsigned long) that is the appropriate type for lengths of strings and arguments to malloc.
- You need to allocate n+1 bytes initially, where n is the length of the string, as strlen does not include the terminating 0 byte.
- You should check for malloc failing (returning NULL). If your function will pass the failure on, document that in the function-description comment.
- sscanf is pretty heavy-weight for converting the two hex bytes. Not wrong, except that you're not checking whether the conversion succeeds (what if the input is malformed? You can of course decide that this is the caller's problem, but in general you might want to handle that). You can use isxdigit from <ctype.h> to check for hexadecimal digits, and/or strtoul to do the conversion.
- Rather than doing one realloc for every % conversion, you might want to do a final "shrink realloc" if desirable. Note that if you allocate (say) 50 bytes for a string and find it requires only 49 including the final 0 byte, it may not be worth doing a realloc after all.
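Put together, the items in this list could look roughly like the sketch below. It makes some choices the answer leaves open: a malformed escape is reported by returning NULL, the hex digits are converted by hand rather than with strtoul, and hex_value is a hypothetical helper that assumes the letters a to f are contiguous in the character set (true for ASCII).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit; the caller has already checked isxdigit(). */
static int hex_value(unsigned char c)
{
    if (isdigit(c))
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Returns a malloc'd decoded copy of in; the caller must free() it.
 * Returns NULL if allocation fails or a '%' is not followed by two hex digits. */
char *net_uri_to_text(const char *in)
{
    size_t n = strlen(in);
    char *out = malloc(n + 1);              /* +1 for the terminating 0 byte */
    if (out == NULL)
        return NULL;

    size_t used = 0;
    while (*in) {
        if (*in == '%') {
            if (!isxdigit((unsigned char)in[1]) || !isxdigit((unsigned char)in[2])) {
                free(out);                  /* malformed escape: report failure */
                return NULL;
            }
            out[used++] = (char)(hex_value((unsigned char)in[1]) * 16
                               + hex_value((unsigned char)in[2]));
            in += 3;
        } else {
            out[used++] = *in++;
        }
    }
    out[used] = '\0';

    /* One optional shrink at the end instead of one realloc per escape. */
    char *shrunk = realloc(out, used + 1);
    return shrunk != NULL ? shrunk : out;   /* if shrinking fails, out is still valid */
}
```

Whether the final realloc is worth it depends on how much space the escapes actually saved; dropping those two lines and returning out directly is just as correct.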
#4
0
I would approach the problem in a slightly different way. Personally, I would split your function in two. The first function would calculate the size you need to malloc. The second would write the output string to the given pointer (which has been allocated outside of the function). That saves several calls to realloc and keeps the complexity the same. A possible function to find the size of the new string is:
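#include <stddef.h>

/* Length of the decoded string, not counting the terminating NUL.
 * Assumes every '%' starts a well-formed %XX escape. */
size_t getNewSize(const char *string) {
    size_t size = 0, percent = 0;
    for (const char *i = string; *i != '\0'; i++, size++) {
        if (*i == '%')
            percent++;
    }
    /* each %XX escape collapses three characters into one */
    return size - percent * 2;
}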
However, as mentioned in other answers, there is no problem in returning a malloc'ed buffer as long as you document it!
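For completeness, the two-function split described in this answer could be sketched as below. The names decoded_size, decode_into and is_escape are hypothetical. Unlike the size function in the answer, this version counts the terminating NUL and only treats a '%' as an escape when two hex digits follow it, so that both passes agree on the output length even for malformed input.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Nonzero if s points at a well-formed %XX escape. */
static int is_escape(const char *s)
{
    return s[0] == '%' && isxdigit((unsigned char)s[1])
                       && isxdigit((unsigned char)s[2]);
}

/* Pass 1: bytes needed for the decoded text, including the NUL. */
size_t decoded_size(const char *s)
{
    size_t size = 1;                    /* the terminating NUL */
    while (*s) {
        s += is_escape(s) ? 3 : 1;
        size++;
    }
    return size;
}

/* Pass 2: decode into out, which must hold at least decoded_size(s) bytes. */
void decode_into(const char *s, char *out)
{
    while (*s) {
        if (is_escape(s)) {
            char hex[3] = { s[1], s[2], '\0' };
            *out++ = (char)strtoul(hex, NULL, 16);
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}
```

The caller then does the malloc(decoded_size(in)), the decode_into(in, out) and the free(out) itself, so allocation and release sit side by side in the same scope.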
#5
0
In addition to what was already mentioned in the other posts, you should also document the fact that the string is reallocated. If your code is called with a static string or a string allocated with alloca, it must not be reallocated.
#6
0
I think you are right to be concerned about splitting up mallocs and frees. As a rule, whatever makes it, owns it and should free it.
In this case, where the strings are relatively small, one good procedure is to make the string buffer larger than any possible string it could contain. For example, URLs have a de facto limit of about 2000 characters, so if you malloc 10000 characters you can store any possible URL.
Another trick is to store both the length and capacity of the string at its front, so that *(int *)mystring == length of string and *(int *)(mystring + 4) == capacity of string (assuming a 4-byte int). Thus, the string itself only starts at the 8th position, mystring + 8. By doing this you can pass around a single pointer to a string and always know how long it is and how much memory capacity the string has. You can make macros that automatically generate these offsets and make "pretty code".
The value of using buffers this way is that you do not need to do a reallocation. The new value overwrites the old value and you update the length at the beginning of the string.
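The same idea can be expressed without hand-computed offsets by putting the bookkeeping in a struct that ends in a C99 flexible array member. This swaps the answer's raw-offset layout for a struct header, so treat it as a related sketch and not as the answer's exact scheme; struct lstring, lstring_new and lstring_set are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: bookkeeping header followed directly by the characters. */
struct lstring {
    size_t length;      /* bytes in use, not counting the NUL */
    size_t capacity;    /* bytes available in data, including the NUL */
    char   data[];      /* C99 flexible array member */
};

/* Allocates an empty string with room for `capacity` bytes (at least 1).
 * Returns NULL on allocation failure; release with free(). */
struct lstring *lstring_new(size_t capacity)
{
    if (capacity == 0)
        capacity = 1;                   /* always leave room for the NUL */
    struct lstring *s = malloc(sizeof *s + capacity);
    if (s == NULL)
        return NULL;
    s->length = 0;
    s->capacity = capacity;
    s->data[0] = '\0';
    return s;
}

/* Overwrites the old value in place. Returns 0 on success, or -1 (leaving
 * the old value intact) if text does not fit in the existing capacity. */
int lstring_set(struct lstring *s, const char *text)
{
    size_t n = strlen(text);
    if (n + 1 > s->capacity)
        return -1;
    memcpy(s->data, text, n + 1);
    s->length = n;
    return 0;
}
```

A single pointer still carries the length, the capacity and the characters, and the compiler takes care of alignment and field sizes instead of a fixed assumption about int being four bytes.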