I have wondered for some time what a nice, clean solution for joining an array of strings might look like. Example: I have ["Alpha", "Beta", "Gamma"] and want to join the strings into one, separated by commas: "Alpha, Beta, Gamma".
Now I know that most programming languages offer some kind of join method for this. I just wonder how these might be implemented. When I took introductory courses, I often tried to go it alone, but never found a satisfactory algorithm. Everything seemed rather messy, the problem being that you cannot just loop through the array concatenating the strings, as you would add one comma too many (either before the first string or after the last one). I don’t want to check conditions in the loop. I don’t really want to add the first or the last string before/after the loop (I guess this is maybe the best way?).
Can someone show me an elegant solution? Or tell me exactly why there can’t be anything more elegant?
16 solutions
#1
18
The most elegant solution I have found for problems like this is something like the following (in pseudocode):
separator = ""
foreach(item in stringCollection)
{
concatenatedString += separator + item
separator = ","
}
You just run the loop; the separator is only set at the end of the first pass, so it is still empty the first time around and nothing gets added before the first item. It's not as clean as I'd like it to be, so I'd still add comments, but it's better than an if statement or adding the first or last item outside the loop.
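For anyone who wants to try the separator-swap idea, here is a minimal sketch in Python. The function name `join_with_swap` and its parameters are made up for illustration; the logic is just the pseudocode above with the accumulator initialised explicitly.

```python
def join_with_swap(items, sep=","):
    """Join strings by prepending a separator that is empty on the first pass."""
    result = ""
    current_sep = ""       # nothing is prepended to the first item
    for item in items:
        result += current_sep + item
        current_sep = sep  # every later item gets the real separator
    return result


print(join_with_swap(["Alpha", "Beta", "Gamma"], ", "))  # Alpha, Beta, Gamma
```

An empty list falls out naturally as an empty string, because the loop body never runs.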
#2
10
All of these solutions are decent ones, but for an underlying library, both independence of separator and decent speed are important. Here is a function that fits the requirement assuming the language has some form of string builder.
public static String join(String[] strings, String sep) {
if(strings.length == 0) return "";
if(strings.length == 1) return strings[0];
StringBuilder sb = new StringBuilder();
sb.append(strings[0]);
for(int i = 1; i < strings.length; i++) {
sb.append(sep);
sb.append(strings[i]);
}
return sb.toString();
}
EDIT: I suppose I should mention why this would be speedier. The main reason would be because any time you call c = a + b; the underlying construct is usually c = (new StringBuilder()).append(a).append(b).toString();. By reusing the same string builder object, we can reduce the amount of allocations and garbage we produce.
And before someone chimes in with "optimization is evil": we're talking about implementing a common library function. Acceptable, scalable performance is one of its requirements. A join that takes a long time is one that's not going to be used often.
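The same "first item, then separator-plus-item pairs, all into one buffer" shape can be sketched in Python, using `io.StringIO` as a stand-in for a string builder. The name `join_buffered` is hypothetical, and this is only an illustration of the structure, not a claim about how any library implements join.

```python
import io


def join_buffered(strings, sep):
    """Write the first item, then (separator, item) pairs, into one buffer."""
    if not strings:
        return ""
    buffer = io.StringIO()
    buffer.write(strings[0])
    for item in strings[1:]:
        buffer.write(sep)   # separator goes before every item except the first
        buffer.write(item)
    return buffer.getvalue()


print(join_buffered(["Alpha", "Beta", "Gamma"], ", "))  # Alpha, Beta, Gamma
```

Because the separator is a parameter, nothing in the loop depends on it being a comma.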
#3
5
Most languages nowadays - e.g. perl (mentioned by Jon Ericson), php, javascript - have a join() function or method, and this is by far the most elegant solution. Less code is better code.
In response to Mendelt Siebenga, if you do require a hand-rolled solution, I'd go with the ternary operator for something like:
separator = ","
foreach (item in stringCollection)
{
concatenatedString += concatenatedString ? separator + item : item
}
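A rough Python rendering of the ternary version, with a made-up function name (`join_ternary`). One thing worth knowing before using it: the test "is the accumulator still empty?" cannot tell "nothing added yet" apart from "only empty strings added so far", so leading empty items lose their separator.

```python
def join_ternary(items, sep=","):
    """Join strings, using the emptiness of the accumulator as the 'first item' test."""
    result = ""
    for item in items:
        # An empty accumulator is treated as "nothing added yet".
        result += (sep + item) if result else item
    return result


print(join_ternary(["Alpha", "Beta", "Gamma"]))  # Alpha,Beta,Gamma
print(join_ternary(["", "a"]))                   # a   (the built-in ",".join gives ",a")
```

If the input can never contain empty strings, the two behave the same.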
#4
3
I usually go with something like...
list = ["Alpha", "Beta", "Gamma"];
output = "";
separator = "";
for (int i = 0; i < list.length ; i++) {
output = output + separator;
output = output + list[i];
separator = ", ";
}
This works because on the first pass, separator is empty (so you don't get a comma at the start), but on every subsequent pass, you add a comma before adding the next element.
You could certainly unroll this a little to make it a bit faster (assigning to the separator over and over isn't ideal), though I suspect that's something the compiler could do for you automatically.
In the end though, I pretty much suspect this is what most language-level join functions come down to. Nothing more than syntactic sugar, but it sure is sweet.
#5
3
For pure elegance, a typical recursive functional-language solution is quite nice. This isn't in an actual language syntax but you get the idea (it's also hardcoded to use comma separator):
join([]) = ""
join([x]) = "x"
join([x, rest]) = "x," + join(rest)
In reality you would write this in a more generic way, to reuse the same algorithm but abstract away the data type (doesn't have to be strings) and the operation (doesn't have to be concatenation with a comma in the middle). Then it usually gets called 'reduce', and many functional languages have this built in, e.g. multiplying all numbers in a list, in Lisp:
(reduce #'* '(1 2 3 4 5)) => 120
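To make the three-case definition concrete, here is one way it might look in Python, followed by the Python counterpart of the Lisp reduce call. `join_recursive` is an illustrative name, and the separator is a parameter here rather than hardcoded. Python does not optimise recursion, so this is a sketch of the idea rather than something to use on long lists.

```python
from functools import reduce
import operator


def join_recursive(items, sep=","):
    """Recursive join following the empty / single / head-plus-rest cases."""
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    head, *rest = items
    return head + sep + join_recursive(rest, sep)


print(join_recursive(["Alpha", "Beta", "Gamma"], ", "))  # Alpha, Beta, Gamma

# The same fold pattern with a different operation: multiply all numbers.
print(reduce(operator.mul, [1, 2, 3, 4, 5]))             # 120
```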
#6
2
@Mendelt Siebenga
Strings are cornerstone objects in programming languages. Different languages implement strings differently. An implementation of join() strongly depends on the underlying implementation of strings. Pseudocode doesn't reflect the underlying implementation.
Consider join() in Python. It can be easily used:
print ", ".join(["Alpha", "Beta", "Gamma"])
# Alpha, Beta, Gamma
It could be easily implemented as follows:
def join(seq, sep=" "):
    if not seq: return ""
    elif len(seq) == 1: return seq[0]
    return reduce(lambda x, y: x + sep + y, seq)
print join(["Alpha", "Beta", "Gamma"], ", ")
# Alpha, Beta, Gamma
And here is how the join() method is implemented in C (taken from trunk):
PyDoc_STRVAR(join__doc__,
"S.join(sequence) -> string\n\
\n\
Return a string which is the concatenation of the strings in the\n\
sequence. The separator between elements is S.");
static PyObject *
string_join(PyStringObject *self, PyObject *orig)
{
char *sep = PyString_AS_STRING(self);
const Py_ssize_t seplen = PyString_GET_SIZE(self);
PyObject *res = NULL;
char *p;
Py_ssize_t seqlen = 0;
size_t sz = 0;
Py_ssize_t i;
PyObject *seq, *item;
seq = PySequence_Fast(orig, "");
if (seq == NULL) {
return NULL;
}
seqlen = PySequence_Size(seq);
if (seqlen == 0) {
Py_DECREF(seq);
return PyString_FromString("");
}
if (seqlen == 1) {
item = PySequence_Fast_GET_ITEM(seq, 0);
if (PyString_CheckExact(item) || PyUnicode_CheckExact(item)) {
Py_INCREF(item);
Py_DECREF(seq);
return item;
}
}
/* There are at least two things to join, or else we have a subclass
* of the builtin types in the sequence.
* Do a pre-pass to figure out the total amount of space we'll
* need (sz), see whether any argument is absurd, and defer to
* the Unicode join if appropriate.
*/
for (i = 0; i < seqlen; i++) {
const size_t old_sz = sz;
item = PySequence_Fast_GET_ITEM(seq, i);
if (!PyString_Check(item)){
#ifdef Py_USING_UNICODE
if (PyUnicode_Check(item)) {
/* Defer to Unicode join.
* CAUTION: There's no gurantee that the
* original sequence can be iterated over
* again, so we must pass seq here.
*/
PyObject *result;
result = PyUnicode_Join((PyObject *)self, seq);
Py_DECREF(seq);
return result;
}
#endif
PyErr_Format(PyExc_TypeError,
"sequence item %zd: expected string,"
" %.80s found",
i, Py_TYPE(item)->tp_name);
Py_DECREF(seq);
return NULL;
}
sz += PyString_GET_SIZE(item);
if (i != 0)
sz += seplen;
if (sz < old_sz || sz > PY_SSIZE_T_MAX) {
PyErr_SetString(PyExc_OverflowError,
"join() result is too long for a Python string");
Py_DECREF(seq);
return NULL;
}
}
/* Allocate result space. */
res = PyString_FromStringAndSize((char*)NULL, sz);
if (res == NULL) {
Py_DECREF(seq);
return NULL;
}
/* Catenate everything. */
p = PyString_AS_STRING(res);
for (i = 0; i < seqlen; ++i) {
size_t n;
item = PySequence_Fast_GET_ITEM(seq, i);
n = PyString_GET_SIZE(item);
Py_MEMCPY(p, PyString_AS_STRING(item), n);
p += n;
if (i < seqlen - 1) {
Py_MEMCPY(p, sep, seplen);
p += seplen;
}
}
Py_DECREF(seq);
return res;
}
Note that the "Catenate everything." code above is a small part of the whole function.
In pseudocode:
/* Catenate everything. */
for each item in sequence
    copy-assign item
    if not last item
        copy-assign separator
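The two-pass structure of the C function (measure first, allocate once, then copy) can be imitated in Python with a `bytearray`. This is only a sketch of the technique over byte strings; `join_presized` is a hypothetical name, and the type checks, overflow checks and Unicode fallback of the real function are left out.

```python
def join_presized(items, sep):
    """Two-pass join over byte strings: measure first, then copy into place."""
    if not items:
        return b""
    # Pass 1: total size is all item lengths plus one separator per gap.
    total = sum(len(item) for item in items) + len(sep) * (len(items) - 1)
    out = bytearray(total)  # single allocation, zero-filled
    pos = 0
    last = len(items) - 1
    # Pass 2: copy each item, and the separator after all but the last.
    for i, item in enumerate(items):
        out[pos:pos + len(item)] = item
        pos += len(item)
        if i < last:
            out[pos:pos + len(sep)] = sep
            pos += len(sep)
    return bytes(out)


print(join_presized([b"Alpha", b"Beta", b"Gamma"], b", "))  # b'Alpha, Beta, Gamma'
```

The "if not last item" check is still there, but it sits next to a plain memory copy, and no intermediate strings are created.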
#7
1
' Pseudocode; assumes zero-based indexing
ResultString = InputArray[0]
n = 1
while n (is less than) Number_Of_Strings
    ResultString (concatenate) ", "
    ResultString (concatenate) InputArray[n]
    n = n + 1
loop
#8
1
In Perl, I just use the join command:
$ echo "Alpha
Beta
Gamma" | perl -e 'print(join(", ", map {chomp; $_} <> ))'
Alpha, Beta, Gamma
(The map stuff is mostly there to create a list.)
In languages that don't have a built-in, like C, I use simple iteration (untested):
for (i = 0; i < N-1; i++){
strcat(s, a[i]);
strcat(s, ", ");
}
strcat(s, a[N-1]);
Of course, you'd need to check the size of s before you add more bytes to it.
You either have to special case the first entry or the last.
#9
1
Collecting different language implementations? Here is, for your amusement, a Smalltalk version:
join:collectionOfStrings separatedBy:sep
|buffer|
buffer := WriteStream on:''.
collectionOfStrings
do:[:each | buffer nextPutAll:each ]
separatedBy:[ buffer nextPutAll:sep ].
^ buffer contents.
Of course, the above code is already in the standard library found as:
Collection >> asStringWith:
so, using that, you'd write:
#('A' 'B' 'C') asStringWith:','
But here's my main point:
I would like to put more emphasis on the fact that using a StringBuilder (or what is called "WriteStream" in Smalltalk) is highly recommended. Do not concatenate strings using "+" in a loop: the result will be many, many intermediate throw-away strings. If you have a good garbage collector, that's fine. But some are not, and a lot of memory needs to be reclaimed. StringBuilder (and WriteStream, which is its great-grandfather) uses a buffer-doubling or even adaptive growing algorithm, which needs MUCH less scratch memory.
However, if it's only a few small strings you are concatenating, don't bother and just "+" them; the extra work of using a StringBuilder might actually be counter-productive, up to an implementation- and language-dependent number of strings.
#10
0
The following is no longer language-agnostic (but that doesn't matter for the discussion because the implementation is easily portable to other languages). I tried to implement Luke's (theoretically best) solution in an imperative programming language. Take your pick; mine's C#. Not very elegant at all. However, (without any testing whatsoever) I could imagine that its performance is quite decent because the recursion is in fact tail recursive.
My challenge: give a better recursive implementation (in an imperative language). You say what “better” means: less code, faster, I'm open for suggestions.
private static StringBuilder RecJoin(IEnumerator<string> xs, string sep, StringBuilder result) {
result.Append(xs.Current);
if (xs.MoveNext()) {
result.Append(sep);
return RecJoin(xs, sep, result);
} else
return result;
}
public static string Join(this IEnumerable<string> xs, string separator) {
var i = xs.GetEnumerator();
if (!i.MoveNext())
return string.Empty;
else
return RecJoin(i, separator, new StringBuilder()).ToString();
}
#11
0
join() function in Ruby:
def join(seq, sep)
  # Use + rather than << so the first element of seq is not modified in place.
  seq.inject { |total, item| total + sep + item } || ""
end
join(["a", "b", "c"], ", ")
# => "a, b, c"
#12
0
join() in Perl:
use List::Util qw(reduce);
sub mjoin($@) {$sep = shift; reduce {$a.$sep.$b} @_ or ''}
say mjoin(', ', qw(Alpha Beta Gamma));
# Alpha, Beta, Gamma
Or without reduce:
sub mjoin($@)
{
my ($sep, $sum) = (shift, shift);
$sum .= $sep.$_ for (@_);
$sum or ''
}
#13
0
Perl 6
sub join( $separator, @strings ){
my $return = shift @strings;
for @strings -> ( $string ){
$return ~= $separator ~ $string;
}
return $return;
}
Yes I know it is pointless because Perl 6 already has a join function.
#14
0
I wrote a recursive version of the solution in Lisp. If the length of the list is greater than 2, it splits the list in half as best it can and then merges the joined sublists:
(defun concatenate-string(list)
(cond ((= (length list) 1) (car list))
((= (length list) 2) (concatenate 'string (first list) "," (second list)))
(t (let ((mid-point (floor (/ (- (length list) 1) 2))))
(concatenate 'string
(concatenate-string (subseq list 0 mid-point))
","
(concatenate-string (subseq list mid-point (length list))))))))
(concatenate-string '("a" "b"))
I tried applying the divide and conquer strategy to the problem, but I guess that does not give a better result than plain iteration. Please let me know if this could have been done better.
I have also performed an analysis of the recursion obtained by the algorithm, it is available here.
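For comparison, the divide-and-conquer idea might be sketched in Python like this. `join_divide` is an illustrative name; unlike the Lisp version it takes the separator as a parameter and returns an empty string for an empty list, both of which are additions of this sketch.

```python
def join_divide(items, sep=","):
    """Split the list in half, join each half recursively, glue with one separator."""
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    mid = len(items) // 2  # both halves are non-empty when len(items) >= 2
    return join_divide(items[:mid], sep) + sep + join_divide(items[mid:], sep)


print(join_divide(["a", "b", "c", "d", "e"]))  # a,b,c,d,e
```

The recursion depth grows only logarithmically with the list length, but every level still copies its halves, so this is unlikely to beat a plain loop over a single buffer.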
#15
0
Use the String.Join method in C#
http://msdn.microsoft.com/en-us/library/57a79xd0.aspx
#16
0
In Java 5, with unit test:
import junit.framework.Assert;
import org.junit.Test;
public class StringUtil
{
public static String join(String delim, String... strings)
{
StringBuilder builder = new StringBuilder();
if (strings != null)
{
for (String str : strings)
{
if (builder.length() > 0)
{
builder.append(delim);
}
builder.append(str);
}
}
return builder.toString();
}
@Test
public void joinTest()
{
Assert.assertEquals("", StringUtil.join(", ", null));
Assert.assertEquals("", StringUtil.join(", ", ""));
Assert.assertEquals("", StringUtil.join(", ", new String[0]));
Assert.assertEquals("test", StringUtil.join(", ", "test"));
Assert.assertEquals("foo, bar", StringUtil.join(", ", "foo", "bar"));
Assert.assertEquals("foo, bar, baz", StringUtil.join(", ", "foo", "bar", "baz"));
}
}