Splitting a string into a list of N-length strings using LINQ

Date: 2022-07-29 21:42:10

I know the concept of String.Split has been addressed before with a multitude of different approaches, but I am specifically interested in a LINQ solution to this question.

I've attempted to write an extension class to handle the split, but both attempts have some major issues. So for the following:

string s = "ABCDEFGHIJKLMNOPQRSTUVWX";
var results = s.SplitEvery(4);

I would want a list like: { "ABCD", "EFGH", "IJKL", "MNOP", "QRST", "UVWX" }

Here is my extension class:

public static class Extensions
{
    public static List<string> SplitEvery(this string s, int n)
    {
        List<string> list = new List<string>();

        var Attempt1 = s.Select((c, i) => i % n== 0 ? s.Substring(i, n) : "|").Where(x => x != "|").ToList();

        var Attempt2 = s.Where((c, i) => i % n== 0).Select((c, i) => s.Substring(i, n)).ToList();

        return list;
    }
}

Attempt 1 inserts a dummy string "|" every time the condition isn't met, then removes all instances of the dummy string to create the final list. It works, but creating the bad strings seems like an unnecessary extra step. Furthermore, this attempt fails if the string length isn't evenly divisible by n, because the last Substring(i, n) call runs past the end of the string and throws an ArgumentOutOfRangeException.

Attempt 2 was me trying to select only substrings where the index was divisible by N, but the 'i' value in the Select statement doesn't correspond to the 'i' value in the Where statement, so I get results like: { "ABCD", "BCDE", etc... }

I feel like I'm close to a good solution, but could use a helpful nudge in the right direction. Any suggestions?

[Edit]

I ended up going with a combination of suggestions to handle my string-splitter. It might not be the fastest, but as a newbie to LINQ, this implementation was the most succinct and easy for me to understand.

public static List<string> SplitEvery(this string s, int size)
{
    return s.Select((x, i) => i)
        .Where(i => i % size == 0)
        .Select(i => String.Concat(s.Skip(i).Take(size))).ToList();
}

Thanks for all the excellent suggestions.

8 solutions

#1 (score: 8)

Here is another solution:

var result = s.Select((x, i) => i)
              .Where(i => i % 4 == 0)
              .Select(i => s.Substring(i, s.Length - i >= 4 ? 4 : s.Length - i));

#2 (score: 24)

string s = "ABCDEFGHIJKLMNOPQRSTUVWX";
var results = s.Select((c, i) => new { c, i })
            .GroupBy(x => x.i / 4)
            .Select(g => String.Join("",g.Select(y=>y.c)))
            .ToList();

You can also use MoreLINQ's Batch:

var res = s.Batch(4).Select(x => String.Join("", x)).ToList();
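Side note, not part of the original answer: on .NET 6 and later the base library includes Enumerable.Chunk, which does the same job as MoreLINQ's Batch without an extra package. A minimal sketch assuming a .NET 6+ target (SplitEveryChunk and the class name are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkSplitExtensions
{
    // Splits s into strings of at most n characters using Enumerable.Chunk (.NET 6 or later).
    // Chunk throws ArgumentOutOfRangeException when n < 1.
    public static List<string> SplitEveryChunk(this string s, int n)
    {
        return s.Chunk(n)                            // IEnumerable<char[]>; the last array may be shorter
                .Select(chars => new string(chars))  // char[] -> string
                .ToList();
    }
}
```

For the question's input this gives the six 4-character strings; when the length isn't a multiple of n, the last string is simply shorter instead of causing an exception.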

If you don't mind relying on a side effect (the GroupBy key selector increments a captured counter), this works too:

var res2 = s.SplitEvery(4).ToList();

public static IEnumerable<string> SplitEvery(this string s, int n)
{
    int index = 0;
    return s.GroupBy(_=> index++/n).Select(g => new string(g.ToArray()));
}
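The side effect is the captured counter: it is created once per call to SplitEvery, not once per enumeration, so enumerating the returned sequence a second time keeps counting from where the first pass stopped. A small sketch that makes this visible (SplitEveryCounter is an illustrative name; the body mirrors the answer):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CounterSplitDemo
{
    // Same technique as the answer: the GroupBy key selector increments a captured counter.
    // The counter is initialized once per call, but GroupBy re-runs the selector on every enumeration.
    public static IEnumerable<string> SplitEveryCounter(this string s, int n)
    {
        int index = 0;
        return s.GroupBy(_ => index++ / n).Select(g => new string(g.ToArray()));
    }
}
```

Enumerating "ABCDEF".SplitEveryCounter(4) twice gives "ABCD", "EF" and then "AB", "CDEF", because the second pass starts at counter value 6. Calling .ToList() once, as the usage line above does, avoids the surprise.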

And of course, every string operation question deserves a Regex answer :)

var res3 = Regex.Split(s, @"(?<=\G.{4})");

#3 (score: 9)

You can use this extension method, which is implemented with plain Substring calls (I believe it is faster than enumerating over characters and joining them into strings):

public static IEnumerable<string> SplitEvery(this string s, int length)
{
    int index = 0;
    while (index + length < s.Length)
    {
        yield return s.Substring(index, length);
        index += length;                
    }

    if (index < s.Length)
        yield return s.Substring(index, s.Length - index);
}

#4 (score: 6)

public static IEnumerable<string> SplitEvery(this string s, int length)
{
    return s.Where((c, index) => index % length == 0)
           .Select((c, index) => String.Concat(
                s.Skip(index * length).Take(length)
             )
           );
}

The jury is out on whether new String(chars.ToArray()) would be faster or slower for this than String.Concat(chars).

You may of course append a .ToList() to return a List<string> rather than an IEnumerable<string>.
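For reference, the two conversions compared above, new String(chars.ToArray()) and String.Concat(chars), produce the same text; only their cost differs, and this sketch does not try to settle that question (a benchmarking tool such as BenchmarkDotNet would be the way to measure it). The class and method names are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CharJoinOptions
{
    // Copies the sequence into a char[] first, then builds the string from the array.
    public static string ViaArray(IEnumerable<char> chars) => new string(chars.ToArray());

    // string.Concat<T>(IEnumerable<T>) concatenates the string form of each element.
    public static string ViaConcat(IEnumerable<char> chars) => string.Concat(chars);
}
```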

#5 (score: 4)

Substring works fine for selecting 4-character portions of the string. You just need to be careful with the last portion, which can be shorter:

Func<string, int, IEnumerable<string>> splitEvery = (s, n) =>
    Enumerable.Range(0, (s.Length + n - 1) / n)                                  // number of chunks, rounded up
              .Select(i => s.Substring(i * n, Math.Min(n, s.Length - i * n)));  // last chunk may be shorter

var parts = splitEvery("ABCDEFGHIJKLMNOPQRSTUVWX", 4);

Note: if this answer is converted to operate on a generic IEnumerable<T>, it will have to iterate the collection multiple times (Count() replaces s.Length, and each Substring becomes Skip(i*n).Take(n)).
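To make the note concrete: a generic version can avoid re-walking the source by buffering each chunk as it goes. A minimal single-pass sketch (BatchOnce and BatchIterator are illustrative names, not an existing library API):

```csharp
using System;
using System.Collections.Generic;

public static class SinglePassBatching
{
    // Chunks any IEnumerable<T> in one pass: every element is read exactly once,
    // unlike a Count() + Skip(i*n).Take(n) translation, which re-walks the source.
    public static IEnumerable<List<T>> BatchOnce<T>(this IEnumerable<T> source, int size)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size < 1) throw new ArgumentOutOfRangeException(nameof(size));
        return BatchIterator(source, size); // argument checks run eagerly, iteration lazily
    }

    private static IEnumerable<List<T>> BatchIterator<T>(IEnumerable<T> source, int size)
    {
        var bucket = new List<T>(size);
        foreach (T item in source)
        {
            bucket.Add(item);
            if (bucket.Count == size)
            {
                yield return bucket;
                bucket = new List<T>(size); // fresh list, so earlier batches stay intact
            }
        }
        if (bucket.Count > 0)
            yield return bucket; // trailing, shorter batch
    }
}
```

Each batch is a separate List<T>, so callers may keep earlier batches; for the question's strings, each batch can be turned back into text with new string(batch.ToArray()).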

#6 (score: 3)

This seems to work:

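public static IEnumerable<string> SplitEvery(this string s, int n) {
    // string.GetEnumerator() returns a CharEnumerator, which is a class, so this is
    // n references to ONE shared enumerator; each pass below advances it by up to n chars.
    var enumerators = Enumerable.Repeat(s.GetEnumerator(), n);
    while (true) {
        var chunk = string.Concat(enumerators
            .Where(e => e.MoveNext())
            .Select(e => e.Current));
        if (chunk == "") yield break;   // enumerator exhausted
        yield return chunk;
    }
}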
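The answer advances one shared enumerator in steps of up to n characters. The same shared-cursor idea written as an explicit loop, which also disposes the enumerator (a sketch; SplitEveryShared is an illustrative name and n is assumed to be positive):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SharedCursorSplit
{
    // One enumerator shared across all chunks; each chunk pulls up to n more characters from it.
    public static IEnumerable<string> SplitEveryShared(this string s, int n)
    {
        var sb = new StringBuilder(n);
        using (IEnumerator<char> cursor = s.GetEnumerator()) // disposed when iteration ends
        {
            while (true)
            {
                sb.Clear();
                while (sb.Length < n && cursor.MoveNext())
                    sb.Append(cursor.Current);
                if (sb.Length == 0)
                    yield break; // cursor exhausted
                yield return sb.ToString();
            }
        }
    }
}
```

Reusing one StringBuilder keeps allocations down; ToString() copies out each chunk, so the yielded strings are independent of the buffer.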

#7 (score: 1)

Here are a couple of LINQy ways of doing it:

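// needs: using System.Collections.Generic; using System.Text;
public static IEnumerable<string> SplitEvery( this IEnumerable<char> s , int n )
{
  StringBuilder sb = new StringBuilder(n) ;
  foreach ( char c in s )
  {
    if ( sb.Length == n )
    {
      yield return sb.ToString() ;
      sb.Length = 0 ;
    }
    sb.Append(c) ;
  }

  // emit the final (possibly shorter) chunk, which the loop above never yields
  if ( sb.Length > 0 )
  {
    yield return sb.ToString() ;
  }
}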

Or

public static IEnumerable<string> SplitEvery( this string s , int n )
{
  int limit = s.Length - ( s.Length % n ) ;
  int i = 0 ;

  while ( i < limit )
  {
    yield return s.Substring(i,n) ;
    i+=n ;
  }

  if ( i < s.Length )
  {
    yield return s.Substring(i) ;
  }

}

#8 (score: 1)

This also works, but requires 'unwrapping' an IGrouping<x,y>:

public static IEnumerable<String> Split(this String me,int SIZE) {
  //Works by mapping the character index to a 'modulo Staircase'
  //and then grouping by that 'stair step' value
  return me.Select((c, i) => new {
    step = i - i % SIZE,
    letter = c.ToString()
  })
  .GroupBy(kvp => kvp.step)
  .Select(grouping => grouping
    .Select(g => g.letter)
    .Aggregate((a, b) => a + b)
  );
}

EDIT: Using lazy evaluation with C# iterators (yield return), you can also achieve this with recursion:

public static IEnumerable<String> Split(this String me, int SIZE) {      
  if (me.Length > SIZE) {
    var head = me.Substring(0,SIZE);
    var tail = me.Substring(SIZE,me.Length-SIZE);
    yield return head;        
    foreach (var item in tail.Split(SIZE)) {
      yield return item; 
    }
  } else { 
    yield return me;
  }
}

Although, personally, I stay away from Substring because it encourages stateful code (counters, indexes, etc. in the parent or global scope).
