Suppose I have a string like this:
one two three "four five six" seven eight
and I want to convert it to this:
one,two,three,"four five six",seven,eight
What's the easiest way to do this in C#?
8 solutions
#1
Assuming that quotes cannot be escaped, you can do the following.
public string SpaceToComma(string input) {
    var builder = new System.Text.StringBuilder();
    var inQuotes = false;
    foreach ( var cur in input ) {
        switch ( cur ) {
            case ' ':
                // Outside quotes a space becomes a comma; inside quotes it is kept.
                builder.Append(inQuotes ? cur : ',');
                break;
            case '"':
                inQuotes = !inQuotes;
                builder.Append(cur);
                break;
            default:
                builder.Append(cur);
                break;
        }
    }
    return builder.ToString();
}
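The answer above assumes quotes cannot be escaped. If a quoted section may contain a backslash-escaped quote (\"), one extra flag keeps the state machine from toggling on it. This is only a sketch under that assumption: the backslash convention and the name SpaceToCommaWithEscapes are hypothetical, since the question does not say how (or whether) quotes are escaped.

```csharp
using System;
using System.Text;

// Hypothetical variant of SpaceToComma that also honours backslash-escaped
// quotes (\") inside quoted sections. The escape convention is an assumption.
static string SpaceToCommaWithEscapes(string input)
{
    var builder = new StringBuilder();
    bool inQuotes = false;
    bool escaped = false; // true right after a backslash inside quotes

    foreach (char cur in input)
    {
        if (escaped)
        {
            builder.Append(cur); // copy the escaped character verbatim
            escaped = false;
        }
        else if (cur == '\\' && inQuotes)
        {
            builder.Append(cur); // keep the backslash in the output
            escaped = true;
        }
        else if (cur == '"')
        {
            inQuotes = !inQuotes;
            builder.Append(cur);
        }
        else if (cur == ' ' && !inQuotes)
        {
            builder.Append(',');
        }
        else
        {
            builder.Append(cur);
        }
    }
    return builder.ToString();
}

Console.WriteLine(SpaceToCommaWithEscapes("x \"say \\\"hi there\\\" now\" y"));
```

For the input x "say \"hi there\" now" y this returns x,"say \"hi there\" now",y, whereas the original SpaceToComma toggles on each escaped quote and returns x,"say \"hi,there\" now",y.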
#2
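// requires: using System.Linq;
static string Space2Comma(string s)
{
    // Even-indexed pieces lie outside quotes; odd-indexed pieces were quoted.
    return string.Concat(s.Split('"').Select(
        (x, i) => i % 2 == 0 ? x.Replace(' ', ',') : '"' + x + '"').ToArray());
}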
#3
My first thought is to use a parser that's already written and simply change the delimiter and quote character to fit your needs (a space and ", respectively).
It looks like this is available to you in C#: http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.aspx
If you set the delimiter to " ", it may suit your needs for reading in the file; then it's just a matter of calling String.Join() on the fields of each line.
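A minimal sketch of that approach, assuming Microsoft.VisualBasic.FileIO.TextFieldParser is available (on .NET Framework you add a reference to Microsoft.VisualBasic; it also ships with .NET Core 3.0 and later). It uses the parser's real members (SetDelimiters, HasFieldsEnclosedInQuotes, ReadFields, EndOfData), reading from a StringReader instead of a file. ReadFields strips the quotes from quoted fields, so the sketch puts them back around any field that contains a space; that re-quoting rule and the helper name SpacesToCommas are assumptions, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO; // TextFieldParser, FieldType

// Hypothetical helper: parse space-delimited records (quoted fields may contain
// spaces) and return each record as a comma-separated line.
static List<string> SpacesToCommas(string text)
{
    var lines = new List<string>();
    using var parser = new TextFieldParser(new StringReader(text));
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(" ");
    parser.HasFieldsEnclosedInQuotes = true;

    while (!parser.EndOfData)
    {
        var fields = parser.ReadFields();
        if (fields is null) break;
        // ReadFields removed the surrounding quotes; re-add them for fields
        // containing a space (a heuristic, assumed to match the desired output).
        lines.Add(string.Join(",", fields.Select(f => f.Contains(' ') ? "\"" + f + "\"" : f)));
    }
    return lines;
}

foreach (var line in SpacesToCommas("one two three \"four five six\" seven eight"))
    Console.WriteLine(line);
```

Note that with a single-space delimiter, consecutive spaces in the input produce empty fields, which would show up as consecutive commas.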
#4
I would use the Regex class for this purpose.
Regular expressions can match your input and break it down into individual pieces, which you can then reassemble however you want. See the documentation for the classes in the System.Text.RegularExpressions namespace.
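// requires: using System.Linq; using System.Text.RegularExpressions;
// Match either a quoted run (quotes included) or a run of non-space characters.
Regex rx = new Regex("\"[^\"]*\"|\\S+");
MatchCollection matches = rx.Matches("first second \"third fourth fifth\" sixth");
string result = string.Join(",", matches.Cast<Match>().Select(x => x.Value).ToArray());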
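Regex can also do the whole job in a single Regex.Replace call: a space lies outside quotes exactly when an even number of quote characters follows it. The sketch below encodes that with a lookahead; it assumes quotes are balanced and never escaped, and the name SpaceToCommaRegex is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Replace every space that is followed by an even number of '"' characters,
// i.e. every space outside a quoted section. Assumes balanced, unescaped quotes.
static string SpaceToCommaRegex(string input) =>
    Regex.Replace(input, " (?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", ",");

Console.WriteLine(SpaceToCommaRegex("one two three \"four five six\" seven eight"));
// prints: one,two,three,"four five six",seven,eight
```

The lookahead rescans the rest of the string for every space, so the cost grows quadratically with line length; for short lines like the one in the question that is unlikely to matter.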
#5
Here's a more reusable function that I came up with:
private string ReplaceWithExceptions(string source, char charToReplace,
    char replacementChar, char exceptionChar)
{
    bool ignoreReplacementChar = false;
    char[] sourceArray = source.ToCharArray();
    for (int i = 0; i < sourceArray.Length; i++)
    {
        if (sourceArray[i] == exceptionChar)
        {
            ignoreReplacementChar = !ignoreReplacementChar;
        }
        else
        {
            if (!ignoreReplacementChar)
            {
                if (sourceArray[i] == charToReplace)
                {
                    sourceArray[i] = replacementChar;
                }
            }
        }
    }
    return new string(sourceArray);
}
Usage:
string test = "one two three \"four five six\" seven eight";
System.Diagnostics.Debug.WriteLine(ReplaceWithExceptions(test, ' ', ',', '"'));
#6
This may be overkill, but if you believe the problem may generalize, such as needing to split on other kinds of characters or having additional rules that define a token, you should consider either using a parser generator such as Coco/R or writing a simple lexer yourself. Coco/R, for instance, generates a lexer and parser from an EBNF grammar you provide. The lexer is a DFA (a state machine), which is a generalized form of the code provided by JaredPar. Your grammar definition for Coco/R would look like this:
CHARACTERS
alphanum = 'A'..'Z' + 'a'..'z' + '0'..'9'.
TOKENS
unit = '"' {alphanum | ' '} '"' | alphanum {alphanum}.
Then the generated lexer will scan and tokenize your input accordingly.
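Writing the lexer by hand is the other option mentioned above. The sketch below (the name Tokenize is hypothetical, and it assumes balanced, unescaped quotes) tracks the same states the grammar implies: between tokens, inside a bare word, or inside a quoted run. Unlike a character-for-character replacement, it returns a list of tokens, so runs of several spaces collapse into a single separator and the tokens can be joined with any delimiter.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical hand-written scanner: a token is either a quoted run (quotes kept)
// or a run of non-space characters; unquoted spaces separate tokens.
static List<string> Tokenize(string input)
{
    var tokens = new List<string>();
    var current = new StringBuilder();
    bool inQuotes = false;

    foreach (char c in input)
    {
        if (c == '"')
        {
            inQuotes = !inQuotes; // enter or leave a quoted run, keeping the quote
            current.Append(c);
        }
        else if (c == ' ' && !inQuotes)
        {
            if (current.Length > 0) // a token just ended; extra spaces are skipped
            {
                tokens.Add(current.ToString());
                current.Clear();
            }
        }
        else
        {
            current.Append(c);
        }
    }
    if (current.Length > 0) tokens.Add(current.ToString()); // flush the last token
    return tokens;
}

Console.WriteLine(string.Join(",", Tokenize("one two three \"four five six\" seven eight")));
// prints: one,two,three,"four five six",seven,eight
```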
#7
Per my comment to the original question, if you don't need the quotes in the final result, this will get the job done. If you do need the quotes, feel free to ignore this.
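// requires: using System.Linq;
private String SpaceToComma(string input)
{
    // Even-indexed segments lie outside quotes; odd-indexed ones were quoted.
    // Empty entries are kept so the parity stays right when the input starts with a quote.
    String[] temp = input.Split('"');
    for (Int32 i = 0; i < temp.Length; i += 2)
    {
        temp[i] = temp[i].Trim().Replace(' ', ',');
    }
    // Drop empty segments (e.g. before a leading quote or between adjacent quoted sections).
    return String.Join(",", temp.Where(s => s.Length > 0));
}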
#8
@Mehrdad beat me to it, but I guess I'll post it anyway:
static string Convert(string input)
{
    var slices = input
        .Split('"')
        .Select((s, i) => i % 2 != 0
            ? @"""" + s + @""""
            : s.Trim().Replace(' ', ','));
    return string.Join(",", slices.ToArray());
}
LINQified and tested :-) ... For a full console app: http://pastebin.com/f23bac59b