How do I split a C# string when array members may contain multiple words?

Date: 2022-12-26 01:38:06

I am working on a small project to take a CSV file and insert its data into an HTML table (I would use a DataGrid with a DataSet or DataTable, but the system I will be talking to does not support ASP.NET uploads for sending newsletters).

Anyway, I will use the File.ReadAllLines method to read the contents of the CSV file into a string array.

But for each string member of the array, I will be using the string.Split function to split the string into an array of fields. The problem is that the CSV contents are makes of cars (the CSV file is generated by the system I talk to, by the way: I get data from this system and feed data into it). This means that I could have:

Nissan Almera

Nissan Almera 1.4 TDi

VW Golf 1.9 SE

And so forth...

Is there a robust way to ensure that where I have Almera 1.4 TDi, for example, it stays one member of the array I split each string into, rather than separate members?

6 Answers

#1


3  

Use the overloaded version of string.Split() that limits the number of returned values.

    string makeModel = csvArray[0]; // or whichever column it is in
    string[] makeAndModel = makeModel.Split( new char[] { ' ' } , 2 );
    string make = makeAndModel[0];
    string model = makeAndModel[1];
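
One thing the snippet above does not cover is a value with no space at all, where makeAndModel[1] would not exist. A minimal sketch of a guarded version follows; the class and method names are made up for illustration, and it assumes the make is always the first word:

```csharp
using System;

static class MakeModelSplitter
{
    // Splits "Nissan Almera 1.4 TDi" into make "Nissan" and model "Almera 1.4 TDi".
    // Assumption: the make is a single word. A value without a space
    // yields an empty model instead of throwing.
    public static (string Make, string Model) Split(string makeModel)
    {
        // The count argument (2) stops splitting after the first space.
        string[] parts = makeModel.Trim().Split(new[] { ' ' }, 2);
        string model = parts.Length > 1 ? parts[1] : string.Empty;
        return (parts[0], model);
    }
}
```

Because of the count of 2, everything after the first space stays together, however many words the model has.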

#2


0  

I'm a bit daft when it comes to cars, but could you not specify the major brand as the delimiter, as opposed to spaces?

E.g.: Nissan Almera Nissan _X100_Ultra_Model Ford Prefect Toyota Foo Bar Honda Prius

Parsing on major brands (Nissan, Ford, Toyota, Honda) would produce:

  • Nissan Almera
  • Nissan _X100_Ultra_Model
  • Ford Prefect
  • Toyota Foo Bar
  • Honda Prius
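
A rough sketch of how splitting on brand names might look, using a zero-width lookahead so the brand stays attached to its entry. The BrandSplitter name and the brand list passed in are illustrative assumptions, and a model name that happens to contain a brand word would be split in the wrong place:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class BrandSplitter
{
    // Splits the text immediately before each known brand name.
    // Assumption: the caller supplies the full list of brands to look for.
    public static string[] SplitOnBrands(string text, params string[] brands)
    {
        string alternation = string.Join("|", brands.Select(b => Regex.Escape(b)));
        // The lookahead matches the position before a brand without consuming it.
        string pattern = @"(?=\b(?:" + alternation + @")\b)";
        return Regex.Split(text, pattern)
                    .Select(piece => piece.Trim())
                    .Where(piece => piece.Length > 0) // drop the empty piece before the first brand
                    .ToArray();
    }
}
```

It would be called as BrandSplitter.SplitOnBrands(text, "Nissan", "Ford", "Toyota", "Honda").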

#3


0  

You will need to use a regular expression.

I'm not so sure you need a regex, but you could solve the problem with one, and then you'd have 2 problems.

A 5-second Google search for regex csv yields a blog entry with this pattern:

,(?=([^"]*"[^"]*")*(?![^"]*"))

At first glance it does the trick: this regex skips commas inside quoted strings and matches only the positions of the separator commas. So you'd think it would be pretty trivial to turn that into something useful, or at least it gives you a starting point.

Mind you, it fails miserably if you have an input string like

123,456,"Unbalanced quote

where it doesn't match any of the commas.
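
For what it's worth, here is a sketch of feeding that pattern to Regex.Split. One change is my own: the group inside the lookahead is made non-capturing, because Regex.Split adds captured text to its result array. The class name is made up, and quotes are left on the fields:

```csharp
using System;
using System.Text.RegularExpressions;

static class QuotedCsvSplitter
{
    // Same lookahead idea as the pattern above: a comma counts as a separator
    // only when the quotes after it are balanced.
    // (?: ... ) keeps Regex.Split from returning the group's text as extra items.
    private static readonly Regex Separator =
        new Regex(@",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))");

    // Returns the raw fields; surrounding quotes are not removed.
    public static string[] SplitLine(string line)
    {
        return Separator.Split(line);
    }
}
```

With the unbalanced-quote input shown above, no comma matches, so the whole line comes back as a single field.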

Step 2, Another Google Search, this time for c# split csv files

CSV FILE PARSER AND WRITER IN C# (PART 3) (but check out parts 1 & 2 for the code)

It looks a lot more robust, and even has test cases.

Because there is no standard CSV format, you'll have to judge whether this works for the input files that you allow.

#4


0  

As I understand the issue:

  • The lines in the file being parsed are NOT CSV, they are space-delimited.
  • The value of the first field of each line (make/model) may contain 0 or more actual spaces.
  • The values of the other fields in each line contain no spaces, so a space delimiter works fine for them.

Let's say you have four columns, and the first column value is supposed to be "Nissan Almera 1.4 TDi". Using a normal Split() would result in 7 fields rather than 4.

(Untested code)

First, just split it:

int numFields = 4;
string[] myFields = myLine.Split(' ');

Then, fix the array:

int extraSpaces = myFields.Length - numFields;
if (extraSpaces > 0) {
  // Piece together element 0 in the array by adding the extra elements
  for (int n = 1; n <= extraSpaces; n++) {
    myFields[0] += " " + myFields[n];
  }
  // Move the other values back to elements 1, 2, and 3 of the array
  for (int n = 1; n < numFields; n++) {
    myFields[n] = myFields[n + extraSpaces];
  }
}

Finally, ignore every element of the array beyond the four you actually wanted to parse.
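
The same split-and-merge idea could also be sketched so that it returns a fresh array of exactly the expected size, which avoids the leftover elements. The names here are hypothetical, and it still assumes only the first field can contain spaces:

```csharp
using System;

static class LeadingFieldMerger
{
    // Splits on spaces, then glues the surplus leading tokens back into field 0.
    // Assumption: only the first field (make/model) may contain spaces.
    public static string[] SplitWithMergedFirstField(string line, int numFields)
    {
        string[] tokens = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (tokens.Length < numFields)
            throw new FormatException("Expected at least " + numFields + " fields.");

        // Number of tokens that belong to the first field.
        int firstFieldTokens = tokens.Length - numFields + 1;
        string[] fields = new string[numFields];
        fields[0] = string.Join(" ", tokens, 0, firstFieldTokens);
        // Copy the remaining single-token fields into positions 1..numFields-1.
        Array.Copy(tokens, firstFieldTokens, fields, 1, numFields - 1);
        return fields;
    }
}
```

Note that RemoveEmptyEntries collapses runs of spaces, so a doubled space inside the make/model comes back as a single space.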

Another approach would be regular expressions. I think something like this would work:

 // Requires: using System.Text.RegularExpressions;
 Match m = Regex.Match(myLine, "^(.*) ([^ ]+) ([^ ]+) ([^ ]+)$");
 string MakeModel = m.Groups[1].Value;
 string ModelYear = m.Groups[2].Value;
 string Price     = m.Groups[3].Value;
 string NumWheels = m.Groups[4].Value;

No splitting or arrays here, just Regex captured groups.

If there were a built-in String.Reverse() method (there's not), I might suggest using VB.NET's Replace() function with its Count parameter to replace the first three spaces (assuming four fields) in the reversed raw string with a placeholder, then reversing it again and splitting on the placeholder. This only works if the data itself never contains the placeholder character. Something like:

// Hypothetical: assumes a string.Reverse() that returns a string, which .NET does not provide.
string marked = Microsoft.VisualBasic.Strings.Replace(myLine.Reverse(), " ", "_", 1, 3);
string[] myFields = marked.Reverse().Split('_'); // the make/model keeps its spaces

#5


0  

As somebody else pointed out, string.Split() takes a parameter, so you can pass a ',' to split on that. It would not matter if you have spaces in values. Unless you are really sure that no values will contain commas, though, I don't suggest doing that. Parsing CSV files is a bit trickier than it might seem initially (handling quotes and values containing commas), and I suggest using an existing library for that, such as http://www.codeproject.com/KB/database/CsvReader.aspx.
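
To give an idea of what a ready-made parser buys you, here is a sketch using TextFieldParser from Microsoft.VisualBasic.FileIO. That is a different library from the CodeProject reader linked above, whose API is not shown here; it is swapped in only because it ships with .NET. The CsvRows class name is made up:

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

static class CsvRows
{
    // Parses comma-delimited text, honouring quoted fields that contain commas.
    public static List<string[]> Parse(string csvText)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(csvText)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            while (!parser.EndOfData)
            {
                // ReadFields returns one row with the surrounding quotes removed.
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

A make/model with spaces needs no special handling here, since only commas outside quotes separate fields.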

#6


-1  

The Split() method takes a char parameter which can be used to specify the delimiter. So you can do something like:

string[] fields = myLine.Split(','); // myLine is one line read from the CSV file

Judging by your question all the car makes should be delimited by commas so this should work.
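
If the data really is comma-delimited with no commas inside values, the whole job from the question (lines in, HTML table out) might look roughly like this. The HtmlTableBuilder name and the exact markup are assumptions for illustration:

```csharp
using System.Net;
using System.Text;

static class HtmlTableBuilder
{
    // Builds a plain HTML table from comma-delimited lines.
    // Assumption: no field contains a comma, so a simple Split(',') is enough.
    public static string FromCsvLines(string[] lines)
    {
        var html = new StringBuilder("<table>\n");
        foreach (string line in lines)
        {
            html.Append("  <tr>");
            foreach (string field in line.Split(','))
            {
                // HtmlEncode guards against characters such as < and & in the data.
                html.Append("<td>").Append(WebUtility.HtmlEncode(field.Trim())).Append("</td>");
            }
            html.Append("</tr>\n");
        }
        return html.Append("</table>").ToString();
    }
}
```

The lines argument would be the array returned by File.ReadAllLines.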
