用引号和未引用的字符串分割逗号分隔的字符串。

I have the following comma-separated string that I need to split. The problem is that some of the content is within quotes and contains commas that shouldnt be used in the split...

我有下面的逗号分隔的字符串，我需要拆分它。问题是有些内容是在引号内的，并且包含了不该在分割中使用的逗号……

String:

字符串:

111,222,"33,44,55",666,"77,88","99"

I want the output:

我希望的输出:

I have tried this:

我试过这个:

(?:,?)((?<=")[^"]+(?=")|[^",]+)

But it reads the comma between "77,88","99" as a hit and I get the following output:

但它读为“77,88”和“99”之间的逗号作为点击，我得到如下输出:

Can anybody help me? Im running out of hours... :) /Peter

有人能帮助我吗?我快没时间了……彼得:)/

15 个解决方案

#1

Depending on your needs you may not be able to use a csv parser, and may in fact want to re-invent the wheel!!

根据您的需要，您可能无法使用csv解析器，并且实际上可能想要重新发明*!

You can do so with some simple regex

您可以使用一些简单的regex来实现这一点

(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)

This will do the following:

这将进行下列工作:

(?:^|,) = Match expression "Beginning of line or string ,"

(?:^ |)=匹配表达式“行或字符串的开始,”

(\"(?:[^\"]+|\"\")*\"|[^,]*) = A numbered capture group, this will select between 2 alternatives:

(\”(?:[^ \]+ | \“\”)* \”| ^,*)=一个捕获组编号,这将选择2的替代品:

stuff in quotes
在报价
stuff between commas
东西之间的逗号

This should give you the output you are looking for.

这将给您所需要的输出。

Example code in C#

在c#示例代码

public static string[] SplitCSV(string input)
{
  Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);
  List<string> list = new List<string>();
  string curr = null;
  foreach (Match match in csvSplit.Matches(input))
  {        
    curr = match.Value;
    if (0 == curr.Length)
    {
      list.Add("");
    }

    list.Add(curr.TrimStart(','));
  }

  return list.ToArray();
}

private void button1_Click(object sender, RoutedEventArgs e)
{
    Console.WriteLine(SplitCSV("111,222,\"33,44,55\",666,\"77,88\",\"99\""));
}

#2

i really like jimplode's answer, but I think a version with yield return is a little bit more useful, so here it is:

我真的很喜欢杰内爆的答案，但我认为有收益率回报的版本更有用一些，所以它是:

public IEnumerable<string> SplitCSV(string input)
{
    Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);

    foreach (Match match in csvSplit.Matches(input))
    {
        yield return match.Value.TrimStart(',');
    }
}

Maybe it's even more useful to have it like an extension method:

也许把它当作扩展方法更有用:

public static class StringHelper
{
    public static IEnumerable<string> SplitCSV(this string input)
    {
        Regex csvSplit = new Regex("(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)", RegexOptions.Compiled);

        foreach (Match match in csvSplit.Matches(input))
        {
            yield return match.Value.TrimStart(',');
        }
    }
}

#3

This regular expression works without the need to loop through values and TrimStart(','), like in the accepted answer:

这个正则表达式不需要循环遍历值和TrimStart('，')，就像公认的答案:

((?<=\")[^\"]*(?=\"(,|$)+)|(?<=,|^)[^,\"]*(?=,|$))

Here is the implementation in C#:

下面是c#中的实现:

string values = "111,222,\"33,44,55\",666,\"77,88\",\"99\"";

MatchCollection matches = new Regex("((?<=\")[^\"]*(?=\"(,|$)+)|(?<=,|^)[^,\"]*(?=,|$))").Matches(values);

foreach (var match in matches)
{
    Console.WriteLine(match);
}

Outputs

输出

#4

Try this:

试试这个:

       string s = @"111,222,""33,44,55"",666,""77,88"",""99""";

       List<string> result = new List<string>();

       var splitted = s.Split('"').ToList<string>();
       splitted.RemoveAll(x => x == ",");
       foreach (var it in splitted)
       {
           if (it.StartsWith(",") || it.EndsWith(","))
           {
               var tmp = it.TrimEnd(',').TrimStart(',');
               result.AddRange(tmp.Split(','));
           }
           else
           {
               if(!string.IsNullOrEmpty(it)) result.Add(it);
           }
      }
       //Results:

       foreach (var it in result)
       {
           Console.WriteLine(it);
       }

#5

Don't reinvent a CSV parser, try FileHelpers.

不要重新创建CSV解析器，请尝试filehelper。

#6

For Jay's answer, if you use a 2nd boolean then you can have nested double-quotes inside single-quotes and vice-versa.

对于Jay的回答，如果您使用第二个布尔值，那么您可以在单引号内嵌套双引号，反之亦然。

    private string[] splitString(string stringToSplit)
{
    char[] characters = stringToSplit.ToCharArray();
    List<string> returnValueList = new List<string>();
    string tempString = "";
    bool blockUntilEndQuote = false;
    bool blockUntilEndQuote2 = false;
    int characterCount = 0;
    foreach (char character in characters)
    {
        characterCount = characterCount + 1;

        if (character == '"' && !blockUntilEndQuote2)
        {
            if (blockUntilEndQuote == false)
            {
                blockUntilEndQuote = true;
            }
            else if (blockUntilEndQuote == true)
            {
                blockUntilEndQuote = false;
            }
        }
        if (character == '\'' && !blockUntilEndQuote)
        {
            if (blockUntilEndQuote2 == false)
            {
                blockUntilEndQuote2 = true;
            }
            else if (blockUntilEndQuote2 == true)
            {
                blockUntilEndQuote2 = false;
            }
        }

        if (character != ',')
        {
            tempString = tempString + character;
        }
        else if (character == ',' && (blockUntilEndQuote == true || blockUntilEndQuote2 == true))
        {
            tempString = tempString + character;
        }
        else
        {
            returnValueList.Add(tempString);
            tempString = "";
        }

        if (characterCount == characters.Length)
        {
            returnValueList.Add(tempString);
            tempString = "";
        }
    }

    string[] returnValue = returnValueList.ToArray();
    return returnValue;
}

#7

None of these answers work when the string has a comma inside quotes, as in "value, 1", or escaped double-quotes, as in "value ""1""", which are valid CSV that should be parsed as value, 1 and value "1", respectively.

当字符串的引号(如“value”、“1”或“转义”双引号(如“value”“1”)是有效的CSV，应该分别解析为value、1和value“1”)时，这些答案都不起作用。

This will also work with the tab-delimited format if you pass in a tab instead of a comma as your delimiter.

如果您将制表符(而不是逗号)作为分隔符传入制表符，那么也可以使用表分隔符格式。

public static IEnumerable<string> SplitRow(string row, char delimiter = ',')
{
    var currentString = new StringBuilder();
    var inQuotes = false;
    var quoteIsEscaped = false; //Store when a quote has been escaped.
    row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser.
    foreach (var character in row.Select((val, index) => new {val, index}))
    {
        if (character.val == delimiter) //We hit a delimiter character...
        {
            if (!inQuotes) //Are we inside quotes? If not, we've hit the end of a cell value.
            {
                Console.WriteLine(currentString);
                yield return currentString.ToString();
                currentString.Clear();
            }
            else
            {
                currentString.Append(character.val);
            }
        } else {
            if (character.val != ' ')
            {
                if(character.val == '"') //If we've hit a quote character...
                {
                    if(character.val == '\"' && inQuotes) //Does it appear to be a closing quote?
                    {
                        if (row[character.index + 1] == character.val) //If the character afterwards is also a quote, this is to escape that (not a closing quote).
                        {
                            quoteIsEscaped = true; //Flag that we are escaped for the next character. Don't add the escaping quote.
                        }
                        else if (quoteIsEscaped)
                        {
                            quoteIsEscaped = false; //This is an escaped quote. Add it and revert quoteIsEscaped to false.
                            currentString.Append(character.val);
                        }
                        else
                        {
                            inQuotes = false;
                        }
                    }
                    else
                    {
                        if (!inQuotes)
                        {
                            inQuotes = true;
                        }
                        else
                        {
                            currentString.Append(character.val); //...It's a quote inside a quote.
                        }
                    }
                }
                else
                {
                    currentString.Append(character.val);
                }
            }
            else
            {
                if (!string.IsNullOrWhiteSpace(currentString.ToString())) //Append only if not new cell
                {
                    currentString.Append(character.val);
                }
            }
        }
    }
}

#8

I know I'm a bit late to this, but for searches, here is how I did what you are asking about in C sharp

我知道我有点晚了，但是对于搜索，这是我如何做你在C夏普问的

private string[] splitString(string stringToSplit)
    {
        char[] characters = stringToSplit.ToCharArray();
        List<string> returnValueList = new List<string>();
        string tempString = "";
        bool blockUntilEndQuote = false;
        int characterCount = 0;
        foreach (char character in characters)
        {
            characterCount = characterCount + 1;

            if (character == '"')
            {
                if (blockUntilEndQuote == false)
                {
                    blockUntilEndQuote = true;
                }
                else if (blockUntilEndQuote == true)
                {
                    blockUntilEndQuote = false;
                }
            }

            if (character != ',')
            {
                tempString = tempString + character;
            }
            else if (character == ',' && blockUntilEndQuote == true)
            {
                tempString = tempString + character;
            }
            else
            {
                returnValueList.Add(tempString);
                tempString = "";
            }

            if (characterCount == characters.Length)
            {
                returnValueList.Add(tempString);
                tempString = "";
            }
        }

        string[] returnValue = returnValueList.ToArray();
        return returnValue;
    }

#9

With minor updates to the function provided by "Chad Hedgcock".

对“Chad Hedgcock”提供的功能进行了少量更新。

Updates are on:

更新:

Line 26: character.val == '\"' - This can never be true due to the check made on Line 24. i.e. character.val == '"'

26行:性格。val == '\" -这是不可能的，因为第24行做了检查。即字符。val = =“””

Line 28: if (row[character.index + 1] == character.val) added !quoteIsEscaped to escape 3 consecutive quotes.

28行:如果(行字符。索引+ 1]= character.val)添加! quoteisescape为转义3个连续引号。

public static IEnumerable<string> SplitRow(string row, char delimiter = ',')
{
var currentString = new StringBuilder();
var inQuotes = false;
var quoteIsEscaped = false; //Store when a quote has been escaped.
row = string.Format("{0}{1}", row, delimiter); //We add new cells at the delimiter, so append one for the parser.
foreach (var character in row.Select((val, index) => new {val, index}))
{
    if (character.val == delimiter) //We hit a delimiter character...
    {
        if (!inQuotes) //Are we inside quotes? If not, we've hit the end of a cell value.
        {
            //Console.WriteLine(currentString);
            yield return currentString.ToString();
            currentString.Clear();
        }
        else
        {
            currentString.Append(character.val);
        }
    } else {
        if (character.val != ' ')
        {
            if(character.val == '"') //If we've hit a quote character...
            {
                if(character.val == '"' && inQuotes) //Does it appear to be a closing quote?
                {
                    if (row[character.index + 1] == character.val && !quoteIsEscaped) //If the character afterwards is also a quote, this is to escape that (not a closing quote).
                    {
                        quoteIsEscaped = true; //Flag that we are escaped for the next character. Don't add the escaping quote.
                    }
                    else if (quoteIsEscaped)
                    {
                        quoteIsEscaped = false; //This is an escaped quote. Add it and revert quoteIsEscaped to false.
                        currentString.Append(character.val);
                    }
                    else
                    {
                        inQuotes = false;
                    }
                }
                else
                {
                    if (!inQuotes)
                    {
                        inQuotes = true;
                    }
                    else
                    {
                        currentString.Append(character.val); //...It's a quote inside a quote.
                    }
                }
            }
            else
            {
                currentString.Append(character.val);
            }
        }
        else
        {
            if (!string.IsNullOrWhiteSpace(currentString.ToString())) //Append only if not new cell
            {
                currentString.Append(character.val);
            }
        }
    }
}

}

#10

Currently I use the following regex:

目前我使用以下regex:

    public static Regex regexCSVSplit = new Regex(@"(?x:(
        (?<FULL>
        (^|[,;\t\r\n])\s*
        ( (?<CODAT> (?<CO>[""'])(?<DAT>([^,;\t\r\n]|(?<!\k<CO>\s*)[,;\t\r\n])*)\k<CO>) |
          (?<CODAT> (?<DAT> [^""',;\s\r\n]* )) )
        (?=\s*([,;\t\r\n]|$))
        ) |
        (?<FULL>
        (^|[\s\t\r\n])
        ( (?<CODAT> (?<CO>[""'])(?<DAT> [^""',;\s\t\r\n]* )\k<CO>) |
        (?<CODAT> (?<DAT> [^""',;\s\t\r\n]* )) )
        (?=[,;\s\t\r\n]|$))
        ))", RegexOptions.Compiled);

This solution can handle pretty chaotic cases too like below:

这个解决方案可以处理非常混乱的情况，如下所示:

This is how to feed the result into an array:

这是如何将结果输入到数组中的:

    var data = regexCSVSplit.Matches(line_to_process).Cast<Match>().Select(x => x.Groups["DAT"].Value).ToArray();

See this example in action HERE

请参见这里的示例

#11

Fast and easy:

快速和容易:

    public static string[] SplitCsv(string line)
    {
        List<string> result = new List<string>();
        StringBuilder currentStr = new StringBuilder("");
        bool inQuotes = false;
        for (int i = 0; i < line.Length; i++) // For each character
        {
            if (line[i] == '\"') // Quotes are closing or opening
                inQuotes = !inQuotes;
            else if (line[i] == ',') // Comma
            {
                if (!inQuotes) // If not in quotes, end of current string, add it to result
                {
                    result.Add(currentStr.ToString());
                    currentStr.Clear();
                }
                else
                    currentStr.Append(line[i]); // If in quotes, just add it 
            }
            else // Add any other character to current string
                currentStr.Append(line[i]); 
        }
        result.Add(currentStr.ToString());
        return result.ToArray(); // Return array of all strings
    }

With this string as input :

用这个字符串作为输入:

 111,222,"33,44,55",666,"77,88","99"

It will return :

它将返回:

#12

I once had to do something similar and in the end I got stuck with Regular Expressions. The inability for Regex to have state makes it pretty tricky - I just ended up writing a simple little parser.

我曾经做过类似的事情，但最后还是被正则表达式困住了。Regex无法具有状态，这使得它非常棘手——我最后编写了一个简单的小解析器。

If you're doing CSV parsing you should just stick to using a CSV parser - don't reinvent the wheel.

如果您正在进行CSV解析，您应该只使用CSV解析器—不要重新创建*。

#13

Here is my fastest implementation based upon string raw pointer manipulation:

以下是我基于字符串原始指针操作的最快实现:

string[] FastSplit(string sText, char? cSeparator = null, char? cQuotes = null)
    {            
        string[] oTokens;

        if (null == cSeparator)
        {
            cSeparator = DEFAULT_PARSEFIELDS_SEPARATOR;
        }

        if (null == cQuotes)
        {
            cQuotes = DEFAULT_PARSEFIELDS_QUOTE;
        }

        unsafe
        {
            fixed (char* lpText = sText)
            {
                #region Fast array estimatation

                char* lpCurrent      = lpText;                    
                int   nEstimatedSize = 0;

                while (0 != *lpCurrent)
                {
                    if (cSeparator == *lpCurrent)
                    {
                        nEstimatedSize++;
                    }

                    lpCurrent++;
                }

                nEstimatedSize++; // Add EOL char(s)
                string[] oEstimatedTokens = new string[nEstimatedSize];

                #endregion

                #region Parsing

                char[] oBuffer = new char[sText.Length];
                int    nIndex  = 0;
                int    nTokens = 0;

                lpCurrent      = lpText;

                while (0 != *lpCurrent)
                {
                    if (cQuotes == *lpCurrent)
                    {
                        // Quotes parsing

                        lpCurrent++; // Skip quote
                        nIndex = 0;  // Reset buffer

                        while (
                               (0       != *lpCurrent)
                            && (cQuotes != *lpCurrent)
                        )
                        {
                            oBuffer[nIndex] = *lpCurrent; // Store char

                            lpCurrent++; // Move source cursor
                            nIndex++;    // Move target cursor
                        }

                    } 
                    else if (cSeparator == *lpCurrent)
                    {
                        // Separator char parsing

                        oEstimatedTokens[nTokens++] = new string(oBuffer, 0, nIndex); // Store token
                        nIndex                      = 0;                              // Skip separator and Reset buffer
                    }
                    else
                    {
                        // Content parsing

                        oBuffer[nIndex] = *lpCurrent; // Store char
                        nIndex++;                     // Move target cursor
                    }

                    lpCurrent++; // Move source cursor
                }

                // Recover pending buffer

                if (nIndex > 0)
                {
                    // Store token

                    oEstimatedTokens[nTokens++] = new string(oBuffer, 0, nIndex);
                }

                // Build final tokens list

                if (nTokens == nEstimatedSize)
                {
                    oTokens = oEstimatedTokens;
                }
                else
                {
                    oTokens = new string[nTokens];
                    Array.Copy(oEstimatedTokens, 0, oTokens, 0, nTokens);
                }

                #endregion
            }
        }

        // Epilogue            

        return oTokens;
    }

#14

I needed something a little more robust, so I took from here and created this... This solution is a little less elegant and a little more verbose, but in my testing (with a 1,000,000 row sample), I found this to be 2 to 3 times faster. Plus it handles non-escaped, embedded quotes. I used string delimiter and qualifiers instead of chars because of the requirements of my solution. I found it more difficult than I expected to find a good, generic CSV parser so I hope this parsing algorithm can help someone.

我需要一些更有活力的东西，所以我从这里开始创作这个…这个解决方案不那么优雅，也有点冗长，但是在我的测试中(使用一个1,000,000行示例)，我发现它要快2到3倍。此外，它还处理非转义的嵌入式引号。由于我的解决方案的要求，我使用了字符串分隔符和限定符而不是字符。我发现找到一个好的、通用的CSV解析器比我预想的要困难得多，所以我希望这个解析算法能够帮助某些人。

    public static string[] SplitRow(string record, string delimiter, string qualifier, bool trimData)
    {
        // In-Line for example, but I implemented as string extender in production code
        Func <string, int, int> IndexOfNextNonWhiteSpaceChar = delegate (string source, int startIndex)
        {
            if (startIndex >= 0)
            {
                if (source != null)
                {
                    for (int i = startIndex; i < source.Length; i++)
                    {
                        if (!char.IsWhiteSpace(source[i]))
                        {
                            return i;
                        }
                    }
                }
            }

            return -1;
        };

        var results = new List<string>();
        var result = new StringBuilder();
        var inQualifier = false;
        var inField = false;

        // We add new columns at the delimiter, so append one for the parser.
        var row = $"{record}{delimiter}";

        for (var idx = 0; idx < row.Length; idx++)
        {
            // A delimiter character...
            if (row[idx]== delimiter[0])
            {
                // Are we inside qualifier? If not, we've hit the end of a column value.
                if (!inQualifier)
                {
                    results.Add(trimData ? result.ToString().Trim() : result.ToString());
                    result.Clear();
                    inField = false;
                }
                else
                {
                    result.Append(row[idx]);
                }
            }

            // NOT a delimiter character...
            else
            {
                // ...Not a space character
                if (row[idx] != ' ')
                {
                    // A qualifier character...
                    if (row[idx] == qualifier[0])
                    {
                        // Qualifier is closing qualifier...
                        if (inQualifier && row[IndexOfNextNonWhiteSpaceChar(row, idx + 1)] == delimiter[0])
                        {
                            inQualifier = false;
                            continue;
                        }

                        else
                        {
                            // ...Qualifier is opening qualifier
                            if (!inQualifier)
                            {
                                inQualifier = true;
                            }

                            // ...It's a qualifier inside a qualifier.
                            else
                            {
                                inField = true;
                                result.Append(row[idx]);
                            }
                        }
                    }

                    // Not a qualifier character...
                    else
                    {
                        result.Append(row[idx]);
                        inField = true;
                    }
                }

                // ...A space character
                else
                {
                    if (inQualifier || inField)
                    {
                        result.Append(row[idx]);
                    }
                }
            }
        }

        return results.ToArray<string>();
    }

Some test code:

一些测试代码:

        //var input = "111,222,\"33,44,55\",666,\"77,88\",\"99\"";

        var input =
            "111, 222, \"99\",\"33,44,55\" ,      \"666 \"mark of a man\"\", \" spaces \"77,88\"   \"";

        Console.WriteLine("Split with trim");
        Console.WriteLine("---------------");
        var result = SplitRow(input, ",", "\"", true);
        foreach (var r in result)
        {
            Console.WriteLine(r);
        }
        Console.WriteLine("");

        // Split 2
        Console.WriteLine("Split with no trim");
        Console.WriteLine("------------------");
        var result2 = SplitRow(input, ",", "\"", false);
        foreach (var r in result2)
        {
            Console.WriteLine(r);
        }
        Console.WriteLine("");

        // Time Trial 1
        Console.WriteLine("Experimental Process (1,000,000) iterations");
        Console.WriteLine("-------------------------------------------");
        watch = Stopwatch.StartNew();
        for (var i = 0; i < 1000000; i++)
        {
            var x1 = SplitRow(input, ",", "\"", false);
        }
        watch.Stop();
        elapsedMs = watch.ElapsedMilliseconds;
        Console.WriteLine($"Total Process Time: {string.Format("{0:0.###}", elapsedMs / 1000.0)} Seconds");
        Console.WriteLine("");

Results

结果

Split with trim
---------------
111
222
99
33,44,55
666 "mark of a man"
spaces "77,88"

Split with no trim
------------------
111
222
99
33,44,55
666 "mark of a man"
 spaces "77,88"

Original Process (1,000,000) iterations
-------------------------------
Total Process Time: 7.538 Seconds

Experimental Process (1,000,000) iterations
--------------------------------------------
Total Process Time: 3.363 Seconds

#15

Try this

试试这个

private string[] GetCommaSeperatedWords(string sep, string line)
    {
        List<string> list = new List<string>();
        StringBuilder word = new StringBuilder();
        int doubleQuoteCount = 0;
        for (int i = 0; i < line.Length; i++)
        {
            string chr = line[i].ToString();
            if (chr == "\"")
            {
                if (doubleQuoteCount == 0)
                    doubleQuoteCount++;
                else
                    doubleQuoteCount--;

                continue;
            }
            if (chr == sep && doubleQuoteCount == 0)
            {
                list.Add(word.ToString());
                word = new StringBuilder();
                continue;
            }
            word.Append(chr);
        }

        list.Add(word.ToString());

        return list.ToArray();
    }

#1