How to parse a CSV string with JavaScript when the data contains commas?

Time: 2021-06-01 17:02:09

I have the following type of string:

var string = "'string, duppi, du', 23, lala"

I want to split the string into an array on each comma, but only on the commas outside the single quotation marks.

I can't figure out the right regex for the split...

string.split(/,/)

will give me

["'string", " duppi", " du'", " 23", " lala"]

but the result should be:

["string, duppi, du", "23", "lala"]

Is there any cross-browser solution?

13 solutions

#1


169  

Disclaimer

2014-12-01 Update: The answer below works only for one very specific format of CSV. As correctly pointed out by DG in the comments, this solution does NOT fit the RFC 4180 definition of CSV and it also does NOT fit MS Excel format. This solution simply demonstrates how one can parse one (non-standard) CSV line of input which contains a mix of string types, where the strings may contain escaped quotes and commas.

A non-standard CSV solution

As austincheney correctly points out, you really need to parse the string from start to finish if you wish to properly handle quoted strings that may contain escaped characters. Also, the OP does not clearly define what a "CSV string" really is. First we must define what constitutes a valid CSV string and its individual values.

Given: "CSV String" Definition

For the purpose of this discussion, a "CSV string" consists of zero or more values, where multiple values are separated by a comma. Each value may consist of:

  1. A double quoted string. (May contain unescaped single quotes.)
  2. A single quoted string. (May contain unescaped double quotes.)
  3. A non-quoted string. (May NOT contain quotes, commas or backslashes.)
  4. An empty value. (An all-whitespace value is considered empty.)

Rules/Notes:

  • Quoted values may contain commas.
  • Quoted values may contain escaped-anything, e.g. 'that\'s cool'.
  • Values containing quotes, commas, or backslashes must be quoted.
  • Values containing leading or trailing whitespace must be quoted.
  • The backslash is removed from all: \' in single quoted values.
  • The backslash is removed from all: \" in double quoted values.
  • Non-quoted strings are trimmed of any leading and trailing spaces.
  • The comma separator may have adjacent whitespace (which is ignored).

Find:

A JavaScript function which converts a valid CSV string (as defined above) into an array of string values.

Solution:

The regular expressions used by this solution are complex. And (IMHO) all non-trivial regexes should be presented in free-spacing mode with lots of comments and indentation. Unfortunately, JavaScript does not allow free-spacing mode. Thus, the regular expressions implemented by this solution are first presented in native regex syntax (expressed using Python's handy: r'''...''' raw-multi-line-string syntax).

First, here is a regular expression which validates that a CSV string meets the above requirements:

Regex to validate a "CSV string":

re_valid = r"""
# Validate a CSV string having single, double or un-quoted values.
^                                   # Anchor to start of string.
\s*                                 # Allow whitespace before value.
(?:                                 # Group for value alternatives.
  '[^'\\]*(?:\\[\S\s][^'\\]*)*'     # Either Single quoted string,
| "[^"\\]*(?:\\[\S\s][^"\\]*)*"     # or Double quoted string,
| [^,'"\s\\]*(?:\s+[^,'"\s\\]+)*    # or Non-comma, non-quote stuff.
)                                   # End group of value alternatives.
\s*                                 # Allow whitespace after value.
(?:                                 # Zero or more additional values
  ,                                 # Values separated by a comma.
  \s*                               # Allow whitespace before value.
  (?:                               # Group for value alternatives.
    '[^'\\]*(?:\\[\S\s][^'\\]*)*'   # Either Single quoted string,
  | "[^"\\]*(?:\\[\S\s][^"\\]*)*"   # or Double quoted string,
  | [^,'"\s\\]*(?:\s+[^,'"\s\\]+)*  # or Non-comma, non-quote stuff.
  )                                 # End group of value alternatives.
  \s*                               # Allow whitespace after value.
)*                                  # Zero or more additional values
$                                   # Anchor to end of string.
"""

If a string matches the above regex, then that string is a valid CSV string (according to the rules previously stated) and may be parsed using the following regex. The following regex is then used to match one value from the CSV string. It is applied repeatedly until no more matches are found (and all values have been parsed).

Regex to parse one value from valid CSV string:

re_value = r"""
# Match one value in valid CSV string.
(?!\s*$)                            # Don't match empty last value.
\s*                                 # Strip whitespace before value.
(?:                                 # Group for value alternatives.
  '([^'\\]*(?:\\[\S\s][^'\\]*)*)'   # Either $1: Single quoted string,
| "([^"\\]*(?:\\[\S\s][^"\\]*)*)"   # or $2: Double quoted string,
| ([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)  # or $3: Non-comma, non-quote stuff.
)                                   # End group of value alternatives.
\s*                                 # Strip whitespace after value.
(?:,|$)                             # Field ends on comma or EOS.
"""

Note that there is one special case value that this regex does not match - the very last value when that value is empty. This special "empty last value" case is tested for and handled by the js function which follows.

JavaScript function to parse CSV string:

// Return array of string values, or NULL if CSV string not well formed.
function CSVtoArray(text) {
    var re_valid = /^\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*(?:,\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*)*$/;
    var re_value = /(?!\s*$)\s*(?:'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*))\s*(?:,|$)/g;
    // Return NULL if input string is not well formed CSV string.
    if (!re_valid.test(text)) return null;
    var a = [];                     // Initialize array to receive values.
    text.replace(re_value, // "Walk" the string using replace with callback.
        function(m0, m1, m2, m3) {
            // Remove backslash from \' in single quoted values.
            if      (m1 !== undefined) a.push(m1.replace(/\\'/g, "'"));
            // Remove backslash from \" in double quoted values.
            else if (m2 !== undefined) a.push(m2.replace(/\\"/g, '"'));
            else if (m3 !== undefined) a.push(m3);
            return ''; // Return empty string.
        });
    // Handle special case of empty last value.
    if (/,\s*$/.test(text)) a.push('');
    return a;
};
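
Because JavaScript has no free-spacing mode, one possible workaround (not part of the original answer) is to assemble the pattern from commented string fragments. The sketch below rebuilds the per-value regex this way. The names SINGLE, DOUBLE, BARE and parseValues are made up for illustration, and the validation step is skipped, so it assumes the input is already a valid CSV string as defined above.

```javascript
// Hypothetical fragment names; each fragment is one value alternative.
const SINGLE = String.raw`'([^'\\]*(?:\\[\S\s][^'\\]*)*)'`;  // $1: single quoted string
const DOUBLE = String.raw`"([^"\\]*(?:\\[\S\s][^"\\]*)*)"`;  // $2: double quoted string
const BARE = String.raw`([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)`;   // $3: non-comma, non-quote stuff

const reValue = new RegExp(
    String.raw`(?!\s*$)` +               // don't match empty last value
    String.raw`\s*` +                    // strip whitespace before value
    `(?:${SINGLE}|${DOUBLE}|${BARE})` +  // group of value alternatives
    String.raw`\s*(?:,|$)`,              // strip whitespace after value; end on comma or EOS
    'g'
);

// Collect the values of an already-validated CSV string.
function parseValues(text) {
    const values = [];
    for (const m of text.matchAll(reValue)) {
        if (m[1] !== undefined) values.push(m[1].replace(/\\'/g, "'"));
        else if (m[2] !== undefined) values.push(m[2].replace(/\\"/g, '"'));
        else values.push(m[3]);
    }
    if (/,\s*$/.test(text)) values.push('');  // empty last value
    return values;
}

console.log(parseValues("'string, duppi, du', 23, lala"));
// [ 'string, duppi, du', '23', 'lala' ]
```

String.raw keeps every backslash as typed, so each fragment reads exactly like the corresponding line of the commented Python-style listing. Note that String.prototype.matchAll needs a reasonably modern engine, which matters if old browsers are a requirement.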

Example input and output:

In the following examples, curly braces are used to delimit the {result strings}. (This is to help visualize leading/trailing spaces and zero-length strings.)

// Test 1: Test string from original question.
var test = "'string, duppi, du', 23, lala";
var a = CSVtoArray(test);
/* Array has 3 elements:
    a[0] = {string, duppi, du}
    a[1] = {23}
    a[2] = {lala} */
// Test 2: Empty CSV string.
var test = "";
var a = CSVtoArray(test);
/* Array has 0 elements: */
// Test 3: CSV string with two empty values.
var test = ",";
var a = CSVtoArray(test);
/* Array has 2 elements:
    a[0] = {}
    a[1] = {} */
// Test 4: Double quoted CSV string having single quoted values.
// The backslash is doubled so that it survives the JavaScript string literal.
var test = "'one','two with escaped \\' single quote', 'three, with, commas'";
var a = CSVtoArray(test);
/* Array has 3 elements:
    a[0] = {one}
    a[1] = {two with escaped ' single quote}
    a[2] = {three, with, commas} */
// Test 5: Single quoted CSV string having double quoted values.
// The backslash is doubled so that it survives the JavaScript string literal.
var test = '"one","two with escaped \\" double quote", "three, with, commas"';
var a = CSVtoArray(test);
/* Array has 3 elements:
    a[0] = {one}
    a[1] = {two with escaped " double quote}
    a[2] = {three, with, commas} */
// Test 6: CSV string with whitespace in and around empty and non-empty values.
var test = "   one  ,  'two'  ,  , ' four' ,, 'six ', ' seven ' ,  ";
var a = CSVtoArray(test);
/* Array has 8 elements:
    a[0] = {one}
    a[1] = {two}
    a[2] = {}
    a[3] = { four}
    a[4] = {}
    a[5] = {six }
    a[6] = { seven }
    a[7] = {} */

Additional notes:

This solution requires that the CSV string be "valid". For example, unquoted values may not contain backslashes or quotes, e.g. the following CSV string is NOT valid:

var invalid1 = "one, that's me!, escaped \\, comma";

This is not really a limitation because any sub-string may be represented as either a single or double quoted value. Note also that this solution represents only one possible definition for: "Comma Separated Values".

Edit: 2014-05-19: Added disclaimer. Edit: 2014-12-01: Moved disclaimer to top.

#2


19  

RFC 4180 solution

This does not solve the string in the question, since its format does not conform to RFC 4180; the accepted encoding escapes a double quote with another double quote. The solution below works correctly with CSV files downloaded from Google Spreadsheets.

UPDATE (3/2017)

Parsing a single line at a time would be wrong. According to RFC 4180, fields may contain CRLF, which will cause any line reader to break the CSV file. Here is an updated version that parses a CSV string:

'use strict';

function csvToArray(text) {
    let p = '', row = [''], ret = [row], i = 0, r = 0, s = !0, l;
    for (l of text) {
        if ('"' === l) {
            if (s && l === p) row[i] += l;
            s = !s;
        } else if (',' === l && s) l = row[++i] = '';
        else if ('\n' === l && s) {
            if ('\r' === p) row[i] = row[i].slice(0, -1);
            row = ret[++r] = [l = '']; i = 0;
        } else row[i] += l;
        p = l;
    }
    return ret;
};

let test = '"one","two with escaped """" double quotes""","three, with, commas",four with no quotes,"five with CRLF\r\n"\r\n"2nd line one","two with escaped """" double quotes""","three, with, commas",four with no quotes,"five with CRLF\r\n"';
console.log(csvToArray(test));
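
To illustrate the point about line readers, here is a small sketch comparing a naive line split with a quote-aware record count. It is not part of the original answer; the sample data and the countRecords helper are made up for illustration, and the helper only counts records, it does not parse them.

```javascript
// Two records; the second record's last field contains a CRLF inside quotes.
const csv = 'id,note\r\n1,"first line\r\nsecond line"\r\n';

// Naive approach: split into lines first.
const naiveLines = csv.split('\r\n').filter((line) => line !== '');

// Quote-aware approach: only count line breaks that are outside double quotes.
function countRecords(text) {
    let inQuotes = false;
    let records = 0;
    let pending = false;  // true if characters were seen since the last record break
    for (const ch of text) {
        if (ch === '"') inQuotes = !inQuotes;  // an escaped "" toggles twice, leaving the state unchanged
        if (ch === '\n' && !inQuotes) {
            records++;
            pending = false;
        } else if (ch !== '\r' || inQuotes) {
            pending = true;
        }
    }
    return pending ? records + 1 : records;
}

console.log(naiveLines.length);  // 3
console.log(countRecords(csv));  // 2
```

The naive split reports three lines for two records, because it cuts the quoted field in half.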

OLD ANSWER

(Single line solution)

function CSVtoArray(text) {
    let ret = [''], i = 0, p = '', s = true;
    for (let l in text) {
        l = text[l];
        if ('"' === l) {
            s = !s;
            if ('"' === p) {
                ret[i] += '"';
                l = '-';
            } else if ('' === p)
                l = '-';
        } else if (s && ',' === l)
            l = ret[++i] = '';
        else
            ret[i] += l;
        p = l;
    }
    return ret;
}
let test = '"one","two with escaped """" double quotes""","three, with, commas",four with no quotes,five for fun';
console.log(CSVtoArray(test));

And for fun, here is how you can create CSV from an array:

function arrayToCSV(row) {
    for (let i in row) {
        row[i] = row[i].replace(/"/g, '""');
    }
    return '"' + row.join('","') + '"';
}

let row = [
  "one",
  "two with escaped \" double quote",
  "three, with, commas",
  "four with no quotes (now has)",
  "five for fun"
];
let text = arrayToCSV(row);
console.log(text);
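
As a variation, the following sketch leaves the input array untouched and quotes a field only when it contains a comma, a double quote, CR or LF, which is when RFC 4180 calls for quoting. The names toCsvField and toCsvRow are hypothetical and do not come from the answer above.

```javascript
// Quote a field only when it contains a comma, a double quote, CR or LF.
// Embedded double quotes are escaped by doubling them.
function toCsvField(value) {
    const s = String(value);
    return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Build one CSV record without modifying the input array.
function toCsvRow(fields) {
    return fields.map(toCsvField).join(',');
}

console.log(toCsvRow(['one', 'two with "quote"', 'three, with, commas', 4]));
// one,"two with ""quote""","three, with, commas",4
```

Quoting every field, as the function above does, is also valid; the sketch only produces shorter output.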

#3


6  

A PEG(.js) grammar that handles the RFC 4180 examples at http://en.wikipedia.org/wiki/Comma-separated_values:

start
  = [\n\r]* first:line rest:([\n\r]+ data:line { return data; })* [\n\r]* { rest.unshift(first); return rest; }

line
  = first:field rest:("," text:field { return text; })*
    & { return !!first || rest.length; } // ignore blank lines
    { rest.unshift(first); return rest; }

field
  = '"' text:char* '"' { return text.join(''); }
  / text:[^\n\r,]* { return text.join(''); }

char
  = '"' '"' { return '"'; }
  / [^"]

Test at http://jsfiddle.net/knvzk/10 or https://pegjs.org/online.

Download the generated parser at https://gist.github.com/3362830.

#4


2  

If you can have your quote delimiter be double-quotes, then this is a duplicate of JavaScript Code to Parse CSV Data.

You can either translate all single-quotes to double-quotes first:

string = string.replace( /'/g, '"' );

...or you can edit the regex in that question to recognize single-quotes instead of double-quotes:

// Quoted fields.
"(?:'([^']*(?:''[^']*)*)'|" +

However, this assumes certain markup that is not clear from your question. Please clarify what all the various possibilities of markup can be, per my comment on your question.

#5


2  

I had a very specific use case where I wanted to copy cells from Google Sheets into my web app. Cells could include double quotes and newline characters. When copying and pasting, the cells are delimited by tab characters, and cells with odd data are double quoted. I tried this main solution, the linked article using regexp, jquery-csv, and CSVToArray. http://papaparse.com/ is the only one that worked out of the box. Copy and paste is seamless with Google Sheets using the default auto-detect options.

#6


2  

I liked FakeRainBrigand's answer; however, it has a few problems: it cannot handle whitespace between a quote and a comma, and it does not support two consecutive commas. I tried editing his answer, but my edit was rejected by reviewers who apparently did not understand my code. Here is my version of FakeRainBrigand's code. There is also a fiddle: http://jsfiddle.net/xTezm/46/

String.prototype.splitCSV = function() {
        var matches = this.match(/(\s*"[^"]+"\s*|\s*[^,]+|,)(?=,|$)/g);
        for (var n = 0; n < matches.length; ++n) {
            matches[n] = matches[n].trim();
            if (matches[n] == ',') matches[n] = '';
        }
        if (this[0] == ',') matches.unshift("");
        return matches;
}

var string = ',"string, duppi, du" , 23 ,,, "string, duppi, du",dup,"", , lala';
var parsed = string.splitCSV();
alert(parsed.join('|'));

#7


1  

My answer presumes your input is a reflection of code/content from web sources where single and double quote characters are fully interchangeable, provided they occur as a non-escaped matching set.

You cannot use regex for this. You actually have to write a micro parser to analyze the string you wish to split. For the sake of this answer, I will call the quoted parts of your strings sub-strings. You need to specifically walk across the string. Consider the following case:

var a = "some sample string with \"double quotes\" and 'single quotes' and some craziness like this: \\\" or \\'",
    b = "sample of code from JavaScript with a regex containing a comma /\,/ that should probably be ignored.";

In this case you have absolutely no idea where a sub-string starts or ends by simply analyzing the input for a character pattern. Instead you have to write logic to decide whether a quote character is being used as a quote character, is itself unquoted, and does not follow an escape.

I am not going to write code of that complexity for you, but you can look at something I recently wrote that has the pattern you need. That code has nothing to do with commas, but is otherwise a valid enough micro-parser for you to follow in writing your own code. Look into the asifix function of the following application:

https://github.com/austincheney/Pretty-Diff/blob/master/fulljsmin.js
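
For readers who want a starting point, here is a minimal sketch of such a micro parser applied to the comma-splitting problem. It is not taken from the linked code. The name splitOutsideQuotes and its rules are assumptions: both quote characters open a quoted part, a backslash escapes the next character, whitespace around a value is dropped, and an unterminated quote is not reported as an error.

```javascript
// splitOutsideQuotes is a hypothetical helper, not taken from the linked code.
function splitOutsideQuotes(text) {
    const fields = [];
    let current = '';       // characters collected for the current field
    let quote = null;       // the quote character we are inside, or null
    let wasQuoted = false;  // true once the current field has had a quoted part

    const finishField = () => {
        fields.push(wasQuoted ? current : current.trim());
        current = '';
        wasQuoted = false;
    };

    for (let i = 0; i < text.length; i++) {
        const ch = text[i];
        if (ch === '\\' && i + 1 < text.length) {
            current += text[++i];              // escaped character: keep it, drop the backslash
        } else if (quote !== null) {
            if (ch === quote) quote = null;    // closing quote
            else current += ch;                // commas in here are ordinary data
        } else if (ch === "'" || ch === '"') {
            if (current.trim() === '') current = '';  // drop whitespace before the quote
            quote = ch;
            wasQuoted = true;
        } else if (ch === ',') {
            finishField();                     // separator outside quotes
        } else if (!(wasQuoted && /\s/.test(ch))) {
            current += ch;                     // whitespace after a closing quote is skipped
        }
    }
    finishField();
    return fields;
}

console.log(splitOutsideQuotes("'string, duppi, du', 23, lala"));
// [ 'string, duppi, du', '23', 'lala' ]
```

The whole state is the current quote character plus one flag, which is why a character walk stays readable where a single regex becomes hard to follow.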

#8


1  

People seemed to be against RegEx for this. Why?

(\s*'[^']+'|\s*[^,]+)(?=,|$)

Here's the code. I also made a fiddle.

String.prototype.splitCSV = function() {
  var regex = /(\s*'[^']+'|\s*[^,]+)(?=,|$)/g;
  // Return the matches directly; the original assigned them to an implicit global.
  return this.match(regex);
};

var string = "'string, duppi, du', 23, 'string, duppi, du', lala";
var parsed = string.splitCSV();
alert(parsed.join('|'));
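
One thing to be aware of with this regex is that the matches keep their surrounding quotes and leading whitespace, and fully empty fields are skipped. The sketch below adds a cleanup step on top of the same regex. The name splitAndClean is hypothetical, and the cleanup is an illustration of the limitation rather than a complete fix.

```javascript
// Same pattern as in the answer above.
const re = /(\s*'[^']+'|\s*[^,]+)(?=,|$)/g;

// Hypothetical helper: match, then trim and strip one pair of surrounding single quotes.
function splitAndClean(text) {
    const matches = text.match(re) || [];  // match() returns null when nothing matches
    return matches.map(function (m) {
        const t = m.trim();
        const quoted = t.length >= 2 && t[0] === "'" && t[t.length - 1] === "'";
        return quoted ? t.slice(1, -1) : t;
    });
}

console.log("'string, duppi, du', 23, lala".match(re));
// [ "'string, duppi, du'", ' 23', ' lala' ]
console.log(splitAndClean("'string, duppi, du', 23, lala"));
// [ 'string, duppi, du', '23', 'lala' ]
console.log(splitAndClean("a,,b"));
// [ 'a', 'b' ]  (the empty middle field is lost)
```

Escaped quotes inside a quoted value are not handled either, which is part of why other answers prefer a full parse.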

#9


1  

When reading a CSV file into a string, the string may contain null characters (\0) between the other characters, so try stripping \0 line by line. It works for me.

stringLine = stringLine.replace( /\0/g, "" );

#10


1  

To complement this answer

If you need to parse quotes escaped with another quote, for example:

"some ""value"" that is on xlsx file",123

You can use

function parse(text) {
  const csvExp = /(?!\s*$)\s*(?:'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|"([^""]*(?:"[\S\s][^""]*)*)"|([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*))\s*(?:,|$)/g;

  const values = [];

  text.replace(csvExp, (m0, m1, m2, m3, m4) => {
    if (m1 !== undefined) {
      values.push(m1.replace(/\\'/g, "'"));
    }
    else if (m2 !== undefined) {
      values.push(m2.replace(/\\"/g, '"'));
    }
    else if (m3 !== undefined) {
      values.push(m3.replace(/""/g, '"'));
    }
    else if (m4 !== undefined) {
      values.push(m4);
    }
    return '';
  });

  if (/,\s*$/.test(text)) {
    values.push('');
  }

  return values;
}

#11


1  

I faced the same type of problem when I had to parse a CSV file. The file contains an address column whose values contain ','.
After parsing that CSV to JSON, I got a mismatched mapping of the keys.
I used Node to parse the file, with libraries like Baby Parse and csvtojson.
Example of file:

address,pincode
foo,baar , 123456

When I parsed it directly, without using Baby Parse, I was getting

[{
 address: 'foo',
 pincode: 'baar',
 'field3': '123456'
}]

So I wrote code that replaces the comma (,) inside every field with another delimiter

/*
 csvString(input) = "address, pincode\\nfoo, bar, 123456\\n"
 output = "address, pincode\\nfoo {YOUR DELIMITER} bar, 123456\\n"
*/
const removeComma = function(csvString){
    let delimiter = '|'
    let Baby = require('babyparse')
    let arrRow = Baby.parse(csvString).data;
    /*
      arrRow = [ 
      [ 'address', 'pincode' ],
      [ 'foo, bar', '123456']
      ]
    */
    return arrRow.map((singleRow, index) => {
        //the data will include 
        /* 
        singleRow = [ 'address', 'pincode' ]
        */
        return singleRow.map(singleField => {
            // replace the commas inside the field with the delimiter
            return singleField.split(',').join(delimiter)
        })
    }).reduce((acc, value, key) => {
        acc = acc +(Array.isArray(value) ?
         value.reduce((acc1, val)=> {
            acc1 = acc1+ val + ','
            return acc1
        }, '') : '') + '\n';
        return acc;
    },'')
}

The returned string can be passed to the csvtojson library, and the result can then be used.

const csv = require('csvtojson')

let csvString = "address, pincode\nfoo, bar, 123456\n"
let jsonArray = []
const modifiedCsvString = removeComma(csvString)
csv()
  .fromString(modifiedCsvString)
  .on('json', json => jsonArray.push(json))
  .on('end', () => {
    /* do any thing with the json Array */
  })

Now you can get output like

[{
  address: 'foo, bar',
  pincode: 123456
}]

#12


0  

According to this blog post, this function should do it:

String.prototype.splitCSV = function(sep) {
  for (var foo = this.split(sep = sep || ","), x = foo.length - 1, tl; x >= 0; x--) {
    if (foo[x].replace(/'\s+$/, "'").charAt(foo[x].length - 1) == "'") {
      if ((tl = foo[x].replace(/^\s+'/, "'")).length > 1 && tl.charAt(0) == "'") {
        foo[x] = foo[x].replace(/^\s*'|'\s*$/g, '').replace(/''/g, "'");
      } else if (x) {
        foo.splice(x - 1, 2, [foo[x - 1], foo[x]].join(sep));
      } else foo = foo.shift().split(sep).concat(foo);
    } else foo[x] = foo[x].replace(/''/g, "'"); // assign the result; replace() does not modify in place
  } return foo;
};

You would call it like so:

var string = "'string, duppi, du', 23, lala";
var parsed = string.splitCSV();
alert(parsed.join("|"));

This jsfiddle kind of works, but it looks like some of the elements have spaces before them.

#13


0  

Aside from the excellent and complete answer from ridgerunner, I thought of a very simple workaround for when your backend runs PHP.

Add this PHP file to your domain's backend (say: csv.php)

<?php
session_start(); //optional
// The body is JSON, so declare it as such; the charset belongs in the same header.
header("Content-Type: application/json; charset=UTF-8");
//set the delimiter and the End of Line character of your csv content:
echo json_encode(array_map('str_getcsv',str_getcsv($_POST["csv"],"\n")));
?>

Now add this function to your JavaScript toolkit (I believe it should be revised a bit to make it cross-browser).

function csvToArray(csv) {
    var oXhr = new XMLHttpRequest;
    oXhr.addEventListener("readystatechange",
            function () {
                if (this.readyState == 4 && this.status == 200) {
                    console.log(this.responseText);
                    console.log(JSON.parse(this.responseText));
                }
            }
    );
    oXhr.open("POST","path/to/csv.php",true);
    oXhr.setRequestHeader("Content-type","application/x-www-form-urlencoded; charset=utf-8");
    oXhr.send("csv=" + encodeURIComponent(csv));
}

This will cost you one Ajax call, but at least you won't duplicate code or include any external library.

Ref: http://php.net/manual/en/function.str-getcsv.php

#1


169  

Disclaimer

2014-12-01 Update: The answer below works only for one very specific format of CSV. As correctly pointed out by DG in the comments, this solution does NOT fit the RFC 4180 definition of CSV and it also does NOT fit MS Excel format. This solution simply demonstrates how one can parse one (non-standard) CSV line of input which contains a mix of string types, where the strings may contain escaped quotes and commas.

2014-12-01更新:以下的答案只适用于一种非常特定的CSV格式。正如DG在评论中正确指出的那样,这个解决方案不符合CSV的RFC 4180定义,而且它也不适合MS Excel格式。这个解决方案简单地演示了如何解析一个(非标准的)CSV输入行,其中包含一组字符串类型,其中的字符串可能包含转义的引号和逗号。

A non-standard CSV solution

As austincheney correctly points out, you really need to parse the string from start to finish if you wish to properly handle quoted strings that may contain escaped characters. Also, the OP does not clearly define what a "CSV string" really is. First we must define what constitutes a valid CSV string and its individual values.

正如austincheney所指出的,如果您希望正确地处理可能包含转义字符的引号,那么您确实需要从头到尾地解析字符串。此外,OP没有明确定义“CSV字符串”的真正含义。首先,我们必须定义什么构成一个有效的CSV字符串及其各个值。

Given: "CSV String" Definition

For the purpose of this discussion, a "CSV string" consists of zero or more values, where multiple values are separated by a comma. Each value may consist of:

为了讨论这个问题,一个“CSV字符串”由0个或多个值组成,其中多个值用逗号分隔。每个值可包括:

  1. A double quoted string. (may contain unescaped single quotes.)
  2. 一个双引号字符串。(可能包含未转义的单引号。)
  3. A single quoted string. (may contain unescaped double quotes.)
  4. 一个引用字符串。(可能包含未转义的双引号。)
  5. A non-quoted string. (may NOT contain quotes, commas or backslashes.)
  6. 一个引用字符串。(不得包含引号、逗号或反斜杠。)
  7. An empty value. (An all whitespace value is considered empty.)
  8. 一个空值。(所有空格值被认为是空的。)

Rules/Notes:

规则/指出:

  • Quoted values may contain commas.
  • 引号中的值可以包含逗号。
  • Quoted values may contain escaped-anything, e.g. 'that\'s cool'.
  • 引用的值可以包含任何可避免的东西,例如。”,\ ' s酷”。
  • Values containing quotes, commas, or backslashes must be quoted.
  • 必须引用包含引号、逗号或反斜杠的值。
  • Values containing leading or trailing whitespace must be quoted.
  • 必须引用包含前置或后置空格的值。
  • The backslash is removed from all: \' in single quoted values.
  • 将反斜杠从所有的:\'中删除。
  • The backslash is removed from all: \" in double quoted values.
  • 从所有的“\”中删除反斜杠,双引号中的值。
  • Non-quoted strings are trimmed of any leading and trailing spaces.
  • 非引用的字符串被修剪成任何前导和尾随空格。
  • The comma separator may have adjacent whitespace (which is ignored).
  • 逗号分隔符可能有相邻的空格(被忽略)。

Find:

A JavaScript function which converts a valid CSV string (as defined above) into an array of string values.

将有效的CSV字符串(如上所定义)转换为字符串值数组的JavaScript函数。

Solution:

The regular expressions used by this solution are complex. And (IMHO) all non-trivial regexes should be presented in free-spacing mode with lots of comments and indentation. Unfortunately, JavaScript does not allow free-spacing mode. Thus, the regular expressions implemented by this solution are first presented in native regex syntax (expressed using Python's handy: r'''...''' raw-multi-line-string syntax).

这个解决方案使用的正则表达式是复杂的。并且(IMHO)所有非平凡的正则表达式都应该以*间距的方式表示,并带有大量的注释和缩进。不幸的是,JavaScript不允许*间隔模式。因此,该解决方案实现的正则表达式首先以本机regex语法表示(使用Python的便利语言:r " '……)“‘raw-multi-line-string语法)。

First here is a regular expression which validates that a CVS string meets the above requirements:

首先,这里有一个正则表达式,用于验证CVS字符串是否满足上述要求:

Regex to validate a "CSV string":

re_valid = r"""
# Validate a CSV string having single, double or un-quoted values.
^                                   # Anchor to start of string.
\s*                                 # Allow whitespace before value.
(?:                                 # Group for value alternatives.
  '[^'\\]*(?:\\[\S\s][^'\\]*)*'     # Either Single quoted string,
| "[^"\\]*(?:\\[\S\s][^"\\]*)*"     # or Double quoted string,
| [^,'"\s\\]*(?:\s+[^,'"\s\\]+)*    # or Non-comma, non-quote stuff.
)                                   # End group of value alternatives.
\s*                                 # Allow whitespace after value.
(?:                                 # Zero or more additional values
  ,                                 # Values separated by a comma.
  \s*                               # Allow whitespace before value.
  (?:                               # Group for value alternatives.
    '[^'\\]*(?:\\[\S\s][^'\\]*)*'   # Either Single quoted string,
  | "[^"\\]*(?:\\[\S\s][^"\\]*)*"   # or Double quoted string,
  | [^,'"\s\\]*(?:\s+[^,'"\s\\]+)*  # or Non-comma, non-quote stuff.
  )                                 # End group of value alternatives.
  \s*                               # Allow whitespace after value.
)*                                  # Zero or more additional values
$                                   # Anchor to end of string.
"""

If a string matches the above regex, then that string is a valid CSV string (according to the rules previously stated) and may be parsed using the following regex. The following regex is then used to match one value from the CSV string. It is applied repeatedly until no more matches are found (and all values have been parsed).

如果一个字符串与上面的regex匹配,那么该字符串就是一个有效的CSV字符串(根据前面声明的规则),可以使用下面的regex进行解析。然后使用下面的regex来匹配CSV字符串中的一个值。它被反复应用,直到不再找到匹配(并且所有的值都被解析)。

Regex to parse one value from valid CSV string:

re_value = r"""
# Match one value in valid CSV string.
(?!\s*$)                            # Don't match empty last value.
\s*                                 # Strip whitespace before value.
(?:                                 # Group for value alternatives.
  '([^'\\]*(?:\\[\S\s][^'\\]*)*)'   # Either $1: Single quoted string,
| "([^"\\]*(?:\\[\S\s][^"\\]*)*)"   # or $2: Double quoted string,
| ([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)  # or $3: Non-comma, non-quote stuff.
)                                   # End group of value alternatives.
\s*                                 # Strip whitespace after value.
(?:,|$)                             # Field ends on comma or EOS.
"""

Note that there is one special case value that this regex does not match - the very last value when that value is empty. This special "empty last value" case is tested for and handled by the js function which follows.

注意,这个regex有一个特殊的大小写值不匹配——当该值为空时的最后一个值。这个特殊的“空最后值”用例由下面的js函数进行测试和处理。

JavaScript function to parse CSV string:

// Return array of string values, or NULL if CSV string not well formed.
function CSVtoArray(text) {
    var re_valid = /^\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*(?:,\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*)*$/;
    var re_value = /(?!\s*$)\s*(?:'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*))\s*(?:,|$)/g;
    // Return NULL if input string is not well formed CSV string.
    if (!re_valid.test(text)) return null;
    var a = [];                     // Initialize array to receive values.
    text.replace(re_value, // "Walk" the string using replace with callback.
        function(m0, m1, m2, m3) {
            // Remove backslash from \' in single quoted values.
            if      (m1 !== undefined) a.push(m1.replace(/\\'/g, "'"));
            // Remove backslash from \" in double quoted values.
            else if (m2 !== undefined) a.push(m2.replace(/\\"/g, '"'));
            else if (m3 !== undefined) a.push(m3);
            return ''; // Return empty string.
        });
    // Handle special case of empty last value.
    if (/,\s*$/.test(text)) a.push('');
    return a;
};

Example input and output:

In the following examples, curly braces are used to delimit the {result strings}. (This is to help visualize leading/trailing spaces and zero-length strings.)

在下面的示例中,使用花括号分隔{result string}。(这有助于显示前导/后置空格和零长度字符串。)

// Test 1: Test string from original question.
var test = "'string, duppi, du', 23, lala";
var a = CSVtoArray(test);
/* Array has 3 elements:
    a[0] = {string, duppi, du}
    a[1] = {23}
    a[2] = {lala} */
// Test 2: Empty CSV string.
var test = "";
var a = CSVtoArray(test);
/* Array has 0 elements: */
// Test 3: CSV string with two empty values.
var test = ",";
var a = CSVtoArray(test);
/* Array has 2 elements:
    a[0] = {}
    a[1] = {} */
// Test 4: Double quoted CSV string having single quoted values.
// The backslash is doubled so that it survives JavaScript string-literal parsing.
var test = "'one','two with escaped \\' single quote', 'three, with, commas'";
var a = CSVtoArray(test);
/* Array has 3 elements:
    a[0] = {one}
    a[1] = {two with escaped ' single quote}
    a[2] = {three, with, commas} */
// Test 5: Single quoted CSV string having double quoted values.
// The backslash is doubled so that it survives JavaScript string-literal parsing.
var test = '"one","two with escaped \\" double quote", "three, with, commas"';
var a = CSVtoArray(test);
/* Array has 3 elements:
    a[0] = {one}
    a[1] = {two with escaped " double quote}
    a[2] = {three, with, commas} */
// Test 6: CSV string with whitespace in and around empty and non-empty values.
var test = "   one  ,  'two'  ,  , ' four' ,, 'six ', ' seven ' ,  ";
var a = CSVtoArray(test);
/* Array has 8 elements:
    a[0] = {one}
    a[1] = {two}
    a[2] = {}
    a[3] = { four}
    a[4] = {}
    a[5] = {six }
    a[6] = { seven }
    a[7] = {} */

Additional notes:

This solution requires that the CSV string be "valid". For example, unquoted values may not contain backslashes or quotes, e.g. the following CSV string is NOT valid:

var invalid1 = "one, that's me!, escaped \\, comma";

This is not really a limitation because any sub-string may be represented as either a single or double quoted value. Note also that this solution represents only one possible definition for: "Comma Separated Values".

Edit: 2014-05-19: Added disclaimer. Edit: 2014-12-01: Moved disclaimer to top.

#2


19  

RFC 4180 solution

This does not parse the string in the question, because that string's format does not conform to RFC 4180; in RFC 4180 the only accepted escape is writing a double quote inside a quoted field as two double quotes. The solution below works correctly with CSV files downloaded from Google Sheets.

UPDATE (3/2017)

Parsing a single line at a time would be wrong. According to RFC 4180, quoted fields may contain CRLF, so any reader that splits the file on line breaks first will cut such a record in two. Here is an updated version that parses a whole CSV string:

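'use strict';

// Parse RFC 4180 style CSV text into an array of rows (arrays of strings).
function csvToArray(text) {
    // p: previous character, row: current row, ret: all rows,
    // i: field index, r: row index, s: true while outside quotes, l: current character.
    let p = '', row = [''], ret = [row], i = 0, r = 0, s = !0, l;
    for (l of text) {
        if ('"' === l) {
            // A quote directly after a closing quote is an escaped quote ("").
            if (s && l === p) row[i] += l;
            s = !s;
        } else if (',' === l && s) l = row[++i] = '';   // Unquoted comma: start the next field.
        else if ('\n' === l && s) {
            // Unquoted line feed ends the row; drop the CR of a CRLF pair.
            if ('\r' === p) row[i] = row[i].slice(0, -1);
            row = ret[++r] = [l = '']; i = 0;
        } else row[i] += l;
        p = l;
    }
    return ret;
}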

let test = '"one","two with escaped """" double quotes""","three, with, commas",four with no quotes,"five with CRLF\r\n"\r\n"2nd line one","two with escaped """" double quotes""","three, with, commas",four with no quotes,"five with CRLF\r\n"';
console.log(csvToArray(test));
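
To see why reading line by line breaks on such input, compare a naive split on line breaks with a split that ignores line breaks inside double quotes. This is a minimal sketch, assuming RFC 4180 style quoting; the function name splitRecords and the sample data are hypothetical and not part of the answer above.

```javascript
// Split CSV text into records, treating line breaks inside double quotes as data.
function splitRecords(text) {
  const records = [];
  let start = 0;
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === '"') {
      // Every quote toggles the state; a doubled quote ("") toggles twice,
      // so it leaves the state unchanged.
      inQuotes = !inQuotes;
    } else if (ch === '\n' && !inQuotes) {
      // Drop the CR of a CRLF pair from the end of the record.
      const end = i > start && text[i - 1] === '\r' ? i - 1 : i;
      records.push(text.slice(start, end));
      start = i + 1;
    }
  }
  if (start < text.length) records.push(text.slice(start));
  return records;
}

const csv = 'id,note\r\n1,"line one\r\nline two"\r\n';
const naive = csv.split(/\r?\n/);   // cuts the quoted field in two
const records = splitRecords(csv);  // keeps the quoted field whole
console.log(naive.length, records.length);
```

The naive split yields four pieces (including an empty one after the final line break), while the quote-aware split yields the two actual records.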

OLD ANSWER

(Single line solution)

function CSVtoArray(text) {
    let ret = [''], i = 0, p = '', s = true;
    for (let l in text) {
        l = text[l];
        if ('"' === l) {
            s = !s;
            if ('"' === p) {
                ret[i] += '"';
                l = '-';
            } else if ('' === p)
                l = '-';
        } else if (s && ',' === l)
            l = ret[++i] = '';
        else
            ret[i] += l;
        p = l;
    }
    return ret;
}
let test = '"one","two with escaped """" double quotes""","three, with, commas",four with no quotes,five for fun';
console.log(CSVtoArray(test));

And for fun, here is how you create CSV from an array:

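function arrayToCSV(row) {
    // Double every embedded quote, then wrap each field in quotes.
    // map() builds a new array, so the caller's row is left unchanged.
    return row.map(function (field) {
        return '"' + String(field).replace(/"/g, '""') + '"';
    }).join(',');
}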

let row = [
  "one",
  "two with escaped \" double quote",
  "three, with, commas",
  "four with no quotes (now has)",
  "five for fun"
];
let text = arrayToCSV(row);
console.log(text);
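
The same idea extends to several rows. The following is only a sketch, assuming RFC 4180 conventions (CRLF between records, quotes only around fields that contain a comma, a double quote or a line break); the names toCsvField and rowsToCsv are hypothetical.

```javascript
// Quote a single field only when it contains a character that requires quoting.
function toCsvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text)
    ? '"' + text.replace(/"/g, '""') + '"'
    : text;
}

// Join fields with commas and records with CRLF.
function rowsToCsv(rows) {
  return rows.map(row => row.map(toCsvField).join(',')).join('\r\n');
}

const csvText = rowsToCsv([
  ['one', 'three, with, commas'],
  ['say "hi"', 42]
]);
console.log(csvText);
```

Quoting only when needed keeps simple values readable, while arrayToCSV above takes the simpler route of quoting everything.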

#3


6  

PEG(.js) grammar that handles RFC 4180 examples at http://en.wikipedia.org/wiki/Comma-separated_values:

start
  = [\n\r]* first:line rest:([\n\r]+ data:line { return data; })* [\n\r]* { rest.unshift(first); return rest; }

line
  = first:field rest:("," text:field { return text; })*
    & { return !!first || rest.length; } // ignore blank lines
    { rest.unshift(first); return rest; }

field
  = '"' text:char* '"' { return text.join(''); }
  / text:[^\n\r,]* { return text.join(''); }

char
  = '"' '"' { return '"'; }
  / [^"]

Test at http://jsfiddle.net/knvzk/10 or https://pegjs.org/online.

Download the generated parser at https://gist.github.com/3362830.

#4


2  

If you can have your quote delimiter be double-quotes, then this is a duplicate of JavaScript Code to Parse CSV Data.

You can either translate all single-quotes to double-quotes first:

string = string.replace( /'/g, '"' );

...or you can edit the regex in that question to recognize single-quotes instead of double-quotes:

// Quoted fields.
"(?:'([^']*(?:''[^']*)*)'|" +

However, this assumes certain markup that is not clear from your question. Please clarify what all the various possibilities of markup can be, per my comment on your question.

#5


2  

I had a very specific use case: I wanted to copy cells from Google Sheets into my web app. Cells could include double quotes and newline characters. With copy and paste, the cells are delimited by tab characters, and cells with unusual data are double-quoted. I tried the main solution here, the linked article using regexp, jquery-csv, and CSVToArray. Papa Parse (http://papaparse.com/) is the only one that worked out of the box. Copy and paste from Google Sheets works seamlessly with its default auto-detect options.
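
For readers who want to see what such clipboard data involves, here is a dependency-free sketch of a tab-delimited parser. It is not how Papa Parse works internally; it assumes that quotes inside a quoted cell are doubled, and the function name parseDelimited and its delimiter parameter are hypothetical.

```javascript
// Parse delimiter-separated text where cells may be double-quoted and
// quoted cells may contain the delimiter, line breaks and doubled quotes.
function parseDelimited(text, delimiter = '\t') {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"') {
        if (text[i + 1] === '"') { field += '"'; i++; }  // doubled quote
        else inQuotes = false;                            // closing quote
      } else {
        field += ch;
      }
    } else if (ch === '"' && field === '') {
      inQuotes = true;                                    // opening quote
    } else if (ch === delimiter) {
      row.push(field);
      field = '';
    } else if (ch === '\n' || ch === '\r') {
      if (ch === '\r' && text[i + 1] === '\n') i++;       // treat CRLF as one break
      row.push(field);
      field = '';
      rows.push(row);
      row = [];
    } else {
      field += ch;
    }
  }
  // Flush the last row when the text does not end with a line break.
  if (field !== '' || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

const pasted = 'name\tnote\r\nAnn\t"said ""hi""\nthen left"\r\n';
console.log(parseDelimited(pasted));
```

In practice a maintained library is the safer choice; the sketch only shows why a plain split on tabs and line breaks is not enough.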

#6


2  

I liked FakeRainBrigand's answer; however, it has a few problems: it cannot handle whitespace between a quote and a comma, and it does not support two consecutive commas. I tried editing his answer, but my edit was rejected by reviewers who apparently did not understand my code. Here is my version of FakeRainBrigand's code. There is also a fiddle: http://jsfiddle.net/xTezm/46/

String.prototype.splitCSV = function() {
    // match() returns null when nothing matches, so fall back to an empty array.
    var matches = this.match(/(\s*"[^"]+"\s*|\s*[^,]+|,)(?=,|$)/g) || [];
    for (var n = 0; n < matches.length; ++n) {
        matches[n] = matches[n].trim();
        if (matches[n] == ',') matches[n] = '';
    }
    if (this[0] == ',') matches.unshift("");
    return matches;
};

var string = ',"string, duppi, du" , 23 ,,, "string, duppi, du",dup,"", , lala';
var parsed = string.splitCSV();
alert(parsed.join('|'));

#7


1  

My answer presumes your input is a reflection of code/content from web sources where single and double quote characters are fully interchangeable, provided they occur as a non-escaped matching set.

You cannot use regex for this. You actually have to write a micro parser to analyze the string you wish to split. For the sake of this answer, I will call the quoted parts of your strings sub-strings. You need to walk across the string explicitly. Consider the following case:

var a = "some sample string with \"double quotes\" and 'single quotes' and some craziness like this: \\\" or \\'",
    b = "sample of code from JavaScript with a regex containing a comma /\,/ that should probably be ignored.";

In this case you have absolutely no idea where a sub-string starts or ends by simply analyzing the input for a character pattern. Instead you have to write logic that decides whether a quote character is used as a quote character, is itself unquoted, and does not follow an escape.

I am not going to write that level of complexity of code for you, but you can look at something I recently wrote that has the pattern you need. This code has nothing to do with commas, but is otherwise a valid enough micro-parser for you to follow in writing your own code. Look into the asifix function of the following application:

https://github.com/austincheney/Pretty-Diff/blob/master/fulljsmin.js
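
For comparison, the kind of micro parser described above can be sketched in a few lines. This is not the asifix function from the linked file, just a minimal illustration that assumes a backslash escapes the next character, that single and double quotes are interchangeable, and that surrounding whitespace should be trimmed; the name splitOutsideQuotes is hypothetical.

```javascript
// Split on commas that are outside quotes, walking the string one character at a time.
function splitOutsideQuotes(text) {
  const values = [];
  let current = '';
  let quote = null;  // the quote character that opened the current sub-string, or null
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === '\\' && i + 1 < text.length) {
      current += text[++i];            // escaped character is taken literally
    } else if (quote !== null) {
      if (ch === quote) quote = null;  // matching quote closes the sub-string
      else current += ch;              // anything else, commas included, is data
    } else if (ch === "'" || ch === '"') {
      quote = ch;                      // opening quote
    } else if (ch === ',') {
      values.push(current.trim());     // unquoted comma ends the value
      current = '';
    } else {
      current += ch;
    }
  }
  values.push(current.trim());
  return values;
}

console.log(splitOutsideQuotes("'string, duppi, du', 23, lala"));
```

Note that trimming also removes leading and trailing spaces that were inside the quotes, and that an empty input yields a single empty value; both are simplifications of this sketch.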

#8


1  

People seemed to be against RegEx for this. Why?

(\s*'[^']+'|\s*[^,]+)(?=,|$)

Here's the code. I also made a fiddle.

String.prototype.splitCSV = function() {
  var regex = /(\s*'[^']+'|\s*[^,]+)(?=,|$)/g;
  // Return the matches directly; assigning to an undeclared "matches" would leak a global.
  return this.match(regex);
};

var string = "'string, duppi, du', 23, 'string, duppi, du', lala";
var parsed = string.splitCSV();
alert(parsed.join('|'));
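
The matches returned by this regex still carry the surrounding single quotes and any leading spaces. A small post-processing step, sketched here under the assumption that quoted values never contain a single quote, brings the result in line with what the question asks for; the function name splitSingleQuoted is hypothetical.

```javascript
// Same regex as above, followed by trimming and removal of the outer single quotes.
function splitSingleQuoted(line) {
  const matches = line.match(/(\s*'[^']+'|\s*[^,]+)(?=,|$)/g) || [];
  return matches.map(value => value.trim().replace(/^'([\s\S]*)'$/, '$1'));
}

console.log(splitSingleQuoted("'string, duppi, du', 23, lala"));
```

Empty values (two commas in a row) are still skipped by the regex, so this only suits input without empty fields.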

#9


1  

When a CSV file is read into a string, the string may contain null characters (\0) between the other characters. Stripping them line by line worked for me.

stringLine = stringLine.replace( /\0/g, "" );
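
One possible cause of such null characters, and this is an assumption about the situation rather than something stated above, is a UTF-16 file being read as if it used a single-byte encoding. In that case decoding the raw bytes as UTF-16 avoids the problem at the source. The sketch below uses the standard TextDecoder API; the function name decodeUtf16le and the sample bytes are hypothetical.

```javascript
// Decode raw bytes as little-endian UTF-16 instead of stripping "\0" afterwards.
function decodeUtf16le(bytes) {
  return new TextDecoder('utf-16le').decode(bytes);
}

// The text "a,b" encoded as UTF-16LE: every ASCII character is followed by a zero byte.
const bytes = new Uint8Array([0x61, 0x00, 0x2c, 0x00, 0x62, 0x00]);

// Reading each byte as its own character reproduces the symptom ...
const misread = String.fromCharCode(...bytes);
// ... which the replace() above repairs for plain ASCII content,
const stripped = misread.replace(/\0/g, '');
// while decoding as UTF-16 gives the intended text directly.
const decoded = decodeUtf16le(bytes);

console.log(JSON.stringify(misread), stripped, decoded);
```

Stripping null characters only recovers the text when every character is plain ASCII; for other characters decoding with the correct encoding is the more reliable route.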

#10


1  

To complement this answer

If you need to parse quotes that are escaped with another quote, for example:

"some ""value"" that is on xlsx file",123

You can use

function parse(text) {
  const csvExp = /(?!\s*$)\s*(?:'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|"([^""]*(?:"[\S\s][^""]*)*)"|([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*))\s*(?:,|$)/g;

  const values = [];

  text.replace(csvExp, (m0, m1, m2, m3, m4) => {
    if (m1 !== undefined) {
      values.push(m1.replace(/\\'/g, "'"));
    }
    else if (m2 !== undefined) {
      values.push(m2.replace(/\\"/g, '"'));
    }
    else if (m3 !== undefined) {
      values.push(m3.replace(/""/g, '"'));
    }
    else if (m4 !== undefined) {
      values.push(m4);
    }
    return '';
  });

  if (/,\s*$/.test(text)) {
    values.push('');
  }

  return values;
}

#11


1  

I faced the same type of problem when I had to parse a CSV file. The file contains an address column whose values contain commas. When converting that CSV to JSON, the values were mapped to the wrong keys. I used Node to parse the file, with libraries such as Baby Parse and csvtojson. Example of the file:

address,pincode
foo,baar , 123456

When I parsed it directly to JSON without using Baby Parse, I got:

[{
 address: 'foo',
 pincode: 'baar',
 'field3': '123456'
}]

So I wrote code that replaces the comma inside every field with another delimiter:

/*
 csvString (input) = "address, pincode\nfoo, bar, 123456\n"
 output            = "address, pincode\nfoo {YOUR DELIMITER} bar, 123456\n"
*/
const removeComma = function(csvString) {
    const delimiter = '|';
    const Baby = require('babyparse');
    const arrRow = Baby.parse(csvString).data;
    /*
      arrRow = [
        [ 'address', 'pincode' ],
        [ 'foo, bar', '123456' ]
      ]
    */
    return arrRow.map(singleRow => {
        // singleRow is one parsed row, e.g. [ 'address', 'pincode' ]
        return singleRow.map(singleField => {
            // replace every comma inside the field with the delimiter
            return singleField.split(',').join(delimiter);
        });
    }).reduce((acc, value) => {
        // rebuild one CSV line per row; join() avoids a trailing comma
        return acc + (Array.isArray(value) ? value.join(',') : '') + '\n';
    }, '');
};

The string returned by this function can be passed to the csvtojson library, and the result can then be used.

const csv = require('csvtojson')

let csvString = "address, pincode\nfoo, bar, 123456\n"
let jsonArray = []
const modifiedCsvString = removeComma(csvString)
csv()
  .fromString(modifiedCsvString)
  .on('json', json => jsonArray.push(json))
  .on('end', () => {
    /* do anything with the JSON array */
  })

Now you get output like:

[{
  address: 'foo, bar',
  pincode: 123456
}]

#12


0  

According to this blog post, this function should do it:

String.prototype.splitCSV = function(sep) {
  for (var foo = this.split(sep = sep || ","), x = foo.length - 1, tl; x >= 0; x--) {
    if (foo[x].replace(/'\s+$/, "'").charAt(foo[x].length - 1) == "'") {
      if ((tl = foo[x].replace(/^\s+'/, "'")).length > 1 && tl.charAt(0) == "'") {
        foo[x] = foo[x].replace(/^\s*'|'\s*$/g, '').replace(/''/g, "'");
      } else if (x) {
        foo.splice(x - 1, 2, [foo[x - 1], foo[x]].join(sep));
      } else foo = foo.shift().split(sep).concat(foo);
    } else foo[x] = foo[x].replace(/''/g, "'"); // keep the result; replace() does not modify in place
  } return foo;
};

You would call it like so:

var string = "'string, duppi, du', 23, lala";
var parsed = string.splitCSV();
alert(parsed.join("|"));

This jsfiddle kind of works, but it looks like some of the elements have spaces before them.

#13


0  

Aside from the excellent and complete answer from ridgerunner, I thought of a very simple workaround for when your backend runs PHP.

Add this PHP file to your domain's backend (say: csv.php):

<?php
session_start(); //optional
// The response body is JSON, so declare it as such, together with its charset.
header("Content-Type: application/json; charset=UTF-8");
// Split the posted text into rows on "\n", then parse each row as CSV:
echo json_encode(array_map('str_getcsv',str_getcsv($_POST["csv"],"\n")));
?>

Now add this function to your JavaScript toolkit (I believe it needs some revision to work across browsers):

function csvToArray(csv) {
    var oXhr = new XMLHttpRequest;
    oXhr.addEventListener("readystatechange",
            function () {
                if (this.readyState == 4 && this.status == 200) {
                    console.log(this.responseText);
                    console.log(JSON.parse(this.responseText));
                }
            }
    );
    oXhr.open("POST","path/to/csv.php",true);
    oXhr.setRequestHeader("Content-type","application/x-www-form-urlencoded; charset=utf-8");
    oXhr.send("csv=" + encodeURIComponent(csv));
}

This costs you one Ajax call, but at least you won't duplicate code or include an external library.

Ref: http://php.net/manual/en/function.str-getcsv.php
