I have a CSV file in which every value is a string, and I need to enclose each value in quotes. I'm getting unexpected quotes when concatenating.
$outline = "";
$line = "John,Smith,jsmith@bogusaddress.net,000-0000";
@parts = split (',',$line);
for $part (@parts) {
    $part = '"' . $part . '"';
    if ($outline eq "") {
        $outline = $part; # reconstruct line
    } else {
        $outline = $outline . "," . $part;
    }
}
$outline = $outline . "," . '"' . $parts[0] . " " . $parts[1] . '"';
print "$outline\n";
I expected:
"John","Smith","jsmith.net","000-0000","John Smith"
but I got:
"John","Smith","jsmith.net","000-0000",""John" "Smith""
Why am I getting the extra quotes?
Thanks for the help.
4 Solutions
#1
0
$part in the foreach loop aliases each element of @parts. So you're actually storing the quote-wrapped strings back into the array.
Try using Data::Dumper to dump @parts at the bottom of each loop iteration.
use Data::Dumper;
...
print Dumper( \@parts );
#2
6
A lot of practical solutions have been provided. However, I wanted to address your question: why does this happen?
The reason you are getting the doubled double quotes is that you are actually changing the elements of @parts. Inside a for loop, the loop variable is an alias for each element of the list in turn, so any change made to it is made to the "real" value as well. Consider the following:
my @foos = 1 .. 3;
for my $foo (@foos) {
    $foo += 1;
}
print "@foos"; # prints 2 3 4
So when you change $part in your code, the array @parts is also changed, and ends up like this (Data::Dumper output):
$VAR1 = [
'"John"',
'"Smith"',
'"jsmith@bogusaddress.net"',
'"000-0000"'
];
And from that point on, you cannot put together the strings "John" and "Smith" without first removing the quotes again.
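One way to sidestep the aliasing is to build the quoted strings as copies and leave @parts alone. The following is only a minimal sketch of that idea, not the asker's code; the array name @quoted is an assumption introduced for illustration, and the input line is single-quoted so that @bogusaddress is not interpolated.

```perl
use strict;
use warnings;

my $line  = 'John,Smith,jsmith@bogusaddress.net,000-0000';
my @parts = split /,/, $line;

# map returns new strings, so the elements of @parts stay unquoted
my @quoted = map qq("$_"), @parts;

# the untouched originals can still be combined into the extra field
push @quoted, qq("$parts[0] $parts[1]");

my $outline = join ',', @quoted;
print "$outline\n";
```

Because @parts is never modified, $parts[0] and $parts[1] can be combined into the extra field without stripping any quotes first.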
I also prepared a solution using Text::CSV, but I see ThisSuitIsBlackNot has already done so, so you can take a look at his answer for a practical solution.
For a more lightweight solution you can use Text::ParseWords. This, like Text::CSV, has the benefit of handling quoted delimiters.
use Text::ParseWords;
my $line = 'John,Smith,jsmith@bogusaddress.net,000-0000';
my @parts = quotewords(",", 0, $line);
push @parts, "@parts[0,1]";
print join ",", map qq("$_"), @parts;
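To illustrate what "handling quoted delimiters" means, here is a small sketch comparing a plain split with quotewords. The input line, with a comma inside a quoted field, is a made-up example and is not taken from the question's data.

```perl
use strict;
use warnings;
use Text::ParseWords;

# Hypothetical input: the first field contains a comma inside quotes
my $line = '"Smith, John",jsmith@bogusaddress.net,000-0000';

my @naive  = split /,/, $line;            # breaks the quoted field in two
my @parsed = quotewords(',', 0, $line);   # keeps it as one field, quotes removed

print scalar(@naive),  "\n";   # 4
print scalar(@parsed), "\n";   # 3
print "$parsed[0]\n";          # Smith, John
```

The second argument of 0 tells quotewords to drop the surrounding quotes from the returned fields.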
#3
2
I always use Text::CSV when working with delimited data. It allows you to easily change delimiters, quoting behavior, and escape characters, and it handles fields that contain the delimiter, which is difficult to do on your own (although this isn't applicable to your example).
The following will quote all of the fields in the file input.csv and write the results to STDOUT:
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({
    binary       => 1,
    auto_diag    => 1,
    always_quote => 1,
    eol          => $/
}) or die "Cannot use CSV: " . Text::CSV->error_diag;

open my $fh, '<', 'input.csv' or die "input.csv: $!";
while (my $row = $csv->getline($fh)) {
    $csv->print(\*STDOUT, $row);
}
close $fh;
input.csv
John,Smith,jsmith@bogusaddress.net,000-0000
Jane,Doe,jdoe@bogusaddress.net,000-0000
Output
"John","Smith","jsmith@bogusaddress.net","000-0000"
"Jane","Doe","jdoe@bogusaddress.net","000-0000"
#4
0
There's no reason to use a for loop to string together the various parts. If you can use split, you can use join:
my $line = 'John,Smith,jsmith@bogusaddress.net,000-0000';  # single quotes, so @bogusaddress is not interpolated
my @parts = split /,/, $line;          # Split the line on commas
my $new_line = join q(","), @parts;    # Separate the parts with quote-comma-quote
$new_line = qq("$new_line");           # Add the opening and closing quotes
The q(...) is a quote-like operator that acts like single quotes. The qq(...) is a quote-like operator that acts like double quotes. It's a bit easier to read qq("$new_line") and q(",") than "\"$new_line\"" or '","'.
I'm using join to join all the parts with ",". That handles the separators in the middle of $new_line, but it doesn't add the opening and closing quotes. Thus, I need a second statement to add those.
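The snippet above quotes the four original fields but does not add the combined "John Smith" field the question asks for. A possible extension, sketched under the assumption that the first two fields are always the first and last name, is to push that field onto @parts before joining:

```perl
use strict;
use warnings;

my $line  = 'John,Smith,jsmith@bogusaddress.net,000-0000';
my @parts = split /,/, $line;

# Assumption: fields 0 and 1 are the first and last name
push @parts, "$parts[0] $parts[1]";

# Join with quote-comma-quote, then wrap the result in the outer quotes
my $new_line = '"' . join('","', @parts) . '"';
print "$new_line\n";
```

Since nothing is quoted until the final join, there are no stray quotes to strip from the name fields.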