How to remove newlines inside double quotes?

How can I remove newlines that occur inside double quotes (") in a file?

For example:

"one", 
"three
four",
"seven"

So I want to remove the \n between three and four. Should I use a regular expression, or do I have to read the file character by character with a program?

5 solutions

#1

To remove only the newlines that are inside double-quoted strings and leave the ones outside them alone, using GNU awk (needed for RT):

gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' file

This works by splitting the file along " characters and removing newlines in every other block. With a file containing

"one",
"three
four",
12,
"seven"

this will give the result

"one",
"threefour",
12,
"seven"
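For readers without gawk, the same split-on-quote idea can be sketched in Python. This is only an illustration of the technique, not a replacement for the command above; the function name strip_newlines_in_quotes is made up for this example, and like the awk version it assumes quotes are balanced and never escaped.

```python
def strip_newlines_in_quotes(text: str) -> str:
    # Splitting on '"' leaves text outside quotes at even indices and
    # text inside quotes at odd indices (awk numbers records from 1,
    # which is why the awk test is NR % 2 == 0).
    parts = text.split('"')
    for i in range(1, len(parts), 2):
        parts[i] = parts[i].replace("\n", "")
    return '"'.join(parts)


sample = '"one",\n"three\nfour",\n12,\n"seven"\n'
print(strip_newlines_in_quotes(sample), end="")
```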

Note that it does not handle escape sequences. If strings in the input data can contain \", such as "He said: \"this is a direct quote.\"", then it will not work as desired.
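If escaped quotes do matter, one option is the character-by-character reading mentioned in the question. Below is a minimal Python sketch of such a scanner. It assumes a backslash escapes the next character both inside and outside quotes, which may not match every input format, and the function name is hypothetical.

```python
def strip_newlines_in_quotes_escaped(text: str) -> str:
    out = []
    in_quotes = False
    escaped = False  # True when the previous character was a backslash
    for ch in text:
        if escaped:
            # The character after a backslash is copied as-is and
            # never toggles the quote state.
            out.append(ch)
            escaped = False
        elif ch == "\\":
            out.append(ch)
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == "\n" and in_quotes:
            continue  # drop a newline inside a quoted string
        else:
            out.append(ch)
    return "".join(out)


sample = '"He said: \\"this is\na direct quote.\\"",\n"x"\n'
print(strip_newlines_in_quotes_escaped(sample), end="")
```

Note that a newline directly after a backslash is kept by this sketch, since it is treated as an escaped character.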

#2

You can store each line that starts with " in a variable and print it once the next such line arrives. Lines that don't start with " are appended to the stored value:

$ awk '/^"/ {if (f) print f; f=$0; next} {f=f FS $0} END {print f}' file
"one", 
"three four",
"seven"

Since each block of text is printed only when the next one starts, the END block is needed to print the last stored value after the whole file has been processed.
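The same buffering logic can be sketched in Python, for comparison. This assumes, as the awk command does, that every logical record begins with " at the start of a line; join_continuation_lines is a name invented for this example.

```python
def join_continuation_lines(lines):
    buffered = None  # the record collected so far, not yet printed
    for line in lines:
        if line.startswith('"'):
            # A new record starts: emit the previous one first.
            if buffered is not None:
                yield buffered
            buffered = line
        elif buffered is None:
            buffered = line
        else:
            # Continuation line: glue it on with a space (awk's default FS).
            buffered = buffered + " " + line
    # Counterpart of the END block: emit the last stored record.
    if buffered is not None:
        yield buffered


for record in join_continuation_lines(['"one", ', '"three', 'four",', '"seven"']):
    print(record)
```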

#3

You can use sed for that:

sed -r '/^"[^"]+$/{:a;N;/",/!ba;s/\n/ /g}' text

The command searches for lines which start with a double quote but don't contain another double quote: /^"[^"]+$/

If such a line is found, the label :a marks the start of a loop. The N command appends the next input line to the pattern space. As long as the pattern space does not contain a closing double quote followed by a comma (/",/!), ba branches back to label a. Note that this means a multi-line string is only recognized as closed when its closing quote is followed by a comma.

Once the closing quote is found, all newlines in the pattern space are replaced by a space with s/\n/ /g, and sed prints the pattern space automatically.
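The loop described above can be sketched in Python to make the control flow easier to follow. This is only an approximation of the sed script: the names OPENING and join_open_quotes are invented here, and at end of input the sketch still replaces the newlines it has collected, which GNU sed would not do when N runs out of lines.

```python
import re

# Starts with a double quote and contains no second double quote.
OPENING = re.compile(r'^"[^"]+$')


def join_open_quotes(lines):
    result = []
    it = iter(lines)
    for line in it:
        if OPENING.match(line):
            buf = line
            # Like ":a;N;/",/!ba": keep appending lines until '",' shows up.
            while '",' not in buf:
                nxt = next(it, None)
                if nxt is None:
                    break  # input ended before a closing quote was seen
                buf += "\n" + nxt
            # Like "s/\n/ /g".
            line = buf.replace("\n", " ")
        result.append(line)
    return result


print("\n".join(join_open_quotes(['"one", ', '"three', 'four",', '"seven"'])))
```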

#4

A simplistic solution:

#!/usr/bin/perl

use strict;
use warnings;

while (<DATA>) {
    chomp;
    if (m/^\"/) { print "\n"; }
    print;
}


__DATA__
"one", 
"three
four",
"seven"

But for the specific case of CSV-style data, I'd suggest using a Perl module called Text::CSV, which parses CSV properly and treats an element containing a linefeed as part of the row it started in.

#!/usr/bin/perl

use strict;
use warnings;

use Text::CSV;

my $csv = Text::CSV->new( { binary => 1 } );

open( my $input, "<", "input.csv" ) or die $!;

while ( my $row = $csv->getline($input) ) {
    for (@$row) {
        #remove linefeeds in each 'element'. 
        s/\n/ /g;
        #print this specific element ('naked' e.g. without quotes). 
        print;
        print ",";
    }
    print "\n";
}
close($input);
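For comparison, Python's standard csv module also keeps a quoted field with an embedded newline inside one row. The sketch below is a rough counterpart, not a port of the Perl script: it works on a string rather than input.csv, it re-quotes every field on output instead of printing them bare, and flatten_csv_newlines is a hypothetical name.

```python
import csv
import io


def flatten_csv_newlines(text: str) -> str:
    # newline="" stops StringIO from translating line endings, so the
    # csv reader sees embedded newlines exactly as they are.
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        # Replace linefeeds inside each field with a space.
        writer.writerow([field.replace("\n", " ") for field in row])
    return out.getvalue()


print(flatten_csv_newlines('"one","three\nfour","seven"\n'), end="")
```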

#5

Tested in bash.

Purpose: replace each newline inside double quotes with a literal \n.

Works for Unix newlines (\n) and Windows newlines (\r\n); the pattern also matches \n\r. A bare \r (the classic Mac OS line ending) is not matched.

echo -e '"line1\nline2"'

"line1
line2"

echo -e '"line1\nline2"' | gawk -v RS='"' 'NR % 2 == 0 { gsub(/\r?\n\r?/, "\\n") } { printf("%s%s", $0, RT) }'

"line1\nline2"
