How to use sed to replace the nth space before a string on each line of a file

Time: 2022-12-06 22:05:11

I am trying to replace the space before the surname on each line of a file with a comma using sed.

Example Source:

George W Heong§New York§USA
Elizabeth Black§Sheffield, Yorkshire§England
Lucy Jones§Cardiff§Wales
James G K Shackleton§Dallas, Texas§USA
Carl Seddon§Canberra,Australia

Example Output:

George W,Heong§New York§USA
Elizabeth,Black§Sheffield, Yorkshire§England
Lucy,Jones§Cardiff§Wales
James G K,Shackleton§Dallas, Texas§USA
Carl,Seddon§Canberra,Australia

I think I've worked out a method to obtain the index of the relevant space as follows:

int idx$ = str.indexOf("§");
int nthSpace = str.lastIndexOf(" ", idx$);

but I haven't been able to work out how to use the variable nthSpace to replace the nth instance. This is what I have got so far:

sed "s/$nthSpace" "/,/" datain.txt > dataout.txt

Any assistance would be appreciated.

2 Solutions

#1


With sed:

sed 's/ \([^ ]*§\)/,\1/' sourcefile

The pattern looks for the first occurrence of:

  • a space
  • followed by zero or more non-space characters
  • followed by §

The surname, together with the § that follows it, is captured in a group; the replacement writes that group back prefixed with a comma.
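To see the substitution at work, here is a small sketch that feeds two of the sample lines from the question through the same sed expression. The function name comma_before_surname is only an illustrative wrapper and not part of the answer.

```shell
# Illustrative wrapper (hypothetical name) around the sed expression.
# Reads lines on stdin and replaces the first space that is followed by
# non-space characters and then a § with a comma.
comma_before_surname() {
  sed 's/ \([^ ]*§\)/,\1/'
}

printf '%s\n' \
  'George W Heong§New York§USA' \
  'Elizabeth Black§Sheffield, Yorkshire§England' |
  comma_before_surname
# Prints:
# George W,Heong§New York§USA
# Elizabeth,Black§Sheffield, Yorkshire§England
```

Because there is no g flag, only the first match on each line is changed, so the space in "New York§" is left alone.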

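UPDATE:

To prevent a string such as "name §" (with a space directly before the §) from being matched, first remove any spaces in front of the § with s/  *§/§/. In sed's basic regular expressions an unescaped + is a literal plus sign, so "one or more spaces" is written here as a space followed by " *" (GNU sed also accepts " \+"). The final command will be:

sed 's/  *§/§/;s/ \([^ ]*§\)/,\1/' sourcefile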
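The effect of the extra substitution can be checked with a line that has a stray space before the first §. This is a minimal sketch: the input line is made up for illustration, and the one-or-more-spaces pattern is written in the portable form (a space followed by " *").

```shell
line='Lucy Jones §Cardiff§Wales'   # made-up input with a space before §

# Without the clean-up step, the stray space itself is turned into a comma.
without=$(printf '%s\n' "$line" | sed 's/ \([^ ]*§\)/,\1/')

# With the clean-up step, spaces before § are removed first.
with=$(printf '%s\n' "$line" | sed 's/  *§/§/;s/ \([^ ]*§\)/,\1/')

printf '%s\n' "$without" "$with"
# Prints:
# Lucy Jones,§Cardiff§Wales
# Lucy,Jones§Cardiff§Wales
```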

As noted in the comments on the question, multi-part surnames (those containing spaces) will be split unless they are handled manually.
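The limitation is easy to reproduce. In this sketch the name is a made-up example of a surname containing a space; only its last word ends up after the comma.

```shell
# 'van Beethoven' is meant as the surname here, but sed only sees the last word.
result=$(printf '%s\n' 'Ludwig van Beethoven§Bonn§Germany' |
  sed 's/  *§/§/;s/ \([^ ]*§\)/,\1/')
printf '%s\n' "$result"
# Prints: Ludwig van,Beethoven§Bonn§Germany
```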

#2


With gensub, available in GNU awk, you can do this:

awk 'BEGIN{FS=OFS="§"} {$1=gensub(/[[:blank:]]([^[:blank:]]+)$/, ",\\1", 1, $1)} 1' file

Output:

George W,Heong§New York§USA
Elizabeth,Black§Sheffield, Yorkshire§England
Lucy,Jones§Cardiff§Wales
James G K,Shackleton§Dallas, Texas§USA
Carl,Seddon§Canberra,Australia
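
gensub is specific to GNU awk. Where only a POSIX awk is available, the same idea can be sketched with match and substr instead. This is a swapped-in variant rather than the answer's own code; the function name is hypothetical, and it assumes blanks are spaces or tabs. Like the gensub version it only edits the first §-separated field, so a line whose name contains no space is left unchanged.

```shell
# Hypothetical helper: POSIX-awk variant of the gensub one-liner.
comma_before_surname_awk() {
  awk 'BEGIN { FS = OFS = "§" }
  {
    # Find the blank before the last word of the name field.
    if (match($1, /[ \t][^ \t]+$/))
      $1 = substr($1, 1, RSTART - 1) "," substr($1, RSTART + 1)
    print
  }'
}

# The second input line is made up to show the no-space case.
printf '%s\n' \
  'James G K Shackleton§Dallas, Texas§USA' \
  'Cher§New York§USA' |
  comma_before_surname_awk
# Prints:
# James G K,Shackleton§Dallas, Texas§USA
# Cher§New York§USA
```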
