Help me out, guys, I'm really lost here. I have a big text file full of links, and I'm trying to split them up according to which website each link belongs to. I tried to do it with the csplit command, but I'm not sure how, since the split would depend on the text content.
Text example:
www.unix.com/man-page/opensolaris/1/csplit/&hl=en
www.unix.com/shell-programming-and-scripting/126539-csplit-help.html/RK=0/RS=iGOr1SINnK126qZciYPZtBHpEmg-
www.w3cschool.cc/linux/linux-comm-csplit.html
www.linuxdevcenter.com/cmd/cmd.csp?path=c/csplit+"csplit"&hl=en&ct=clnk
So in this example the first two links would go in one file, and the remaining two would each get a file of their own. How would this work? I really don't know if this is even possible. (novice programmer)
1 solution
#1
2
Try:
awk 'BEGIN{FS="/"} {print > $1}' [your file name]
Output:
cat www.unix.com
www.unix.com/man-page/opensolaris/1/csplit/&hl=en
www.unix.com/shell-programming-and-scripting/126539-csplit-help.html/RK=0/RS=iGOr1SINnK126qZciYPZtBHpEmg-
cat www.linuxdevcenter.com
www.linuxdevcenter.com/cmd/cmd.csp?path=c/csplit+"csplit"&hl=en&ct=clnk
cat www.w3cschool.cc
www.w3cschool.cc/linux/linux-comm-csplit.html
{print > $1} redirects each line to a separate file named after $1. Because the field separator is "/", $1 is the text before the first slash, which in this case is the domain name.
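If some of the links might start with a scheme such as http://, or if the list covers a very large number of sites, a slightly more defensive variant may help. The following is only a sketch under assumptions that are not in the original post: the names links.txt and by-host/ are hypothetical, the last sample line (with an http:// prefix) is made up for illustration, and the script works in a fresh temporary directory so it cannot overwrite anything.

```shell
#!/bin/sh
# Sketch: split a list of links into one file per host.
# links.txt and by-host/ are hypothetical names chosen for this example.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input: the four links from the question, plus one made-up
# line with an http:// prefix (an assumption, not from the question).
cat > links.txt <<'EOF'
www.unix.com/man-page/opensolaris/1/csplit/&hl=en
www.unix.com/shell-programming-and-scripting/126539-csplit-help.html/RK=0/RS=iGOr1SINnK126qZciYPZtBHpEmg-
www.w3cschool.cc/linux/linux-comm-csplit.html
www.linuxdevcenter.com/cmd/cmd.csp?path=c/csplit+"csplit"&hl=en&ct=clnk
http://www.unix.com/
EOF

mkdir -p by-host

awk '
{
    host = $0
    sub(/^[a-zA-Z]+:\/\//, "", host)   # drop a leading scheme such as http://
    sub(/\/.*/, "", host)              # keep only the text before the first slash
    if (host == "") next               # skip lines with no host part
    out = "by-host/" host
    print >> out                       # append, so a reopened file keeps its earlier lines
    close(out)                         # release the file so awk never holds many open at once
}' links.txt

ls by-host
```

Closing each file after every write is slower than keeping it open, but some awk implementations limit how many files can be open at the same time, so this trades speed for safety. Because the script appends with >>, running it twice against the same by-host/ directory would duplicate lines; start from an empty directory each time.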