I have a 2GB text file on my Linux box that I'm trying to import into my database.
The problem I'm having is that the script that is processing this rdf file is choking on one line:
mismatched tag at line 25462599, column 2, byte 1455502679:
<link r:resource="http://www.epuron.de/"/>
<link r:resource="http://www.oekoworld.com/"/>
</Topic>
=^
I want to replace the </Topic> with </Line>. I can't do a search/replace on all lines, but I do have the line number, so I'm hoping there's some easy way to just replace that one line with the new text.
Any ideas/suggestions?
5 solutions
#1
11
sed -i yourfile.xml -e '25462599s!</Topic>!</Line>!'
#2
7
sed -i '25462599 s|</Topic>|</Line>|' nameoffile.txt
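Before editing a 2GB file in place, it may be worth confirming that the line number really points at the line you expect. Here is a minimal sketch of that check. The sample file and the line number 3 are made up so the snippet runs anywhere; on the real file you would use 25462599 and your own file name.

```shell
# Build a tiny sample file (hypothetical data standing in for the real RDF dump).
sample=$(mktemp)
printf '%s\n' '<Line>' '<link r:resource="http://www.epuron.de/"/>' '</Topic>' > "$sample"

n=3   # stands in for 25462599

# Print only line n, then quit so sed does not read the rest of the file.
sed -n "${n}{p;q;}" "$sample"

# Preview what the substitution would produce, without modifying the file.
sed -n "${n}{s|</Topic>|</Line>|p;q;}" "$sample"
```

If the first command shows the line you expected and the second shows the corrected version, the in-place edit should be safe to run.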
#3
6
The tool for editing text files in Unix is called ed (as opposed to sed, which, as the name implies, is a stream editor).
ed was originally intended as an interactive editor, but it can also easily be scripted. Every ed command takes an address parameter. To address a specific line, you just give its line number, and to change the addressed line(s) you use the s command, which takes the same regexp that sed would. So, to change the 42nd line, you would write something like 42s/old/new/.
Here's the entire command:
FILENAME=/path/to/wherever
LINENUMBER=25462599
ed -- "${FILENAME}" <<-HERE
${LINENUMBER}s!</Topic>!</Line>!
w
q
HERE
The advantage of this is that ed is standardized by POSIX, while the -i flag to sed is a non-standard extension: GNU and BSD sed implement it with different syntax, and it is not available on every system.
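If you would rather stay with sed on a system without -i, one option is to write to a temporary file and move it over the original. This is only a sketch: the function name replace_line and the demo file are made up for illustration, and the temporary copy needs as much free disk space as the original file.

```shell
# Hypothetical helper: replace </Topic> with </Line> on one line, without sed -i.
# $1 = file, $2 = line number
replace_line() {
    file=$1
    line=$2
    tmp="${file}.tmp.$$"          # temp file next to the original
    # Only the addressed line is changed; all other lines pass through as-is.
    sed "${line}s|</Topic>|</Line>|" "$file" > "$tmp" && mv "$tmp" "$file"
}

# Demo on a small made-up file: lines 1 and 3 both contain </Topic>.
demo="${TMPDIR:-/tmp}/replace_demo.$$"
printf '%s\n' '</Topic>' '<link/>' '</Topic>' > "$demo"
replace_line "$demo" 3
cat "$demo"
```

Note that mv gives you a new file, so ownership and permissions may differ from the original.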
#4
2
Use "head" to get the first 25462598 lines, write the corrected line, and use "tail" to get the remaining lines (starting at 25462600). Though... for a 2GB file this will likely take a while.
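Roughly, the head/tail idea could look like the sketch below. The sample file, the output file, and the line number 3 are made up so it runs as-is (mktemp is assumed to be available); for the real file n would be 25462599.

```shell
# Made-up sample input and output files.
src=$(mktemp)
out=$(mktemp)
printf '%s\n' 'one' 'two' '</Topic>' 'four' 'five' > "$src"

n=3   # the line to replace

{
    head -n "$((n - 1))" "$src"     # everything before line n
    printf '%s\n' '</Line>'         # the corrected line
    tail -n "+$((n + 1))" "$src"    # everything from line n+1 onward
} > "$out"

cat "$out"
```

The file is read twice, which is why this is likely slower than a single sed or awk pass.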
Also, are you sure the problem is just with that line and not somewhere earlier? The error looks like an XML parse error, which might mean the actual problem is someplace else.
#5
1
My shell script:
#!/bin/bash
# Print the file ($3) with line number $1 replaced by the text in $2.
awk -v line="$1" -v new_content="$2" '{
    if (NR == line) {
        print new_content;
    } else {
        print $0;
    }
}' "$3"
Arguments:
first: the line number you want to change
second: the text you want in place of the original line contents
third: the file name
This script prints its output to stdout, so you need to redirect it to a new file. Example:
./script.sh 5 "New fifth line text!" file.txt > new_file.txt
You can improve it, for example, by checking that all your arguments have the expected values.
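As a rough idea of what that checking might look like, here is a sketch that wraps the same awk logic in a function. The function name replace_line, the error messages, and the demo file are my own choices for illustration. It also passes the new text through the environment instead of -v, since awk interprets backslash escapes in -v values.

```shell
#!/bin/bash
# Hypothetical, validated variant of the script above.
# Usage: replace_line LINE_NUMBER NEW_TEXT FILE
replace_line() {
    if [ "$#" -ne 3 ]; then
        echo "usage: replace_line LINE_NUMBER NEW_TEXT FILE" >&2
        return 2
    fi
    case $1 in
        ''|*[!0-9]*)
            echo "error: line number must contain digits only: $1" >&2
            return 2 ;;
    esac
    if [ ! -r "$3" ]; then
        echo "error: cannot read file: $3" >&2
        return 2
    fi
    # ENVIRON keeps the replacement text literal (no backslash processing).
    NEW_CONTENT=$2 awk -v line="$1" '
        NR == line { print ENVIRON["NEW_CONTENT"]; next }
        { print }
    ' "$3"
}

# Demo on a small made-up file.
demo=$(mktemp)
printf '%s\n' 'a' 'b' 'c' > "$demo"
replace_line 2 'New second line' "$demo"
```

As with the original script, the result goes to stdout, so redirect it to a different file than the one you are reading.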